
Indexing Data to Elasticsearch, Part 3

Posted 1 year, 7 months and 5 days ago

Part 3:

Tested Environment:

  • OS: Red Hat Linux
  • Feeder Version: 1.5.2
  • MySQL: 5.6

Feeder:

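Another way to index data into Elasticsearch is the Feeder, the standalone mode of the JDBC importer (elasticsearch-jdbc). It is the recommended approach from version 1.5.x onward, since Elasticsearch 1.5 deprecated rivers.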

Advantages of Feeder:

  • No need to install it on every node in the cluster
  • It is standalone and can run outside of Elasticsearch
  • It is configured with a simple shell script
  • It can take data from multiple tables and index it into Elasticsearch
  • There is no need to shut the feeder down once the data has been indexed

The following steps show how to set up the feeder.

Implementing Feeder:

Reference link for downloading and installing the plug-in: https://github.com/jprante/elasticsearch-jdbc

The plug-in can be kept on any node in the cluster. Before starting the feeder, set the following environment variable in /etc/profile:

```shell
export JDBC_IMPORTER_HOME=/home/users/ktree/java_setup_files/jdbc1.5.2/elasticsearch-jdbc-1.5.2.0/bin
```

It points to the bin directory of the importer installation, so the shell knows where the plug-in is.
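Because the variable lives in /etc/profile, it only takes effect in new login shells or after running `source /etc/profile`. A small pre-flight check can catch a missing or wrong path before java stops with "Could not find or load main class". The sketch below assumes, as in the export above, that JDBC_IMPORTER_HOME points at the importer's bin/ directory and that the jars sit in the sibling lib/ directory; the function name check_importer_home is hypothetical.

```shell
# Hypothetical pre-flight check for the feeder setup.
# Assumes JDBC_IMPORTER_HOME points at the importer's bin/ directory
# and that the importer jars live in the sibling lib/ directory.
check_importer_home() {
  if [ -z "${JDBC_IMPORTER_HOME:-}" ]; then
    echo "JDBC_IMPORTER_HOME is not set; add it to /etc/profile and run: source /etc/profile" >&2
    return 1
  fi
  if [ ! -d "$JDBC_IMPORTER_HOME" ]; then
    echo "JDBC_IMPORTER_HOME=$JDBC_IMPORTER_HOME is not a directory" >&2
    return 1
  fi
  # The feeder script puts every jar in ../lib on the Java classpath
  local lib="$JDBC_IMPORTER_HOME/../lib"
  if ! ls "$lib"/*.jar >/dev/null 2>&1; then
    echo "no .jar files found in $lib" >&2
    return 1
  fi
  echo "importer found: $JDBC_IMPORTER_HOME"
}
```

Source this function, then run `source /etc/profile` followed by `check_importer_home` before starting the feeder script; a non-zero exit status means the path needs fixing first.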
The feeder definition in the script is the same JSON as the one used for the river. (Figure: folder view of the feeder.)

Script that pulls the table data and indexes it into Elasticsearch (.sh script):
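```shell
#!/bin/bash
# Feeder script: pulls rows from MySQL and bulk-indexes them into Elasticsearch.
# BASH_SOURCE is a bash feature, so run this with bash rather than sh.

# Directory this script is stored in
DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" && pwd )"

# Importer bin/ directory: use JDBC_IMPORTER_HOME when it is set, otherwise
# assume the script sits in a folder next to the importer's bin/ and lib/
bin="${JDBC_IMPORTER_HOME:-${DIR}/../bin}"
lib="${bin}/../lib"

# JSON feeder definition, piped to the importer on stdin:
#  - "url"/"user"/"password"/"sql": MySQL source and the query to run
#  - "elasticsearch": cluster name, host and transport port (9300, not HTTP 9200)
#  - "index"/"type": where the documents are written
echo '
{
  "type" : "jdbc",
  "jdbc" : {
    "url" : "jdbc:mysql://10.0.1.41:3306/test",
    "user" : "test",
    "password" : "secret",
    "sql" : "SELECT * FROM `test_DB`",
    "treat_binary_as_string" : true,
    "elasticsearch" : {
      "cluster" : "elasticsearch",
      "host" : "10.0.0.41",
      "port" : 9300
    },
    "max_bulk_actions" : 160,
    "max_concurrent_bulk_requests" : 5,
    "index" : "test",
    "type" : "test1",
    "timezone" : "America/Los_Angeles"
  }
}
' | java \
  -cp "${lib}/*" \
  -Dlog4j.configurationFile="${bin}/log4j2.xml" \
  org.xbib.tools.Runner \
  org.xbib.tools.JDBCImporter
```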

Save it with a ".sh" extension and run it with bash (it relies on BASH_SOURCE, so plain sh may not work).

It indexes the table data into Elasticsearch.
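To confirm that a run worked, the document count of the index can be read from the _count endpoint of the HTTP API. This sketch assumes the node serves HTTP on the default port 9200 (the feeder itself talks to the transport port 9300) and that curl is installed; ES_URL, extract_count and indexed_count are hypothetical names.

```shell
# Hypothetical helpers for checking the result of a feeder run.
# Assumption: HTTP on the default port 9200; override ES_URL if it differs.
ES_URL="${ES_URL:-http://10.0.0.41:9200}"

# Read a _count response such as {"count":42,"_shards":{...}} on stdin
# (compact or ?pretty formatted) and print only the number.
extract_count() {
  sed -n 's/.*"count"[[:space:]]*:[[:space:]]*\([0-9][0-9]*\).*/\1/p'
}

# Print the number of documents in the given index (default: test).
indexed_count() {
  local index="${1:-test}"
  curl -s "${ES_URL}/${index}/_count" | extract_count
}
```

On a first run into an empty index, `indexed_count test` should normally match the row count of `test_DB`, for example as reported by `mysql -h 10.0.1.41 -u test -p -N -e 'SELECT COUNT(*) FROM test_DB' test`.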

Since Elasticsearch is a NoSQL store, there is no need to create the index before populating it: the mapping (schema) is created automatically by detecting the field types. This is not best practice, because the default data types Elasticsearch assigns to fields may occupy more space than required; for example, every whole-number field is mapped as long, even when integer or short would be enough.

So it is best practice to define the index mapping properly before indexing data into it.
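A minimal sketch of defining the mapping up front, using the Elasticsearch 1.x create-index API with the index ("test") and type ("test1") names from the feeder script. The column names id, status and name are hypothetical stand-ins for the real columns of `test_DB`, and HTTP on port 9200 is assumed. Without this step, dynamic mapping would make id and status long fields; here they are declared as integer and short, and name is not_analyzed so it is indexed as a single exact term.

```shell
# Hypothetical mapping for the table `test_DB`; adjust columns and types.
# Uses the Elasticsearch 1.x mapping syntax ("string", "not_analyzed").
ES_URL="${ES_URL:-http://10.0.0.41:9200}"

# Print the index definition: one mapping for type "test1".
build_mapping() {
  cat <<'EOF'
{
  "mappings" : {
    "test1" : {
      "properties" : {
        "id"     : { "type" : "integer" },
        "status" : { "type" : "short" },
        "name"   : { "type" : "string", "index" : "not_analyzed" }
      }
    }
  }
}
EOF
}

# Create the index with the mapping (PUT /<index>, body read from stdin).
# This fails if the index already exists.
create_index() {
  local index="${1:-test}"
  build_mapping | curl -s -XPUT "${ES_URL}/${index}" -d @-
}
```

Run `create_index test` before the first feeder run; a successful call returns {"acknowledged":true}. If the index was already created dynamically, the request fails, so the index has to be deleted (`curl -XDELETE "$ES_URL/test"`), re-created and then re-indexed.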
