About
Elasticsearch is an open-source search engine built on top of Apache Lucene, a full-text search engine library. Lucene is arguably the most advanced, high-performance, and fully featured search engine library in existence today, open source or proprietary. You’ve probably been playing with Elasticsearch on your laptop or on a small cluster of machines lying around. But when it comes time to deploy Elasticsearch to production, there are a few recommendations that you should consider.
If you are an ES beginner, this article will help you get started with your first ES cluster.
Configuration
Deciding what configuration an ES node should have can be a big question. How many nodes to choose for a cluster depends on your current data requirements and on how much the data will grow in the future. Data can grow quickly, even exponentially, so it pays to account for future data requirements when choosing nodes.
Note: From a cluster administration perspective, choosing nodes with a good hardware configuration is at the top of the list, as this helps achieve optimal performance from the cluster.
Elasticsearch recommends starting with 3 ES nodes in order to avoid the split-brain problem and to increase cluster stability while searching. If any node goes down, the other nodes in the cluster should still be able to serve requests, return responses, and index data at any point in time.
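To make the sizing advice above concrete, here is a minimal Python sketch that projects data growth and derives a node count from it. The function names, the compound-growth model, and the per-node capacity figure are all assumptions made for illustration; only the 3-node starting point comes from the text. Real capacity planning also has to account for things this sketch ignores, such as replicas and free-disk headroom.

```python
import math

MIN_NODES = 3  # starting point recommended in the text above


def projected_data_gb(current_gb, yearly_growth, years):
    """Compound growth (an assumed model): current * (1 + rate) ** years."""
    return current_gb * (1 + yearly_growth) ** years


def nodes_needed(current_gb, yearly_growth, years, usable_gb_per_node):
    """Nodes required to hold the projected data, never fewer than MIN_NODES."""
    total_gb = projected_data_gb(current_gb, yearly_growth, years)
    return max(MIN_NODES, math.ceil(total_gb / usable_gb_per_node))


if __name__ == "__main__":
    # Hypothetical inputs: 500 GB today, growing 50% per year, planned over
    # 3 years, with 1000 GB of usable disk per node.
    print(projected_data_gb(500, 0.5, 3))   # 1687.5
    print(nodes_needed(500, 0.5, 3, 1000))  # 3 (the 3-node minimum applies)
    print(nodes_needed(2000, 1.0, 3, 1000)) # 16
```

Planning against the projected size rather than the current one is the point here: the second example needs 16 nodes after three years even though its data would fit on 2 today.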
Node Scenarios
The number of nodes should be chosen with the N/2+1 formula in mind, to avoid split brain and to avoid a cluster left without a master node.
Note: Here N is the number of (master-eligible) nodes we want to include in the cluster.
The main parameters to consider when setting up nodes and choosing a good configuration for the cluster are memory, CPU, disks, cores, and network.
Example: Let’s say we have 3 nodes in the cluster: 3/2+1 = 2 (using integer division, so 3/2 rounds down to 1). Here 2 is the minimum number of master-eligible nodes that must be available in order to form a cluster and elect a master. So at any point in time at least 2 master-eligible nodes must be reachable, while only one of them acts as the elected master.
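The N/2+1 rule can be written as a small Python helper. This is only an illustrative sketch: the function name is made up, and it assumes the division is integer division, as in the 3-node example.

```python
def minimum_master_nodes(master_eligible_nodes):
    """Quorum size for N master-eligible nodes: floor(N / 2) + 1."""
    if master_eligible_nodes < 1:
        raise ValueError("a cluster needs at least one master-eligible node")
    return master_eligible_nodes // 2 + 1


if __name__ == "__main__":
    for n in (1, 2, 3, 5):
        print(n, "nodes -> quorum of", minimum_master_nodes(n))
```

With 2 nodes the quorum is also 2, so losing either node leaves the cluster without a master. That is one reason for starting with 3 nodes rather than 2. In Elasticsearch versions before 7.0 this value was usually applied through the `discovery.zen.minimum_master_nodes` setting; check the documentation for the version you are running.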
Cluster Configuration
All nodes should run an identical Java version. Always use the up-to-date Java version that Elasticsearch recommends for the ES version you are installing. Every node must also run the same Elasticsearch version, and the same cluster name must be defined in each node’s Elasticsearch configuration file; otherwise the cluster will not form.
Note: Start with a small cluster initially. Later on, you can add more nodes to the existing cluster to balance the load across it.
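The consistency requirements above can be checked before a node joins the cluster. The sketch below assumes you have already collected each node's cluster name, ES version, and Java version by some means. The `NodeInfo` type, the `find_mismatches` helper, and the placeholder values are hypothetical and not part of any Elasticsearch API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeInfo:
    name: str
    cluster_name: str
    es_version: str
    java_version: str


# Fields that should be identical on every node, per the text above.
CHECKED_FIELDS = ("cluster_name", "es_version", "java_version")


def find_mismatches(nodes):
    """Map each inconsistent field to the sorted list of distinct values seen."""
    mismatches = {}
    for field in CHECKED_FIELDS:
        values = sorted({getattr(node, field) for node in nodes})
        if len(values) > 1:
            mismatches[field] = values
    return mismatches


# Placeholder inventory; the version strings are deliberately not real versions.
EXAMPLE_NODES = [
    NodeInfo("node-1", "demo-cluster", "es-A", "java-A"),
    NodeInfo("node-2", "demo-cluster", "es-A", "java-A"),
    NodeInfo("node-3", "demo-cluster", "es-A", "java-B"),
]

if __name__ == "__main__":
    print(find_mismatches(EXAMPLE_NODES))  # {'java_version': ['java-A', 'java-B']}
```

An empty result means the nodes agree on all three fields.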
Hardware Specs
Choosing RAM is a crucial factor, as the amount you need depends on how heavily you use aggregations and faceting in Elasticsearch. If the number of searches is high, it is an advantage to add a large amount of RAM, with up to about 32 GB of heap for one ES node. Avoid giving a single ES node more heap than that, because above roughly 32 GB the JVM can no longer use compressed object pointers and memory is used less efficiently.
Note: Allocating half of the system RAM to the Elasticsearch heap leaves the remaining memory available to the OS.
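The two rules of thumb above (half of the system RAM for the heap, and no more than about 32 GB of heap per node) combine into a simple calculation. The Python sketch below is illustrative only. The function names are made up, and it treats the 32 GB figure as a hard cap, which is an assumption.

```python
MAX_HEAP_GB = 32  # upper bound from the text; treated here as a hard cap (assumption)


def recommended_heap_gb(system_ram_gb):
    """Half of the system RAM, capped at MAX_HEAP_GB."""
    if system_ram_gb <= 0:
        raise ValueError("system RAM must be positive")
    return min(system_ram_gb / 2, MAX_HEAP_GB)


def memory_split(system_ram_gb):
    """Return how much RAM goes to the ES heap and how much is left to the OS."""
    heap_gb = recommended_heap_gb(system_ram_gb)
    return {"heap_gb": heap_gb, "os_gb": system_ram_gb - heap_gb}


if __name__ == "__main__":
    for ram_gb in (16, 64, 128):
        print(ram_gb, "GB RAM ->", memory_split(ram_gb))
```

On a 128 GB machine the heap stays at 32 GB and the remaining 96 GB is left to the OS. The heap size itself is set through the JVM options `-Xms` and `-Xmx`.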
Hard Disk
SSDs are recommended over HDDs, because Elasticsearch stores all of its data on the file system. With SSDs we get the best read and write performance.
Memory
How much memory and disk space you start with matters a great deal. If there is one resource that you will run out of first, it will likely be memory. Sorting and aggregations can both be memory hungry, so enough heap space to accommodate these is important. Even when the heap is comparatively small, extra memory can be given to the OS file system cache.
Cores per Processor
Most Elasticsearch deployments tend to be light on CPU requirements. As such, the exact processor setup matters less than the other resources. You should choose a modern processor with multiple cores. Common clusters use machines with two to eight cores. If you need to choose between faster CPUs and more cores, choose more cores. The extra concurrency that multiple cores offer will far outweigh a slightly faster clock speed.
Network
A fast and reliable network is important for good performance in a distributed system. Low latency helps ensure that nodes can communicate easily, while high bandwidth helps with shard movement and recovery.