I am writing about tuning the indexing rate of Elasticsearch. I spent a few weeks tuning indexing performance, and I could not find much information when I searched Google; that is why I am writing this article. English is not my first language and I am not confident in it, so please read carefully. Any questions and opinions are welcome, including corrections to my English :)
1. Monitoring
We need a monitoring method before tuning performance. I use three ways: A. Marvel, B. the ES (Elasticsearch) API, C. Linux commands. Marvel is a good tool, but it does not return detailed state information about Elasticsearch, so I also use the node stats API (http://localhost:9200/_nodes/stats). That API does not expose disk utilization, so for that I use C: Linux commands such as iostat.
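As a sketch of approach B, the node stats API exposes cumulative indexing counters per node. The snippet below parses a trimmed, made-up sample of that payload to derive an average indexing rate; on a live cluster you would fetch the real payload with curl instead.

```shell
# Trimmed, hypothetical sample of a _nodes/stats response; on a real cluster:
#   curl -s http://localhost:9200/_nodes/stats
stats='{"nodes":{"node1":{"indices":{"indexing":{"index_total":1200000,"index_time_in_millis":60000}}}}}'

# Derive an average indexing rate (docs/sec) from the cumulative counters.
rate=$(echo "$stats" | python3 -c '
import sys, json
node = list(json.load(sys.stdin)["nodes"].values())[0]
idx = node["indices"]["indexing"]
print("%.0f docs/sec" % (idx["index_total"] / (idx["index_time_in_millis"] / 1000.0)))
')
echo "$rate"
```

Disk utilization is not in this payload; run iostat -x 1 (or sar -d) on each node for that.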
2. Generating Stress
I use two ways to generate stress: A. the generator input plugin of Logstash, B. JMeter. I used the Logstash input plugin at first, but I could not find where the problem happened. So I called the bulk API directly from JMeter, which was a good choice: I could test the ES-side performance in isolation with JMeter, and then tune the Logstash part separately.
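For reference, a bulk request body is NDJSON: one action line plus one source line per document, and the body must end with a newline. The index and type names below are made up; JMeter simply POSTs such a body to the _bulk endpoint.

```shell
# Build a minimal _bulk body (action line + source line per document).
bulk_body='{"index":{"_index":"logstash-2016.10.01","_type":"access"}}
{"message":"GET /index.html HTTP/1.1 200"}
{"index":{"_index":"logstash-2016.10.01","_type":"access"}}
{"message":"GET /favicon.ico HTTP/1.1 404"}'

# JMeter (or curl) would POST this to the cluster, e.g.:
#   curl -s -XPOST http://localhost:9200/_bulk --data-binary "$bulk_body"$'\n'

# Count the documents in the payload (one "_index" action line each).
doc_count=$(echo "$bulk_body" | grep -c '_index')
echo "$doc_count"
```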
3. Tuning Point
3-1 My Environment
CPU : 12 (Hyperthread 24)
RAM : 32G
DISK : 2T * 4 (spinning disks)
6 nodes
ES : 2.4.0 (marvel, license, head plugin)
LS : 2.4.0
Kibana : 2.4.0
Documents : nginx access log & spring application log
Goal : 20,000 documents/sec with 1 replica
3-2 Basic Configuration
I think the configuration below is basic; you should set it in any case.
Disable swapping
bootstrap.memory_lock: true
File descriptors
Raise the limit to at least 65536 (e.g. ulimit -n 65536)
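For reference, a persistent way to set this on Linux looks like the fragment below; the paths and the elasticsearch user name are the usual ones and may differ on your distribution (memlock must also be unlimited, or bootstrap.memory_lock cannot lock the heap).

```
# /etc/security/limits.conf
elasticsearch  -  nofile   65536
elasticsearch  -  memlock  unlimited

# elasticsearch.yml
bootstrap.memory_lock: true
```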
3-3 Disk IO
I use spinning disks, so all the bottleneck happens in disk IO. The most important setting is index.merge.scheduler.max_thread_count: 1 in elasticsearch.yml. The default is Math.max(1, Math.min(4, Runtime.getRuntime().availableProcessors() / 2)), which is tuned for SSDs; I do not understand why the default is that high for spindles, especially since the default indices.store.throttle.max_bytes_per_sec is only 20mb.
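In elasticsearch.yml the two disk-related settings discussed above look like this; 20mb is the 2.x default for the store throttle, shown here only to make the pairing explicit.

```
# elasticsearch.yml
index.merge.scheduler.max_thread_count: 1      # one merge thread for spinning disks
indices.store.throttle.max_bytes_per_sec: 20mb # default; raise only if the disks can keep up
```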
3-4 Bulk Size
The Logstash Elasticsearch output plugin uses the Bulk API of ES. The default batch size is 150. The usual recommendation is 5MB~15MB per request, i.e. a few thousand documents. When I tested with JMeter at 3,500 docs per request, I got 80,000 docs/sec. With Logstash I got 25,000 docs/sec; I could not set 3,500 in my Logstash, because 2,800 was the maximum my Logstash heap size allowed. There are two settings that control the bulk request size: pipeline-batch-size and flush_size. You can pass --pipeline-batch-size when you run the Logstash command, and you can set flush_size in the Elasticsearch output plugin in the Logstash configuration. If you set only one of them, you will not get the batch size you expect. This is a good article about bulk size:
https://discuss.elastic.co/t/how-to-tune-logstash-to-elasticsearch-shipping/51333/5
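Concretely, the two settings look like this; the value 2800 mirrors the maximum my heap allowed, and the hosts value is illustrative.

```
# command line
bin/logstash -f logstash.conf --pipeline-batch-size 2800

# logstash.conf
output {
  elasticsearch {
    hosts      => ["localhost:9200"]
    flush_size => 2800
  }
}
```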
3-5 Field Mapping
The default mapping for string fields is analyzed: ES will analyze every sentence and word, even for an IP field. So if you want to use resources efficiently, you should set the field mappings manually. Below is my index template; you can use it as a reference for your own documents.
PUT _template/logstash_template
{
  "template": "logstash-*",
  "order": 1,
  "mappings": {
    "_default_": {
      "properties": {
        "index":   { "type": "string", "index": "not_analyzed" },
        "type":    { "type": "string", "index": "not_analyzed" },
        "message": { "type": "string", "index": "no" }
      }
    },
    "access": {
      "properties": {
        "host":             { "type": "string",  "index": "not_analyzed" },
        "clientip":         { "type": "ip",      "index": "not_analyzed" },
        "cookie_daid":      { "type": "string",  "index": "not_analyzed" },
        "timestamp":        { "type": "string",  "index": "not_analyzed" },
        "verb":             { "type": "string",  "index": "not_analyzed" },
        "request":          { "type": "string",  "index": "analyzed" },
        "httpversion":      { "type": "string",  "index": "not_analyzed" },
        "response":         { "type": "integer", "index": "not_analyzed" },
        "bytes":            { "type": "long",    "index": "not_analyzed" },
        "referrer":         { "type": "string",  "index": "not_analyzed" },
        "agent":            { "type": "string",  "index": "analyzed" },
        "x-http-forwarded": { "type": "string",  "index": "analyzed" },
        "geoip": {
          "properties": {
            "continent_code": { "type": "string",    "index": "not_analyzed" },
            "country_code2":  { "type": "string",    "index": "not_analyzed" },
            "country_code3":  { "type": "string",    "index": "not_analyzed" },
            "ip":             { "type": "ip",        "index": "not_analyzed" },
            "latitude":       { "type": "float",     "index": "not_analyzed" },
            "location":       { "type": "geo_point", "index": "not_analyzed" },
            "longitude":      { "type": "float",     "index": "not_analyzed" }
          }
        }
      }
    },
    "app": {
      "properties": {
        "timestamp": { "type": "string",  "index": "not_analyzed" },
        "loglevel":  { "type": "string",  "index": "not_analyzed" },
        "class":     { "type": "string",  "index": "analyzed" },
        "line":      { "type": "long",    "index": "not_analyzed" },
        "log":       { "type": "string",  "index": "analyzed" },
        "pid":       { "type": "integer", "index": "not_analyzed" },
        "thread":    { "type": "string",  "index": "not_analyzed" },
        "service":   { "type": "string",  "index": "not_analyzed" },
        "host":      { "type": "string",  "index": "not_analyzed" },
        "method":    { "type": "string",  "index": "not_analyzed" }
      }
    }
  }
}
3-6 Others
If you search for Elasticsearch indexing performance on Google, you can find several articles.
In my case the items below did not help much to increase indexing throughput, but it may be different in your case.
Disable _all Field
I cannot find the _all field in Kibana anymore, so I think it is now disabled by default.
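If you prefer not to rely on the default, _all can be disabled explicitly per mapping type; this fragment would merge into the _default_ mapping of the template above.

```
"mappings": {
  "_default_": {
    "_all": { "enabled": false }
  }
}
```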
Bulk Threadpool Size
The default bulk threadpool size is the number of available processors. I could not change the bulk pool size with the setting below. I think it is blocked in 2.4.0: the size is fixed and cannot be changed through configuration.
elasticsearch.yml:
threadpool:
  bulk:
    size: 30
Segment Merging Parameters
There are some parameters that can tune segment-merging performance. If we can reduce segment merging, we can theoretically improve indexing performance, though it can reduce search performance.
index.refresh_interval
index.translog.flush_threshold_size
indices.memory.index_buffer_size
But in my case these parameters did not affect indexing performance.
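For reference, this is how those parameters would be set; the values are illustrative (the 2.x defaults are 1s, 512mb, and 10% respectively), not a recommendation.

```
# elasticsearch.yml (the index.* settings can also be changed per index via the settings API)
index.refresh_interval: 30s
index.translog.flush_threshold_size: 1gb
indices.memory.index_buffer_size: 20%
```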
4. Conclusion
There is no single correct answer, but currently my ELK stack reaches 25,000 docs/sec with this configuration. You can also try this configuration as a starting point for indexing performance in your own environment.