This post is about tuning the indexing rate of Elasticsearch. I spent a few weeks tuning indexing performance and could not find much information when I searched on Google, which is why I am writing this article. English is not my first language and I am not confident in my English, so please read carefully. Any questions and opinions are welcome, including corrections to my English grammar :)


1. Monitoring

We need a way to monitor the cluster before tuning performance. I use three methods: A. Marvel, B. the ES (Elasticsearch) API, and C. Linux commands. Marvel is a good tool, but it does not return detailed information about the state of Elasticsearch, so I also use the node stats API (http://localhost:9200/_nodes/stats). That API does not give me disk utilization, so for that I use C. Please refer to https://www.google.co.kr/webhp?sourceid=chrome-instant&ion=1&espv=2&ie=UTF-8#q=linux%20disk%20utilization to find out how to get disk utilization.
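The node stats API can also be turned into a rough indexing-rate meter by sampling it twice. The following is a minimal sketch, not part of my original setup: it assumes the response contains nodes.<id>.name and nodes.<id>.indices.indexing.index_total, and the function names are made up for illustration.

```python
import json
from urllib.request import urlopen

STATS_URL = "http://localhost:9200/_nodes/stats"  # URL from the text above


def index_totals(stats):
    """Extract the cumulative indexed-document counter of every node."""
    return {
        node["name"]: node["indices"]["indexing"]["index_total"]
        for node in stats["nodes"].values()
    }


def fetch_index_totals(url=STATS_URL):
    """Call the node stats API once and return {node_name: index_total}."""
    with urlopen(url) as resp:
        return index_totals(json.load(resp))


def indexing_rate(before, after, seconds):
    """Docs/sec summed over all nodes between two index_totals() snapshots."""
    indexed = sum(after[name] - before.get(name, 0) for name in after)
    return indexed / seconds
```

To use it, call fetch_index_totals(), wait a fixed number of seconds, call it again, and pass both results to indexing_rate(). As far as I know the counter also includes operations on replica shards, so with one replica the number may be about twice the document rate.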


2. Generating Stress

I use two ways to generate load: A. the generator input plugin of Logstash and B. JMeter. At first I used the Logstash generator input plugin, but I could not find where the problem was happening. So I called the bulk API directly from JMeter, which was a good choice: I could test the performance of the ES side with JMeter first and then tune the Logstash part.
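For readers who have not called the bulk API directly, the request body is newline-delimited JSON: one action line followed by one document line, repeated. Here is a minimal sketch of building such a body. The index name, type name, and helper names are only examples, and the URL assumes a local node.

```python
import json
from urllib.request import Request, urlopen


def bulk_body(docs, index, doc_type):
    """Build the newline-delimited body of one _bulk request."""
    action = json.dumps({"index": {"_index": index, "_type": doc_type}})
    lines = []
    for doc in docs:
        lines.append(action)            # what to do
        lines.append(json.dumps(doc))   # the document itself
    return "\n".join(lines) + "\n"      # the body must end with a newline


def send_bulk(body, url="http://localhost:9200/_bulk"):
    """POST one bulk body and return the parsed response (assumed local node)."""
    req = Request(url, data=body.encode("utf-8"), method="POST")
    with urlopen(req) as resp:
        return json.load(resp)
```

A load tool such as JMeter then only has to send the same kind of body repeatedly from several threads.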


3. Tuning Point

3-1 My Environment

CPU : 12 (Hyperthread 24)
RAM : 32G
DISK : 2T * 4 (spinning disks)
6 nodes

ES : 2.4.0 (marvel, license, head plugin)
LS : 2.4.0
Kibana : 2.4.0

Documents : nginx access log & spring application log

Goal : 20,000 documents/sec with 1 replica

3-2 Basic Configuration

I think the configuration below is the basic minimum. You should apply it.

Disable swapping

bootstrap.memory_lock: true

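File Descriptors

Raise the open-file limit of the user that runs Elasticsearch to 65536 at the OS level (for example, ulimit -n 65536). The startup option below does not set the limit; it prints the limit the process actually got.

-Des.max-open-files=true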
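It is worth checking that both basic settings really took effect on every node. The sketch below is only an illustration. It assumes that GET /_nodes/process reports process.mlockall and that GET /_nodes/stats/process reports process.max_file_descriptors, which is how I understand the 2.x APIs; the function names are hypothetical.

```python
def nodes_without_memory_lock(nodes_info):
    """Names of nodes whose process info does not report mlockall: true."""
    return sorted(
        node["name"]
        for node in nodes_info["nodes"].values()
        if not node["process"].get("mlockall", False)
    )


def nodes_below_fd_limit(nodes_stats, minimum=65536):
    """Names of nodes whose max_file_descriptors is below the wanted limit."""
    return sorted(
        node["name"]
        for node in nodes_stats["nodes"].values()
        if node["process"]["max_file_descriptors"] < minimum
    )
```

Both functions take the already parsed JSON response, so they can be fed from curl output or any HTTP client. Two empty lists mean every node is configured as intended.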

3-2 Disk IO

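I use spinning disks, so every bottleneck occurs in disk IO. The most important setting is index.merge.scheduler.max_thread_count: 1 in elasticsearch.yml. The documented default is Math.max(1, Math.min(4, Runtime.getRuntime().availableProcessors() / 2)), which is 4 on my 24-thread machines. That default targets SSDs and is too high for spinning disks, where concurrent merges compete for the same disk heads. I do not understand why the default is that high when even the default value of indices.store.throttle.max_bytes_per_sec is only 20MB.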
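To see what the default merge thread count would be on a given machine, the formula from the Elasticsearch 2.x reference, max(1, min(4, availableProcessors / 2)), can be evaluated by hand. The sketch below only reproduces that formula as I read it in the documentation; it is not taken from the Elasticsearch source, and the function names are made up.

```python
def default_merge_threads(available_processors):
    """Default of index.merge.scheduler.max_thread_count per the 2.x docs."""
    return max(1, min(4, available_processors // 2))


def suggested_merge_threads(available_processors, spinning_disk):
    """Use 1 thread on spinning disks, otherwise keep the documented default."""
    if spinning_disk:
        return 1
    return default_merge_threads(available_processors)
```

For the environment above, default_merge_threads(24) evaluates to 4 and suggested_merge_threads(24, spinning_disk=True) evaluates to 1.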

3-3 Bulk Size

The Logstash Elasticsearch output plugin uses the Bulk API of ES. The default size is 150. The recommended size is 5~15 MB, or a few thousand documents, per request. When I tested with JMeter I sent 3,500 docs per request and got 80,000 docs/sec. With Logstash I got 25,000 docs/sec. I could not set 3,500 in Logstash; 2,800 was the maximum that fit in my Logstash heap. There are two settings that increase the bulk size: pipeline-batch-size and flush_size. You set --pipeline-batch-size on the logstash command line, and flush_size in the Elasticsearch output plugin section of the Logstash configuration. If you set only one of them, you will not get the size you expect. This is a good article about bulk size: https://discuss.elastic.co/t/how-to-tune-logstash-to-elasticsearch-shipping/51333/5
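When sending bulk requests from your own client instead of Logstash, the two limits above (a few thousand documents and a byte budget) can be enforced together. This is a minimal sketch with hypothetical names. The 3,500-document default comes from my JMeter test, and the 15 MB budget is the upper end of the recommendation.

```python
import json


def batches(docs, max_docs=3500, max_bytes=15 * 1024 * 1024):
    """Yield lists of docs that stay within both a count and a byte budget."""
    batch, size = [], 0
    for doc in docs:
        doc_bytes = len(json.dumps(doc).encode("utf-8"))
        # Close the current batch before it would exceed either limit.
        if batch and (len(batch) >= max_docs or size + doc_bytes > max_bytes):
            yield batch
            batch, size = [], 0
        batch.append(doc)
        size += doc_bytes
    if batch:
        yield batch
```

The byte count ignores the action lines of the bulk body, so it slightly underestimates the real request size. A single document larger than the budget is still sent, alone in its own batch.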

3-4 Field Mapping

By default, string fields are analyzed. ES will analyze every sentence and word, even in an IP field. So if you want to use resources efficiently, you should set the field mappings manually. Below is my index template; you can use it as a reference for your own documents.

PUT _template/logstash_template
{
  "template": "logstash-*",
  "order": 1,
  "mappings": {
    "_default_": {
      "properties": {
        "index": {
          "type": "string",
          "index": "not_analyzed"
        },
        "type": {
          "type": "string",
          "index": "not_analyzed"
        },
        "message": {
          "type": "string",
          "index": "no"
        }
      }
    },
    "access": {
      "properties": {
        "host": {
          "type": "string",
          "index": "not_analyzed"
        },
        "clientip": {
          "type": "ip",
          "index": "not_analyzed"
        },
        "cookie_daid": {
          "type": "string",
          "index": "not_analyzed"
        },
        "timestamp": {
          "type": "string",
          "index": "not_analyzed"
        },
        "verb": {
          "type": "string",
          "index": "not_analyzed"
        },
        "request": {
          "type": "string",
          "index": "analyzed"
        },
        "httpversion": {
          "type": "string",
          "index": "not_analyzed"
        },
        "response": {
          "type": "integer",
          "index": "not_analyzed"
        },
        "bytes": {
          "type": "long",
          "index": "not_analyzed"
        },
        "referrer": {
          "type": "string",
          "index": "not_analyzed"
        },
        "agent": {
          "type": "string",
          "index": "analyzed"
        },
        "x-http-forwarded": {
          "type": "string",
          "index": "analyzed"
        },
        "geoip": {
          "properties": {
            "continent_code": {
              "type": "string",
              "index": "not_analyzed"
            },
            "country_code2": {
              "type": "string",
              "index": "not_analyzed"
            },
            "country_code3": {
              "type": "string",
              "index": "not_analyzed"
            },
            "ip": {
              "type": "ip",
              "index": "not_analyzed"
            },
            "latitude": {
              "type": "float",
              "index": "not_analyzed"
            },
            "location": {
              "type": "geo_point",
              "index": "not_analyzed"
            },
            "longitude": {
              "type": "float",
              "index": "not_analyzed"
            }
          }
        }
      }
    },
    "app": {
      "properties": {
        "timestamp": {
          "type": "string",
          "index": "not_analyzed"
        },
        "loglevel": {
          "type": "string",
          "index": "not_analyzed"
        },
        "class": {
          "type": "string",
          "index": "analyzed"
        },
        "line": {
          "type": "long",
          "index": "not_analyzed"
        },
        "log": {
          "type": "string",
          "index": "analyzed"
        },
        "pid": {
          "type": "integer",
          "index": "not_analyzed"
        },
        "thread": {
          "type": "string",
          "index": "not_analyzed"
        },
        "service": {
          "type": "string",
          "index": "not_analyzed"
        },
        "host": {
          "type": "string",
          "index": "not_analyzed"
        },
        "method": {
          "type": "string",
          "index": "not_analyzed"
        }
      }
    }
  }
}
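Writing a long template like the one above by hand is error-prone, so a small generator can help when most fields are plain not_analyzed strings. This is a sketch with made-up helper names. It only builds the JSON body and leaves sending it to PUT _template/<name> up to you.

```python
import json


def not_analyzed_strings(field_names):
    """Map each field name to a string mapping that skips analysis."""
    return {
        name: {"type": "string", "index": "not_analyzed"}
        for name in field_names
    }


def build_template(pattern, type_name, properties, order=1):
    """Assemble an index template body in the same shape as the one above."""
    return {
        "template": pattern,
        "order": order,
        "mappings": {type_name: {"properties": properties}},
    }


# Example: a few fields of the "app" type from the template above.
app_properties = not_analyzed_strings(["loglevel", "thread", "service", "host"])
app_properties["log"] = {"type": "string", "index": "analyzed"}
template_json = json.dumps(
    build_template("logstash-*", "app", app_properties), indent=2
)
```

Fields that need another type or full-text search, such as log here, are added to the dictionary afterwards.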


3-5 Others

If you search for indexing performance on Google, you can find some articles. In my case the following tips were not much help in increasing the indexing rate, but it may be different in your case.

Disable _all Field

I can no longer find the _all field in Kibana, so I think it is disabled by default.

Bulk Threadpool Size


The default bulk thread pool size is the number of available processors. I could not change the bulk thread pool size with the setting below. I think it is blocked in 2.4.0: it is a fixed number and cannot be changed through configuration.

elasticsearch.yml

threadpool:
    bulk:
        size: 30
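Whether a thread pool setting took effect can be checked from the node stats rather than guessed. A minimal sketch, assuming the node stats response has a thread_pool.bulk section with threads, queue, and rejected counters; the function name is hypothetical.

```python
def bulk_pool_summary(nodes_stats):
    """Return {node_name: bulk thread pool counters} from parsed node stats."""
    summary = {}
    for node in nodes_stats["nodes"].values():
        bulk = node["thread_pool"]["bulk"]
        summary[node["name"]] = {
            "threads": bulk["threads"],    # current size of the pool
            "queue": bulk["queue"],        # requests waiting for a thread
            "rejected": bulk["rejected"],  # requests dropped because the queue was full
        }
    return summary
```

A growing rejected counter during a load test is a sign that the cluster cannot keep up with the bulk requests being sent.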


Segment Merging Parameters

Several parameters can tune segment merging. In theory, reducing segment merging improves indexing performance, but it can reduce search performance.

index.refresh_interval 

index.translog.flush_threshold_size

indices.memory.index_buffer_size


But these parameters did not affect indexing performance in my tests.
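If you want to repeat the experiment, the first two parameters are index settings that can be sent to PUT /<index>/_settings, while indices.memory.index_buffer_size is a node setting that goes into elasticsearch.yml. The sketch below only builds the request body. The values 30s and 1gb are examples, not the ones I tested, and the function name is made up.

```python
import json


def indexing_settings(refresh_interval="30s", flush_threshold_size="1gb"):
    """Build the body for PUT /<index>/_settings with example values."""
    return {
        "index": {
            "refresh_interval": refresh_interval,
            "translog": {"flush_threshold_size": flush_threshold_size},
        }
    }


settings_json = json.dumps(indexing_settings(), indent=2)
```

A longer refresh interval makes new documents visible to search later, which is the trade-off mentioned above.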


4. Conclusion

There is no single correct answer, but my ELK stack currently indexes 25,000 docs/sec with this configuration. You can try this configuration in your own environment to improve indexing performance.

