elasticsearch python bulk api (elasticsearch-py)


In case someone is currently trying to use the bulk api and wondering what the format should be, here's what worked for me:

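    import json

    doc = [
        # action line: what to do and where
        {
            'index': {
                '_index': index_name,
                '_id': <some_id>,
                '_type': <doc_type>
            }
        },
        # source line: the document itself
        {
            'field_1': <value>,
            'field_2': <value>
        }
    ]
    # one JSON object per line, each line terminated by a newline
    docs_as_string = json.dumps(doc[0]) + '\n' + json.dumps(doc[1]) + '\n'
    client.bulk(body=docs_as_string)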
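For more than one document, the same action-line/source-line pattern just repeats. The following is a minimal sketch of that idea, not part of elasticsearch-py: `build_bulk_body`, the index name `logs`, and the sample field values are made up for illustration, and `_type` is left out because recent Elasticsearch versions no longer accept it.

```python
import json


def build_bulk_body(pairs):
    """Serialize (action, source) pairs into a newline-delimited bulk body.

    Hypothetical helper for illustration; it is not part of elasticsearch-py.
    """
    lines = []
    for action, source in pairs:
        lines.append(json.dumps(action))  # action line, e.g. {"index": {...}}
        lines.append(json.dumps(source))  # source line, the document itself
    if not lines:
        return ""
    # Every line, including the last one, must end with a newline.
    return "\n".join(lines) + "\n"


# Illustrative data; the index name and fields are assumptions.
pairs = [
    ({"index": {"_index": "logs", "_id": "1"}}, {"field_1": "a", "field_2": 1}),
    ({"index": {"_index": "logs", "_id": "2"}}, {"field_1": "b", "field_2": 2}),
]
body = build_bulk_body(pairs)
# client.bulk(body=body)  # `client` is an Elasticsearch instance, as in the snippet above
```

The resulting string can be passed to `client.bulk(body=...)` exactly like `docs_as_string` above.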


From @HonzaKral on GitHub:

https://github.com/elasticsearch/elasticsearch-py/issues/135

Hi sirkubax,

the bulk api (as do all the others) follows very closely the bulk api format for elasticsearch itself, so the body would have to be:

    doc = '''{"index": {}}\n{"host":"logsqa","path":"/logs","message":"test test","@timestamp":"2014-10-02T10:11:25.980256","tags":["multiline","mydate_0.005"]}\n'''

for it to work. Alternatively it could be a list of those two dicts.

This is a complicated and clumsy format to work with from python, that's why I tried to create a more convenient way to work with bulk in elasticsearch.helpers.bulk (0). It simply accepts an iterator of documents, will extract any optional metadata from it (like _id, _type etc) and construct (and execute) the bulk request for you. For more info on the accepted formats see the docs for streaming_bulk above which is a helper to process the stream in iterative manner (one at a time from the point of the user, batched in chunks in the background).

Hope this helps.

0 - http://elasticsearch-py.readthedocs.org/en/master/helpers.html#elasticsearch.helpers.bulk
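
To make the `elasticsearch.helpers.bulk` suggestion from the quoted comment concrete, here is a rough sketch of how it might be used. `generate_actions`, `index_documents`, the index name `logs`, and the sample document are hypothetical names chosen for this example; the `_op_type`, `_index`, `_id` and `_source` keys follow the action format described in the helpers docs linked above. `_type` is omitted on the assumption that a recent Elasticsearch version is in use.

```python
def generate_actions(docs, index_name):
    """Yield one action dict per (doc_id, source) pair.

    Hypothetical generator for illustration; helpers.bulk pulls the
    metadata keys (_op_type, _index, _id) out of each dict it receives.
    """
    for doc_id, source in docs:
        yield {
            "_op_type": "index",
            "_index": index_name,
            "_id": doc_id,
            "_source": source,
        }


def index_documents(client, docs, index_name):
    """Send the documents through helpers.bulk (needs a reachable cluster)."""
    # Imported here so the generator above works without the package installed.
    from elasticsearch import helpers

    # With default arguments, helpers.bulk returns (success_count, errors).
    return helpers.bulk(client, generate_actions(docs, index_name))


# Illustrative input; fields borrowed from the example document above.
sample_docs = [("1", {"host": "logsqa", "path": "/logs", "message": "test test"})]
actions = list(generate_actions(sample_docs, "logs"))
```

Calling `index_documents(client, sample_docs, "logs")` with a connected `Elasticsearch` client would then build and send the bulk request, so no manual newline handling is needed.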