elasticsearch python bulk api (elasticsearch-py)
In case someone is currently trying to use the bulk api and wondering what the format should be, here's what worked for me:
    import json

    doc = [
        {'index': {'_index': index_name, '_id': <some_id>, '_type': <doc_type>}},
        {'field_1': <value>, 'field_2': <value>},
    ]
    # One JSON object per line, each terminated by a newline.
    docs_as_string = json.dumps(doc[0]) + '\n' + json.dumps(doc[1]) + '\n'
    client.bulk(body=docs_as_string)
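If you have more than one document, the same pattern can be wrapped in a small function. This is only a sketch of the action-line/source-line format shown above: build_bulk_body, the index name "my-index" and the sample documents are names I made up for illustration, not part of elasticsearch-py.

```python
import json


def build_bulk_body(index_name, docs):
    """Build a newline-delimited bulk body from (doc_id, source) pairs.

    Hypothetical helper for illustration only; it is not part of elasticsearch-py.
    """
    lines = []
    for doc_id, source in docs:
        # Action line: tells Elasticsearch what to do with the line that follows.
        lines.append(json.dumps({"index": {"_index": index_name, "_id": doc_id}}))
        # Source line: the document itself.
        lines.append(json.dumps(source))
    # Every line, including the last one, must end with a newline.
    return "".join(line + "\n" for line in lines)


# Sample data, invented for this example.
body = build_bulk_body("my-index", [(1, {"field_1": "a"}), (2, {"field_1": "b"})])
```

The resulting string should be usable the same way as docs_as_string above, i.e. client.bulk(body=body).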
From @HonzaKral on GitHub:
https://github.com/elasticsearch/elasticsearch-py/issues/135
Hi sirkubax,
the bulk API (like all the others) follows the bulk API format of Elasticsearch itself very closely, so the body would have to be:
    doc = '''{"index": {}}\n{"host":"logsqa","path":"/logs","message":"test test","@timestamp":"2014-10-02T10:11:25.980256","tags":["multiline","mydate_0.005"]}\n'''
for it to work. Alternatively it could be a list of those two dicts.
This is a complicated and clumsy format to work with from Python, which is why I tried to create a more convenient way to work with bulk in elasticsearch.helpers.bulk (0). It simply accepts an iterator of documents, extracts any optional metadata from them (like _id, _type, etc.), and constructs (and executes) the bulk request for you. For more info on the accepted formats, see the docs for streaming_bulk above, which is a helper to process the stream in an iterative manner (one at a time from the point of view of the user, batched in chunks in the background).
Hope this helps.
0 - http://elasticsearch-py.readthedocs.org/en/master/helpers.html#elasticsearch.helpers.bulk
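For anyone who wants to try the helper mentioned in the reply, here is a rough sketch of how I understand it. generate_actions, index_all, the index name and the sample documents are my own illustrative names. I'm assuming the documented action format where metadata keys like _index and _id sit next to _source, and that you already have a connected client. Note that newer Elasticsearch versions no longer use _type, so I left it out.

```python
def generate_actions(docs, index_name):
    """Yield one action dict per document, in the shape helpers.bulk accepts.

    Hypothetical generator for illustration; the sequential _id is an assumption.
    """
    for doc_id, source in enumerate(docs):
        yield {
            "_index": index_name,  # metadata the helper extracts
            "_id": doc_id,         # optional; omit to let Elasticsearch assign one
            "_source": source,     # the document body
        }


def index_all(client, docs, index_name):
    """Send all docs through elasticsearch.helpers.bulk.

    The import is done lazily so the generator above can be tried
    without the elasticsearch package installed.
    """
    from elasticsearch import helpers

    # helpers.bulk chunks the actions and returns (success_count, errors).
    return helpers.bulk(client, generate_actions(docs, index_name))


# Sample data, invented for this example.
actions = list(generate_actions([{"field_1": "a"}, {"field_1": "b"}], "my-index"))
```

With a real client you would call index_all(client, my_docs, "my-index") and never have to build the newline-delimited string yourself.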