How to really reindex data in elasticsearch


Re-indexing means reading the data, deleting it from Elasticsearch, and ingesting it again. There is no way to change the mapping of existing data in place. All the re-indexing tools you mentioned are just wrappers around read -> delete -> ingest.
You can always adjust the mapping for new indices, and you can add new fields to a mapping later. New fields are then indexed according to that mapping. If you do not control which fields arrive, use dynamic mapping instead.
See the question "Change default mapping of string to "not analyzed" in Elasticsearch" for how to use dynamic mapping to make string fields not_analyzed.
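
For versions where string fields accept "index": "not_analyzed" (before 5.0, which replaced the string type with text and keyword), a dynamic template along the following lines is one way to get that behaviour. This is only a minimal sketch: the index name my-index, the template name strings_not_analyzed, both helper function names and the localhost:9200 address are assumptions made for illustration.

```shell
# Hypothetical helper: prints index settings whose dynamic template maps
# every newly seen string field to a not_analyzed string (pre-5.0 syntax).
not_analyzed_mapping_body() {
  cat <<'EOF'
{
  "mappings": {
    "_default_": {
      "dynamic_templates": [
        {
          "strings_not_analyzed": {
            "match_mapping_type": "string",
            "mapping": { "type": "string", "index": "not_analyzed" }
          }
        }
      ]
    }
  }
}
EOF
}

# Assumption: a node listens on localhost:9200; "my-index" is a placeholder.
# The function is only defined here; call it once the names fit your cluster.
create_index_with_template() {
  curl -s -X PUT "localhost:9200/${1:-my-index}" \
    -H 'Content-Type: application/json' \
    -d "$(not_analyzed_mapping_body)"
}
```

Because the template only applies to fields that appear after the index is created, it has to be part of the new index's mapping before any data is loaded.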

Re-indexing is very expensive. A better way is to create a new index and drop the old one. To do this with zero downtime, have all your clients access the index through an alias. Take an index called "data-version1" as an example. In steps:

  • create your index "data-version1" and give it an alias named "data"
  • only use the alias "data" in all your client applications
  • to update your mapping: create a new index (with the new mapping) called "data-version2" and load all your data into it
  • to switch from version1 to version2: drop the alias "data" on version1 and create an alias "data" on version2 (or create first, then drop). Between those two steps your clients will see no data (or double data), but the gap between dropping and creating the alias should be so short that your clients won't notice it.
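
The gap in the last step can be avoided altogether: the _aliases endpoint accepts a remove and an add action in one request and applies them together. The sketch below builds such a request for the names used in the steps above. The helper names alias_swap_body and swap_alias are hypothetical, and localhost:9200 is an assumed address.

```shell
# Hypothetical helper: prints an _aliases request body that moves an alias
# from one index to another in a single request.
# Usage: alias_swap_body ALIAS OLD_INDEX NEW_INDEX
alias_swap_body() {
  printf '{"actions":[{"remove":{"index":"%s","alias":"%s"}},{"add":{"index":"%s","alias":"%s"}}]}\n' \
    "$2" "$1" "$3" "$1"
}

# Assumption: a node listens on localhost:9200.
# The function is only defined here; nothing is sent until you call it.
swap_alias() {
  curl -s -X POST 'localhost:9200/_aliases' \
    -H 'Content-Type: application/json' \
    -d "$(alias_swap_body "$1" "$2" "$3")"
}
```

With these definitions, swap_alias data data-version1 data-version2 would perform the switch, after which "data-version1" can be deleted.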

It's good practice to always use aliases.


Version 2.3 introduced a new API, _reindex, which does exactly what its name says. Basic usage is to POST the following body to _reindex:

    {
        "source": {
            "index": "currentIndex"
        },
        "dest": {
            "index": "newIndex"
        }
    }
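
As a rough sketch of how that body might be sent with curl, the helpers below build it from two index names and POST it to _reindex. The function names reindex_body and run_reindex are hypothetical, and localhost:9200 is an assumed address.

```shell
# Hypothetical helper: prints the basic _reindex body for a source and a
# destination index. Usage: reindex_body SOURCE_INDEX DEST_INDEX
reindex_body() {
  printf '{"source":{"index":"%s"},"dest":{"index":"%s"}}\n' "$1" "$2"
}

# Assumption: a node listens on localhost:9200.
# The function is only defined here; nothing is sent until you call it.
run_reindex() {
  curl -s -X POST 'localhost:9200/_reindex?pretty' \
    -H 'Content-Type: application/json' \
    -d "$(reindex_body "$1" "$2")"
}
```

Two things to keep in mind: real index names have to be lowercase, so currentIndex and newIndex are placeholders only, and _reindex does not copy the source index's settings or mappings, so the destination index should be created with the desired mapping first.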


Elasticsearch Reindex from Remote host to Local Host example (Jan 2020 Update)

    # show indices on this host
    curl 'localhost:9200/_cat/indices?v'

    # edit the elasticsearch configuration file to allow remote indexing
    sudo vi /etc/elasticsearch/elasticsearch.yml

    # copy the two lines below somewhere in that file:
    #   # --- whitelist for remote indexing ---
    #   reindex.remote.whitelist: my-remote-machine.my-domain.com:9200

    # restart the elasticsearch service
    sudo systemctl restart elasticsearch

    # run reindex from the remote machine to copy the index named filebeat-2016.12.01
    curl -H 'Content-Type: application/json' -X POST '127.0.0.1:9200/_reindex?pretty' -d'{
      "source": {
        "remote": {
          "host": "http://my-remote-machine.my-domain.com:9200"
        },
        "index": "filebeat-2016.12.01"
      },
      "dest": {
        "index": "filebeat-2016.12.01"
      }
    }'

    # verify the index has been copied
    curl 'localhost:9200/_cat/indices?v'