What is an index in Elasticsearch What is an index in Elasticsearch elasticsearch elasticsearch

What is an index in Elasticsearch


Good question, and the answer is a lot more nuanced than one might expect. You can use indices for several different purposes.

Indices for Relations

The easiest and most familiar layout clones what you would expect from a relational database. You can (very roughly) think of an index like a database.

  • MySQL => Databases => Tables => Rows/Columns
  • ElasticSearch => Indices => Types => Documents with Properties

An ElasticSearch cluster can contain multiple Indices (databases), which in turn contain multiple Types (tables). These types hold multiple Documents (rows), and each document has Properties (columns).

So in your car manufacturing scenario, you may have a SubaruFactory index. Within this index, you have three different types:

  • People
  • Cars
  • Spare_Parts

Each type then contains documents that correspond to that type (e.g. a Subaru Imprezza doc lives inside of the Cars type. This doc contains all the details about that particular car).

Searching and querying takes the format of: http://localhost:9200/[index]/[type]/[operation]

So to retrieve the Subaru document, I may do this:

  $ curl -XGET localhost:9200/SubaruFactory/Cars/SubaruImprezza

.

Indices for Logging

Now, the reality is that Indices/Types are much more flexible than the Database/Table abstractions we are used to in RDBMs. They can be considered convenient data organization mechanisms, with added performance benefits depending on how you set up your data.

To demonstrate a radically different approach, a lot of people use ElasticSearch for logging. A standard format is to assign a new index for each day. Your list of indices may look like this:

  • logs-2013-02-22
  • logs-2013-02-21
  • logs-2013-02-20

ElasticSearch allows you to query multiple indices at the same time, so it isn't a problem to do:

  $ curl -XGET localhost:9200/logs-2013-02-22,logs-2013-02-21/Errors/_search=q:"Error Message"

Which searches the logs from the last two days at the same time. This format has advantages due to the nature of logs - most logs are never looked at and they are organized in a linear flow of time. Making an index per log is more logical and offers better performance for searching.

.

Indices for Users

Another radically different approach is to create an index per user. Imagine you have some social networking site, and each users has a large amount of random data. You can create a single index for each user. Your structure may look like:

  • Zach's Index
    • Hobbies Type
    • Friends Type
    • Pictures Type
  • Fred's Index
    • Hobbies Type
    • Friends Type
    • Pictures Type

Notice how this setup could easily be done in a traditional RDBM fashion (e.g. "Users" Index, with hobbies/friends/pictures as types). All users would then be thrown into a single, giant index.

Instead, it sometimes makes sense to split data apart for data organization and performance reasons. In this scenario, we are assuming each user has a lot of data, and we want them separate. ElasticSearch has no problem letting us create an index per user.


@Zach's answer is valid for elasticsearch 5.X and below. Since elasticsearch 6.X Type has been deprecated and will be completely removed in 7.X. Quoting the elasticsearch docs:

Initially, we spoke about an “index” being similar to a “database” in an SQL database, and a “type” being equivalent to a “table”. This was a bad analogy that led to incorrect assumptions.

Further to explain, two columns with the same name in SQL from two different tables can be independent of each other. But in an elasticsearch index that is not possible since they are backed by the same Lucene field. Thus, "index" in elasticsearch is not quite same as a "database" in SQL. If there are any same fields in an index they will end up having conflicts of field types. To avoid this the elasticsearch documentation recommends storing index per document type.

Refer: Removal of mapping types


An index is a data structure for storing the mapping of fields to the corresponding documents. The objective is to allow faster searches, often at the expense of increased memory usage and preprocessing time.

The number of indexes you create is a design decision that you should take according to your application requirements. You can have an index for each business concept... You can an index for each month of the year...

You should invest some time getting acquainted with lucene and elasticsearch concepts.

Take a look at the introductory video and to this one with some data design patterns