How should I think about search engine indices?


In Elasticsearch, an index consists of one or more primary shards, where a shard is a Lucene instance. Each primary shard can have zero or more replicas, which give you high availability and increased search throughput.

A single shard can hold a lot of data. However, with multiple shards it is easier to distribute the workload across multiple processors and multiple servers.
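To make the distribution idea concrete, here is a minimal sketch of how a document ends up on one particular shard. Elasticsearch picks the shard by hashing a routing value (the document ID by default) and taking it modulo the number of primary shards. The `zlib.crc32` hash below is only a stand-in I chose for illustration, not the hash Elasticsearch uses internally, and `pick_shard` and `spread` are hypothetical helper names.

```python
import zlib


def pick_shard(routing_key: str, num_primary_shards: int) -> int:
    """Map a routing key (the document ID by default) to a primary shard number."""
    # Assumption: CRC32 stands in for Elasticsearch's internal hash function.
    return zlib.crc32(routing_key.encode("utf-8")) % num_primary_shards


def spread(doc_ids, num_primary_shards):
    """Count how many documents land on each primary shard."""
    counts = [0] * num_primary_shards
    for doc_id in doc_ids:
        counts[pick_shard(doc_id, num_primary_shards)] += 1
    return counts


# Hypothetical document IDs, spread over 5 primary shards.
doc_ids = [f"doc-{i}" for i in range(10_000)]
counts = spread(doc_ids, 5)
print(counts)
```

Each shard gets a share of the documents, so indexing and searching can be split across whichever processors and servers hold those shards.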

That said, you need a balance. The right number of shards depends on your data and context. Shards aren't free: each one costs memory and file handles. So while it can be useful to have thousands of shards if you're running a 100-node cluster, you don't want that on a single node.
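A rough way to see the balance is to count how many shard copies each node has to carry. This is a back-of-the-envelope sketch, not a sizing formula: the function names and the figure of 1000 primaries are made up for illustration. It relies on one fact, that a replica is never allocated on the same node as its primary, so a shard can have at most one copy per node.

```python
import math


def allocated_copies(primaries: int, replicas: int, nodes: int) -> int:
    """Total shard copies (primaries plus replicas) that can actually be allocated."""
    # A shard never has two copies on the same node, so cap copies at the node count.
    copies_per_shard = min(1 + replicas, nodes)
    return primaries * copies_per_shard


def copies_per_node(primaries: int, replicas: int, nodes: int) -> int:
    """Average number of shard copies each node holds, rounded up."""
    return math.ceil(allocated_copies(primaries, replicas, nodes) / nodes)


# Illustrative numbers only: 1000 primaries with 1 replica each.
print(copies_per_node(1000, 1, nodes=100))  # 20
print(copies_per_node(1000, 1, nodes=1))    # 1000
```

Twenty shard copies per node is manageable; a thousand on one machine is the situation to avoid.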

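In Elasticsearch, as well as indices, you have the concept of types. Think of an index as being like a database, and a type as being like a table. (This applies to older versions: multiple types per index were dropped in 6.x, and types were removed entirely in later releases.)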

Using different types has no overhead, and fits better with your example than having separate indices.

You can still search across all types (or a selected list of types) and across all indices (or a selected list) or any combination.
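As a sketch of what those combinations look like, the helper below builds the search URL path for a given list of indices and types, following the older type-aware REST layout (`/index1,index2/type1,type2/_search`, with `_all` standing for every index). The function `search_path` and the index and type names are hypothetical, chosen just for the example.

```python
def search_path(indices=None, types=None):
    """Build the _search path for the given indices and types (legacy, type-aware layout)."""
    # "_all" addresses every index when types are given but no index is named.
    index_part = ",".join(indices) if indices else "_all"
    if types:
        return f"/{index_part}/{','.join(types)}/_search"
    if indices:
        return f"/{index_part}/_search"
    return "/_search"


# Hypothetical index and type names.
print(search_path())                                 # /_search
print(search_path(["shop"]))                         # /shop/_search
print(search_path(["shop"], ["product", "review"]))  # /shop/product,review/_search
print(search_path(types=["product"]))                # /_all/product/_search
print(search_path(["shop", "blog"], ["review"]))     # /shop,blog/review/_search
```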

Each type can have its own fields (like the columns in a table).

So in your example, I'd have one index containing 3 types, each with its own fields. Start with the default number of primary shards (5 in versions before 7.0) and the default number of replicas (1), and change these only when you understand your data better.
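For illustration, this is roughly what the request body for creating such an index could look like, built here as a plain Python dict. The setting names `number_of_shards` and `number_of_replicas` are the real index settings. The three type names and their fields are placeholders I invented, since I don't know your actual data, and the `mappings` layout with one entry per type is the legacy format from versions that still supported multiple types.

```python
import json


def create_index_body(type_fields, primaries=5, replicas=1):
    """Build a create-index request body with one mapping per type (legacy format)."""
    return {
        "settings": {
            "number_of_shards": primaries,
            "number_of_replicas": replicas,
        },
        "mappings": {
            type_name: {
                "properties": {name: {"type": ftype} for name, ftype in fields.items()}
            }
            for type_name, fields in type_fields.items()
        },
    }


# Hypothetical types and fields; replace them with your own.
body = create_index_body({
    "order": {"placed_at": "date", "total": "float"},
    "product": {"price": "float", "stock": "integer"},
    "review": {"rating": "integer", "posted_at": "date"},
})
print(json.dumps(body, indent=2))
```

Keep in mind that `number_of_replicas` can be changed on a live index, whereas `number_of_shards` is fixed when the index is created, so changing it later generally means creating a new index and reindexing.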

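Note: don't confuse an index in Elasticsearch with an index in a database. An Elasticsearch index is closer to the database itself than to a lookup structure built on a table.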