Terminologies of distributed system: node, shard, cluster [elasticsearch]

Regarding 1):

A node is one machine in a cluster. A socket is one physical processor in a machine. A core is one processing unit within a socket. "CPU" typically means the same thing as a core.

For example, Tianhe-2, as one cluster, has 16,000 nodes and 3,120,000 cores in total; each node holds two Intel Xeon CPUs (32,000 CPU sockets overall) plus three Xeon Phi coprocessors. https://www.top500.org/system/177999
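To make the node / socket / core hierarchy concrete, here is a minimal sketch of how the counts multiply out. It assumes a homogeneous cluster (every node has identical hardware); the class name `ClusterTopology` and the figures are made up for illustration and are not taken from any real system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClusterTopology:
    """Hypothetical homogeneous cluster: every node has the same hardware."""

    nodes: int              # machines in the cluster
    sockets_per_node: int   # physical processors per machine
    cores_per_socket: int   # processing units per processor

    @property
    def total_sockets(self) -> int:
        # Each node contributes the same number of sockets.
        return self.nodes * self.sockets_per_node

    @property
    def total_cores(self) -> int:
        # Each socket contributes the same number of cores.
        return self.total_sockets * self.cores_per_socket


# Made-up example figures, chosen only to keep the arithmetic easy to follow.
example = ClusterTopology(nodes=4, sockets_per_node=2, cores_per_socket=12)
print(example.total_sockets)  # 8
print(example.total_cores)    # 96
```

Real machines are often heterogeneous (for instance CPUs plus accelerator cards), so a single multiplication like this is only a first approximation.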


Considering the elasticsearch tag on your question, here is the Elasticsearch nomenclature:

According to https://www.elastic.co/guide/en/elasticsearch/guide/current/_an_empty_cluster.html

Elasticsearch Node

A node is a running instance of Elasticsearch

Elasticsearch Cluster

A cluster consists of one or more nodes with the same cluster.name that are working together to share their data and workload.
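As a toy illustration of that definition, the sketch below groups some node descriptions by their cluster name. It is only a model of the idea: there is no networking or discovery involved, and the node names, cluster names, and the helper `group_into_clusters` are hypothetical.

```python
from collections import defaultdict

# Hypothetical node descriptions; all names here are made up.
nodes = [
    {"name": "node-1", "cluster_name": "logs"},
    {"name": "node-2", "cluster_name": "logs"},
    {"name": "node-3", "cluster_name": "search"},
]


def group_into_clusters(nodes):
    """Group node names by cluster name (toy model, no networking)."""
    clusters = defaultdict(list)
    for node in nodes:
        clusters[node["cluster_name"]].append(node["name"])
    return dict(clusters)


clusters = group_into_clusters(nodes)
print(clusters)  # {'logs': ['node-1', 'node-2'], 'search': ['node-3']}
```

In a real deployment the nodes also have to be able to reach each other over the network; sharing the same `cluster.name` is the part of the definition this sketch captures.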

According to https://www.elastic.co/guide/en/elasticsearch/guide/current/_add_an_index.html

Elasticsearch Shard

A shard is a low-level worker unit that holds just a slice of all the data in the index.

A shard is a single instance of Lucene, and is a complete search engine in its own right.
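To illustrate what "a slice of all the data" means, here is a minimal sketch of spreading documents across shards. The same guide describes routing as `shard = hash(routing) % number_of_primary_shards`; the sketch follows that formula but uses `zlib.crc32` as a stand-in hash, which is an assumption and not what Elasticsearch uses internally. The function names and document IDs are hypothetical.

```python
import zlib


def pick_shard(routing: str, number_of_primary_shards: int) -> int:
    """Map a routing value (here: the document ID) to a shard number.

    crc32 is only a stand-in for a stable hash function; it is not the
    hash Elasticsearch itself uses.
    """
    return zlib.crc32(routing.encode("utf-8")) % number_of_primary_shards


def distribute(doc_ids, number_of_primary_shards):
    """Return a dict mapping shard number -> list of document IDs it holds."""
    shards = {n: [] for n in range(number_of_primary_shards)}
    for doc_id in doc_ids:
        shards[pick_shard(doc_id, number_of_primary_shards)].append(doc_id)
    return shards


# Ten made-up document IDs spread over three primary shards.
shards = distribute([f"doc-{i}" for i in range(10)], 3)
for shard_number, doc_ids in shards.items():
    print(shard_number, doc_ids)
```

Each document lands in exactly one shard, and the same ID always maps to the same shard as long as the number of primary shards stays fixed.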

Okay, now we have seen the concepts of cluster, node and shard in Elasticsearch. These definitions are quite different from the ones given by xosp7tom, because they are specific to ES.

One piece of advice: if you want more information on how the Elasticsearch team built their distributed search engine, read this chapter of the Elasticsearch guide: https://www.elastic.co/guide/en/elasticsearch/guide/current/distributed-cluster.html. It is pretty interesting and a good introduction to distributed systems!


I found all my answers and cleared up my confusion here: Elastic Search 5.x: Basic Concepts

Note: this reference guide is for version 5.x. I was looking at the 2.x version before, which doesn't have a clear explanation of these issues. The links provided by @Artholl in his answer also belong to 2.x.