What is the difference between Lucene and Elasticsearch?


Lucene is a Java library. You include it in your project and call its API directly from your own code.

Elasticsearch is a JSON-based, distributed web server built on top of Lucene. Although Lucene does the actual work underneath, Elasticsearch provides a convenient layer over it. Each shard created in Elasticsearch is a separate Lucene index. To summarize:

  1. Elasticsearch is built on top of Lucene and exposes Lucene's features through a JSON-based REST API.
  2. Elasticsearch provides a distributed system on top of Lucene. Lucene itself is not aware of, or built for, distribution; Elasticsearch supplies that abstraction.
  3. Elasticsearch provides other supporting features such as thread pools, queues, node/cluster monitoring APIs, data monitoring APIs, cluster management, etc.
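To make point 2 a bit more concrete, here is a toy sketch of how a distribution layer can spread documents over several shards by hashing the document ID. It is not Elasticsearch's implementation: `ToyCluster`, `shard_for`, and the use of CRC32 as the hash are assumptions made for this example, and each shard is a plain dict where Elasticsearch would have a separate Lucene index.

```python
import zlib
from collections import defaultdict


def shard_for(doc_id: str, num_primary_shards: int) -> int:
    """Pick a shard for a document ID by hashing it (CRC32 is a stand-in hash)."""
    return zlib.crc32(doc_id.encode("utf-8")) % num_primary_shards


class ToyCluster:
    """Hypothetical distribution layer; each shard is a dict instead of a Lucene index."""

    def __init__(self, num_primary_shards: int) -> None:
        self.num_primary_shards = num_primary_shards
        self.shards = defaultdict(dict)  # shard number -> {doc_id: document}

    def index(self, doc_id: str, document: dict) -> int:
        # Writes go to the shard chosen by the hash of the ID.
        shard = shard_for(doc_id, self.num_primary_shards)
        self.shards[shard][doc_id] = document
        return shard

    def get(self, doc_id: str):
        # The same hash sends the read to the shard that holds the document.
        shard = shard_for(doc_id, self.num_primary_shards)
        return self.shards[shard].get(doc_id)


cluster = ToyCluster(num_primary_shards=3)
for i in range(10):
    cluster.index(f"doc-{i}", {"n": i})
print(cluster.get("doc-7"))
```

The caller only ever talks to `ToyCluster`; which shard holds a document is an internal detail, which is the kind of abstraction the answer describes.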


In addition to @Vineeth Mohan's answer:

High Availability: Elasticsearch is distributed, so it can manage data replication, that is, keep multiple copies of your data in the cluster. This enables high availability.
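As a rough illustration, the number of primary shards and replicas is configured per index through the `number_of_shards` and `number_of_replicas` settings. The sketch below only builds such a settings body and counts the resulting shard copies; the values and the helper name `total_shard_copies` are made up for this example, and nothing is sent to a cluster.

```python
import json

# Body for an index-creation request; the values are examples only.
index_settings = {
    "settings": {
        "number_of_shards": 3,    # primary shards
        "number_of_replicas": 1,  # extra copies of each primary
    }
}


def total_shard_copies(settings: dict) -> int:
    """Every primary is stored (1 + replicas) times across the cluster."""
    s = settings["settings"]
    return s["number_of_shards"] * (1 + s["number_of_replicas"])


print(json.dumps(index_settings, indent=2))
print(total_shard_copies(index_settings))
```

With one replica, each shard exists twice, so (assuming the cluster has at least two nodes) losing a single node should still leave a copy of every shard available.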

Powerful Query DSL: Elasticsearch offers a JSON interface for reading and writing queries on top of Lucene. Thanks to Elasticsearch, you can write complex queries without knowing Lucene syntax.
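For comparison, here is roughly how the same search might look in Lucene's query string syntax and as a Query DSL body. The field names `title` and `year` and the values are invented for the example; the snippet only builds and prints the JSON and does not contact a cluster.

```python
import json

# Lucene query string syntax for "title contains 'search' and year >= 2015".
lucene_query_string = "title:search AND year:[2015 TO *]"

# Roughly equivalent Query DSL body, as it would be sent to an index's _search endpoint.
query_dsl = {
    "query": {
        "bool": {
            "must": [{"match": {"title": "search"}}],
            "filter": [{"range": {"year": {"gte": 2015}}}],
        }
    }
}

print(lucene_query_string)
print(json.dumps(query_dsl, indent=2))
```

The JSON form is more verbose, but it is structured data, so it can be built and combined programmatically instead of by concatenating query strings.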

Schemaless (Schema-Free): Fields (name/value pairs) do not have to be defined in advance. When you index data, Elasticsearch can create the schema (the mapping) automatically at runtime, like magic.
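The "magic" is essentially type inference on the JSON you send. A much simplified sketch of the idea follows; `infer_field` and `infer_mapping` are hypothetical helpers written for this example, and the rules are a reduced approximation (no date detection, no arrays, no keyword sub-fields), not Elasticsearch's actual logic.

```python
def infer_field(value) -> dict:
    """Guess a mapping entry for one JSON value (simplified approximation)."""
    if isinstance(value, bool):  # check bool before int: bool is a subclass of int
        return {"type": "boolean"}
    if isinstance(value, int):
        return {"type": "long"}
    if isinstance(value, float):
        return {"type": "float"}
    if isinstance(value, str):
        return {"type": "text"}
    if isinstance(value, dict):  # nested object: infer its fields recursively
        return infer_mapping(value)
    raise TypeError(f"unsupported value in this sketch: {value!r}")


def infer_mapping(document: dict) -> dict:
    """Build a mapping-like structure from the fields of a single document."""
    return {"properties": {name: infer_field(v) for name, v in document.items()}}


doc = {"user": "kimchy", "age": 42, "active": True, "geo": {"lat": 41.12}}
print(infer_mapping(doc))
```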


I'll answer from a usage perspective.

Lucene is a search engine library. You'd use it to build your own search engine: either a new Elasticsearch or Solr competitor, or something narrow for your use case (e.g. text analysis).

Elasticsearch is a search engine. Most people use it for log aggregation, product search, or a variant of these two (e.g. social media analysis or finding relevant people for some search criteria). It's built on top of Lucene, so it exposes most (though not all) of its features. It also adds a lot on top, most significantly:

  • REST API
  • query DSL
  • distributed system (sharding, replication, cluster management)
  • facets/aggregations
  • additional features for common usage (e.g. ingest processing) and management (APIs for monitoring its relevant metrics, backup and restore, etc.)
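
The aggregations item in the list above can be pictured as grouping and counting over the matching documents. The sketch below puts a `terms` aggregation request body next to a plain-Python computation of the same kind of result. The `level` field, the sample documents, the aggregation name `by_level`, and the helper `terms_buckets` are all invented for the example, and the local computation is only an analogy for what a cluster would return.

```python
import json
from collections import Counter

# Request body: return no hits, only bucket counts per value of the field.
# Assumes `level` is mapped as a keyword-like field.
agg_request = {
    "size": 0,
    "aggs": {"by_level": {"terms": {"field": "level"}}},
}

# Sample log documents (made up).
logs = [
    {"level": "info"}, {"level": "error"}, {"level": "info"},
    {"level": "warn"}, {"level": "error"}, {"level": "info"},
]


def terms_buckets(docs: list, field: str) -> list:
    """Count documents per distinct field value, most frequent first."""
    counts = Counter(d[field] for d in docs if field in d)
    return [{"key": k, "doc_count": c} for k, c in counts.most_common()]


print(json.dumps(agg_request))
print(terms_buckets(logs, "level"))
```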