
Caching vs Indexing


The whole purpose of a cache is to return already requested data as fast as possible. One constraint is that a cache cannot grow too big either, as lookup times would increase and defeat the purpose of having a cache in the first place. That being said, it comes as no surprise that if you plan to have a few million/billion records in your DB, it won't be difficult to index them all, but it will be difficult to cache them all, though since RAM keeps getting cheaper, you might be able to store everything you need in memory. You also need to ask yourself whether your cache needs to be distributed across several hosts or not (whether now or in the future).

Considering that lookups and queries in ES are extremely fast (and ES brings you many more benefits on top of that, such as scoring), i.e. usually faster than retrieving the same data from your DB, it would make sense to use ES as a cache. One issue I see is a common one: as soon as you start duplicating data (DB -> ES), you need to ensure that both stores don't get out of sync.

Now, if in addition you throw a cache into that mix, it's a third data store to maintain and to keep consistent with the main data store. If you know your data is pretty stable, i.e. written and then not updated frequently, that might be OK, but you need to keep this very concern in mind at all times when designing your data access strategy.
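To make the synchronization concern concrete, here is a minimal write-through sketch. The two dicts are hypothetical stand-ins for the primary database and the Elasticsearch index; `save_product` is an illustrative name, not a real API.

```python
# Sketch of a write-through update: write to the primary store first, then
# propagate to the secondary index. The dicts below are stand-ins for a real
# database and an Elasticsearch index.
db = {}        # stand-in for the primary database
es_index = {}  # stand-in for the Elasticsearch index

def save_product(product_id, document):
    """Write to the DB first, then to the index.

    If the second write fails, the stores are out of sync; a real system
    would retry, queue the update, or reconcile later.
    """
    db[product_id] = document
    try:
        es_index[product_id] = document
    except Exception:
        # Out-of-sync window starts here: schedule a retry/reconciliation.
        raise

save_product(1, {"name": "laptop", "price": 999})
assert db[1] == es_index[1]  # both stores agree after a successful write
```

The point of the sketch is the failure window between the two writes: that window is exactly the consistency problem you sign up for with every extra data store.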

As @paweloque said, in the end it all depends on your exact use case(s). Every problem is different and I can attest that after a few dozen projects around ES over the past five years or so, I've never seen two projects configured the same way. A cache might make sense for some specific cases, but not at all for others.

You need to think hard about how and where you need to store your data, who is requesting it (and at what rate), and who is creating/updating it (and at what rate). In the end, the best practice is to keep your stack as lean as possible, with only as few components as needed, each one being a potential bottleneck that you have to understand, integrate, maintain, tune and monitor.

Finally, I'd add one more thing: adding a cache or an index should be considered a performance optimization of your software stack. As the common saying goes, "Premature optimization is the root of all evil": you should first go with your database only, measure the performance, load test it, and then witness whether it supports the load. Only then can you decide to throw a cache and/or an index at it, depending on the needs. Again, load test, measure, then decide. If you only have ten users making a few requests per day, having only a DB might be perfectly fine. You have to understand when and why you need to add another layer to your Tower of Babel, but most importantly you need to add one layer at a time and see how that layer improves/degrades the stability of the stack.

Last but not least, you can find some online articles from people who have used ES as a cache (mainly as a key-value store or an object cache).


Your question:

Q. What's the real difference between a caching solution and an indexing solution?

A. The simple difference is that a cache stores frequently used data in order to serve repeated requests faster. In essence, your cache is faster than your main store but smaller in size, and therefore limited in how much data it can hold (considering that cache storage is commonly more expensive).
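The "faster but smaller" trade-off implies an eviction policy. A minimal sketch of that idea is a fixed-capacity LRU cache (the class below is illustrative, not from any answer above):

```python
from collections import OrderedDict

# Minimal fixed-capacity LRU cache: fast lookups for hot keys, bounded size.
class LRUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self._data = OrderedDict()

    def get(self, key):
        if key not in self._data:
            return None
        self._data.move_to_end(key)  # mark as most recently used
        return self._data[key]

    def put(self, key, value):
        if key in self._data:
            self._data.move_to_end(key)
        self._data[key] = value
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)  # evict least recently used

cache = LRUCache(2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")     # "a" becomes the most recently used entry
cache.put("c", 3)  # capacity exceeded: evicts "b"
```

Because the cache cannot hold everything, only the hot subset of your data lives there; the main store remains the source of truth.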

Indexing is done on all of the data to make it searchable faster. A simple Hashtable/HashMap uses hashes as indexes, and in an array the positions (0, 1, 2, ...) are the indexes.
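A quick sketch of what an index buys you: the dict below acts as a hash index over some illustrative records, turning an O(n) scan into an average O(1) lookup.

```python
# An index trades extra memory for lookup speed: a dict (hash index) finds a
# record by key in O(1) on average, while an unindexed list must be scanned.
records = [{"id": i, "name": f"user{i}"} for i in range(10000)]

# Without an index: linear scan, O(n)
def find_unindexed(user_id):
    for r in records:
        if r["id"] == user_id:
            return r
    return None

# With a hash index: one hash lookup, O(1) on average
index = {r["id"]: r for r in records}

assert find_unindexed(9999) == index[9999]
```

Note that unlike a cache, the index covers *all* of the data; it just stores keys and pointers rather than full copies.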

You can index some columns to search them faster, but the cache is where you put the data you want to fetch faster. Normally the cache lives in RAM, while the database reads from the hard disk.

A cache is also usually a key-value store, so if you know the key, you fetch the value from the cache with no need to run a query. In NHibernate and Entity Framework, query caches are plugged in with queries as keys, and the result data is cached. Your queries will then be served from the cache instead of being run against the database.


Interesting question! You could indeed use Elasticsearch to implement a cache. It provides some functions with which you can expire documents, but I'm not sure whether they are well suited to expiring a cache. The problem is that Elasticsearch is not built to be a caching solution. Its sweet spot is indexing and finding documents.

Indexing is the task of building an index, like it is done for books: you read the entire text and write down on which page each word appears. This later allows you to find the positions of the words in the text very fast.
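That book-style index can be sketched in a few lines as an inverted index, which is essentially what Elasticsearch builds under the hood: a map from each word to the set of document ids ("pages") where it appears.

```python
from collections import defaultdict

# Tiny inverted index: word -> set of document ids containing that word.
docs = {
    1: "elasticsearch makes search fast",
    2: "a cache makes reads fast",
}

inverted = defaultdict(set)
for doc_id, text in docs.items():
    for word in text.split():
        inverted[word].add(doc_id)

assert inverted["fast"] == {1, 2}  # "fast" appears in both documents
assert inverted["cache"] == {2}
```

A real search engine adds tokenization, stemming, positions and scoring on top, but the core lookup structure is this word-to-documents map.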

Elasticsearch provides a toolbox that lets you define how to index and process the text, e.g. by applying stemming. In the next step, it provides you with different types of queries to find your documents.

You could, however, write documents into Elasticsearch and use the document's id to read it back. That way you could use Elasticsearch as a store, which might serve as a cache.
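That id-based usage boils down to a get-or-compute pattern. In the sketch below, the `store` dict stands in for Elasticsearch's put/get-by-id document operations; `expensive_compute` is a hypothetical placeholder for whatever work you want to avoid repeating.

```python
# Get-or-compute sketch: a document store used as a cache via id lookups.
store = {}  # stand-in for an Elasticsearch index accessed by document id

def expensive_compute(key):
    """Hypothetical placeholder for expensive work (DB query, rendering, ...)."""
    return {"key": key, "result": key.upper()}

def get_or_compute(key):
    if key in store:          # analogous to fetching a document by its id
        return store[key]
    value = expensive_compute(key)
    store[key] = value        # analogous to indexing the document under that id
    return value

first = get_or_compute("hello")
second = get_or_compute("hello")  # served from the store, not recomputed
assert first == second
```

This works, but as the other answers note, you then own the invalidation and expiry logic yourself, which dedicated caches give you for free.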