How Do I Apply TF-IDF When I Only Have a Subset of the Total Documents?

database elasticsearch search tf-idf

As you have noticed Elasticsearch is not built to run in memory constrained environments. If you want to use Elasticsearch, but can't set up a dedicated machine, you might consider using a hosted search solution (e.g. AWS Elasticsearch, Elastic Cloud, Algolia, etc.). Those solutions still cost though!

There are two great alternatives that require a bit more work (but not as much as writing your own search solution). Lucene is the actual Search Engine that Elasticsearch is written on top of. It does still load quite a bit of the underlying data structures into memory, so, depending on the size of the underlying data you want to index, it could still run out of memory. But, you should be able to fit quite a bit more data in a single Lucene index than in an entire Elasticsearch instance.

The other alternative, which I know slightly less about, is Sphinx. It is also a Search Engine. And it also allows you to specify how much memory to allocate for it to use. It stores the rest of the data on disk.

CodeHunter

How Do I Apply TF-IDF When I Only Have a Subset of the Total Documents?

Recent Posts

How can I color dots in a xy scatterplot according to column value?

How to update a claim in ASP.NET Identity?

What does {0} mean when initializing an object?

Accessing members of items in a JSONArray with Java

How to log SQL statements in Spring Boot?

Powershell Get-WebSite name parameter is ignored

How to detect scroll to bottom of html element

Java synchronized method

How to test controllers with CodeIgniter?

Detect Visual Composer

Matplotlib: Specify format of floats for tick labels

Rails join a list of strings with commas and "and" before the last