Most efficient way to search a database with more than a billion records? Most efficient way to search a database with more than a billion records? database database

Most efficient way to search a database with more than a billion records?


If you are using FULLTEXT indexes, you're correctly writing your queries, and the speed in which the results are returned are not adequate, you are entering a territory where MySQL may simply not be sufficient for you..

You may be able to tweak settings, purchase enough RAM to make sure that your entire data-set fits 100% in memory. It's definitely true that performance gains could be huge there.

I'd definitely recommend looking into tweaks of your mysql configuration. We've had some silly settings in the past. Operating system defaults tend to really suck!

However, if you have trouble at that point, you can:

  1. Create a separate table containing each word (indexed) along with a record id that it refers to. This will allow you to search on single words.
  2. Use a different system that's optimized for solving this problem. Unless my information is now outdated, the 2 engines that are the most popular for solving this problem are:
    1. Sphinx
    2. Solr / Lucene


If your table is myISAM then you can set the Name and Description fields to FULLTEXT

CREATE TABLE articles (    id INT UNSIGNED AUTO_INCREMENT NOT NULL PRIMARY KEY,    Name VARCHAR(200),    Description TEXT,    FULLTEXT (Name,Description) );

Then you can use queries like:

SELECT * FROM articles    WHERE MATCH (Name,Description) AGAINST ('database');

Your can find more info at http://docs.oracle.com/cd/E17952_01/refman-5.0-en/fulltext-search.html

Before doing any of the above you might want to backup (or atleast make a copy) of your database.


You can't. The only fast search in your scenario would be on the Primary Key since that's most likely to be the index. Text search is slow as a turtle.

In all seriousness, you have a few solutions:

If you have to stick with NoSQL you'll have to redesign you scheme. It's hard to give you a good recommendation without knowing the requirements. One solution would be to index keywords in a separate table.

Another solution is to switch to a different search engine, you can find suggestions in other questions here such as: Fast SQL Server search on 40M text records