Improving filepath search in mysql Improving filepath search in mysql unix unix

Improving filepath search in mysql


Your search strategy is, as you noticed, slow. It's slow because

 LIKE '%something%'

has to scan the table to find matches. Leading % signs on LIKE searches are an excellent way to wreck performance.

I don't know how many columns are in your path table. If there are a lot of columns you could do two quick things to improve performance:

  1. get rid of SELECT * and list the names of the columns you want in your resultset.
  2. create a compound index consisting of your filename column followed by the other columns you need to retrieve.

(This won't help if you only have a few columns in your table.)

You can't use straight-out-of-the-software-package FULLTEXT searching for this stuff, because that's designed for language text.

If I had to make this work fast for production, I would do this:

First, create a new table called "searchterm" containing

 filename_id INT   the id number of a row in your path table searchterm  VARCHAR(20)  a fragment of a filename.

Second, write a program that reads the filename_id and filename values, and inserts a bunch of different rows for each one into searchterm. For the item you've shown the values should be:

LG_MARGINCALL_HD2CH_127879834_EN.mov   (original)LG  MARGINCALL  HD2CH  127879834  EN  mov   (split on punctuation) HD 2 CH                                    (split on embedded numerics) MARGIN CALL                                (split on an app-specific list of words)

So, you'd have a bunch of entries in your searchterm table, all with the same filename_id value and lots of different little chunks of text.

Finally, when searching you could do this.

 SELECT path.id, path.filename, path.whatever,        COUNT(DISTINCT searchterms.term) AS termcount   FROM path   JOIN searchterm ON path.filenanme_id = search.filename_id  WHERE searchterm.term  IN ('margin','call','hd','en', 'mov')  GROUP BY path.id, path.filename, path.whatever  ORDER BY path.filename, COUNT(DISTINCT searchterms.term) DESC

This little query finds all the matching fragments to what you're search for. It returns multiple file names, and it presents them in order of what matches the most terms.

What I'm suggesting is that you create your own application-specific kinda- sorta- full text search system. If your really have several million multimedia files, this is surely worth your effort.


It seems clear that you need full-text search functionality.

There are multiple solutions out there that can respond to this, one of the best at the moment being Elastic Search.

It has all the capabilities to handle real time full-text search.And it goes largely beyond just this by providing auto-suggestions, autocomplete, etc.

And it's open source.


Stop using like statement instead use match() and use a full text index for your searching column and your table must be a MYISAM one(I don't know whether it is or not)