Optimising LIKE expressions that start with wildcards

Here is one (not really recommended) solution.

Create a table AddressSubstrings. This table would hold multiple rows per address, each carrying the primary key of the base table.

When you insert an address into the base table, also insert its substrings starting from each position (that is, every suffix). So, if you want to insert 'abcd', then you would insert:

  • abcd
  • bcd
  • cd
  • d

along with the unique id of the row in Table. (This can all be done using a trigger.)

Create an index on AddressSubstrings(AddressSubstring).

Then you can phrase your query as:

SELECT *
FROM Table t JOIN
     AddressSubstrings ads
     ON t.table_id = ads.table_id
WHERE ads.AddressSubstring LIKE 'nham%';

Now any address that contains nham has a substring row starting with nham. So, LIKE should be able to make use of the index (and a full text index also works).
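The steps above can be sketched end to end. This is only an illustration, using Python's built-in sqlite3 module as a stand-in database; the base table name `addresses`, the sample rows, and the helpers `suffixes`, `insert_address` and `search` are made up for the example, and a Python helper plays the role the trigger would play. Whether the optimiser actually uses the index for the prefix LIKE depends on the engine and its collation settings, so treat this as a demonstration of the data layout rather than of the query plan.

```python
import sqlite3

def suffixes(text):
    """Return every suffix of text, e.g. 'abcd' -> ['abcd', 'bcd', 'cd', 'd']."""
    return [text[i:] for i in range(len(text))]

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE addresses (table_id INTEGER PRIMARY KEY, address TEXT);
    CREATE TABLE AddressSubstrings (
        table_id INTEGER REFERENCES addresses(table_id),
        AddressSubstring TEXT
    );
    CREATE INDEX ix_address_substring ON AddressSubstrings(AddressSubstring);
""")

def insert_address(table_id, address):
    # Stand-in for the trigger: store the row, then one row per suffix.
    conn.execute("INSERT INTO addresses VALUES (?, ?)", (table_id, address))
    conn.executemany(
        "INSERT INTO AddressSubstrings VALUES (?, ?)",
        [(table_id, s) for s in suffixes(address)],
    )

def search(fragment):
    # '%fragment%' on the base table becomes 'fragment%' on the suffix table.
    # DISTINCT avoids duplicates when the fragment occurs twice in one address.
    # Note: % and _ inside fragment would still act as LIKE wildcards.
    return conn.execute(
        """SELECT DISTINCT t.table_id, t.address
           FROM addresses t
           JOIN AddressSubstrings ads ON t.table_id = ads.table_id
           WHERE ads.AddressSubstring LIKE ? || '%'
           ORDER BY t.table_id""",
        (fragment,),
    ).fetchall()

insert_address(1, "12 Farnham Road")
insert_address(2, "7 Tottenham Lane")
insert_address(3, "3 High Street")
print(search("nham"))
```

The cost is visible in the helper: an address of n characters produces n extra rows, which is a large part of why this approach is not really recommended.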

If you are interested in the right way to handle this problem, a reasonable place to start is the Postgres documentation. This uses a method similar to the above, but using n-grams. The only problem with n-grams for your particular problem is that they require rewriting the comparison as well as changing how the data is stored.
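To give a feel for the n-gram idea, here is a deliberately simplified sketch in plain Python. It is not how Postgres implements it; the class `TrigramIndex` and its methods are hypothetical names chosen for the example. It shows both changes mentioned above: the storage changes (each string is broken into three-character pieces kept in an inverted index) and the comparison changes (look up candidates by n-gram, then verify each candidate with a real substring check).

```python
from collections import defaultdict

def trigrams(text):
    """Return the set of 3-character substrings of text, lower-cased."""
    t = text.lower()
    return {t[i:i + 3] for i in range(len(t) - 2)}

class TrigramIndex:
    """Toy inverted index from trigram to the ids of rows containing it."""

    def __init__(self):
        self.postings = defaultdict(set)
        self.rows = {}

    def add(self, row_id, text):
        self.rows[row_id] = text
        for gram in trigrams(text):
            self.postings[gram].add(row_id)

    def search(self, fragment):
        grams = trigrams(fragment)
        if grams:
            # Candidates must contain every trigram of the fragment.
            candidates = set.intersection(
                *(self.postings.get(g, set()) for g in grams)
            )
        else:
            # Fragment shorter than 3 characters: the index cannot help.
            candidates = set(self.rows)
        # Sharing all trigrams does not guarantee a match, so verify.
        needle = fragment.lower()
        return sorted(r for r in candidates if needle in self.rows[r].lower())

index = TrigramIndex()
index.add(1, "12 Farnham Road")
index.add(2, "7 Tottenham Lane")
index.add(3, "3 High Street")
print(index.search("nham"))
```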

I can't offer a complete solution to this difficult problem.

But if you're looking to create a suffix search capability, in which, for example, you'd be able to find the row containing HWilson with ilson and the row containing ABC123000654 with 654, here's a suggestion.

  WHERE REVERSE(textcolumn) LIKE REVERSE('ilson') + '%'

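Of course this isn't sargable the way I wrote it here. But many modern DBMSs, including recent versions of SQL Server, allow the definition, and indexing, of computed or virtual columns. With an indexed computed column holding REVERSE(textcolumn), the suffix search becomes an ordinary prefix search on that column.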

I've deployed this technique, to the delight of end users, in a health-care system with lots of record IDs like ABC123000654.
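A minimal sketch of the reversed-column idea, again using Python's sqlite3 module purely as a convenient stand-in. The table `records`, the column `textcolumn_rev` and the helper names are invented for the example. SQLite has no built-in REVERSE(), so the reversed copy is filled in at insert time here; in SQL Server that column would be a computed column instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE records (
        id INTEGER PRIMARY KEY,
        textcolumn TEXT,
        textcolumn_rev TEXT   -- reversed copy of textcolumn
    );
    CREATE INDEX ix_records_rev ON records(textcolumn_rev);
""")

def insert_record(record_id, text):
    # text[::-1] plays the role of REVERSE(textcolumn).
    conn.execute(
        "INSERT INTO records VALUES (?, ?, ?)",
        (record_id, text, text[::-1]),
    )

def suffix_search(suffix):
    # "ends with suffix" becomes "reversed value starts with reversed suffix".
    return conn.execute(
        """SELECT id, textcolumn
           FROM records
           WHERE textcolumn_rev LIKE ? || '%'
           ORDER BY id""",
        (suffix[::-1],),
    ).fetchall()

for i, value in enumerate(["HWilson", "ABC123000654", "JSmith"], start=1):
    insert_record(i, value)

print(suffix_search("ilson"))
print(suffix_search("654"))
```

This only covers searches anchored at the end of the string; a fragment in the middle still needs one of the other approaches.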

Not without a serious preparation effort, hwilson1.

At the risk of repeating the obvious: any search path optimisation (the decision whether an index is used, which type of join operator to use, and so on), independently of which DBMS we're talking about, works on equality (equal to) or range checking (greater-than and less-than).

With leading wildcards, you're out of luck.

The workaround is a serious preparation effort, as stated up front:

It would boil down to Vertica's text search feature, where that problem is solved. See here:


For any other database platform, including MS SQL, you'll have to do that manually.

In a nutshell: It relies on a primary key or unique identifier of the table whose text search you want to optimise.

You create an auxiliary table, whose primary key is the primary key of your base table, plus a sequence number, and a VARCHAR column that will contain a series of substrings of the base table's string you initially searched using wildcards. In an over-simplified way:

If your input table (just showing the columns that matter) is this:

id|the_search_col                           |other_col
42|The Restaurant at the End of the Universe|Arthur Dent
43|The Hitch-Hiker's Guide to the Galaxy    |Ford Prefect

Your auxiliary search table could contain:

id|seq|search_token
42|  1|Restaurant
42|  2|End
42|  3|Universe
43|  1|Hitch-Hiker
43|  2|Guide
43|  3|Galaxy

Normally, you suppress typical "fillers" like articles, prepositions and apostrophe-s, and split into tokens separated by punctuation and white space. For your '%nham%' example, however, you'd probably need to talk to a linguist who has specialised in English morphology to find splitting token candidates ... :-]

You could start by the same technique that I use when I un-pivot a horizontal series of measures without the PIVOT clause, like here:

Pivot sql convert rows to columns

Then, use a combination of, probably nested, CHARINDEX() and SUBSTRING() using the index you get from the CROSS JOIN with a series of index integers as described in my post suggested above, and use that very index as the sequence for the auxiliary search table.

Lay an index on search_token and you'll have a very fast access path to a big table.
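To make the auxiliary table concrete, here is a rough sketch that rebuilds the example above with Python's sqlite3 module as a stand-in database. The table names `base_table` and `search_tokens`, the stop-word list and the helper functions are assumptions made for the illustration, and the splitting is done in Python rather than with CHARINDEX() and SUBSTRING(). Keeping hyphens inside a token is also an assumption, chosen so that Hitch-Hiker stays in one piece as in the example.

```python
import re
import sqlite3

# Assumed filler list; a real one would be much longer.
STOP_WORDS = {"the", "at", "of", "to", "a", "an", "s"}

def tokenize(text):
    """Split on white space and punctuation (hyphens kept), drop fillers."""
    words = re.findall(r"[A-Za-z0-9]+(?:-[A-Za-z0-9]+)*", text)
    return [w for w in words if w.lower() not in STOP_WORDS]

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE base_table (
        id INTEGER PRIMARY KEY, the_search_col TEXT, other_col TEXT
    );
    CREATE TABLE search_tokens (
        id INTEGER, seq INTEGER, search_token TEXT,
        PRIMARY KEY (id, seq)
    );
    CREATE INDEX ix_search_token ON search_tokens(search_token);
""")

def insert_row(row_id, text, other):
    conn.execute("INSERT INTO base_table VALUES (?, ?, ?)", (row_id, text, other))
    # The token's position in the filtered list is used as the sequence number.
    conn.executemany(
        "INSERT INTO search_tokens VALUES (?, ?, ?)",
        [(row_id, seq, tok) for seq, tok in enumerate(tokenize(text), start=1)],
    )

def find(token):
    # Equality on the indexed token column replaces the wildcard search.
    return conn.execute(
        """SELECT b.id, b.the_search_col
           FROM base_table b
           JOIN search_tokens st ON st.id = b.id
           WHERE st.search_token = ?
           ORDER BY b.id""",
        (token,),
    ).fetchall()

insert_row(42, "The Restaurant at the End of the Universe", "Arthur Dent")
insert_row(43, "The Hitch-Hiker's Guide to the Galaxy", "Ford Prefect")
print(find("Galaxy"))
```

Because the lookup is an equality (or a prefix LIKE) on search_token, it is the kind of predicate an index can serve, which is the whole point of the preparation effort.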

Not a stroll in the park, I agree, but promising ...

Happy playing -

Marco the Sane