Javascript fuzzy search that makes sense

javascript regex pattern-matching string-matching fuzzy-search

I tried using existing fuzzy libraries like fuse.js and also found them to be terrible, so I wrote one which behaves basically like sublime's search. https://github.com/farzher/fuzzysort

The only typo it allows is a transpose. It's pretty solid (1k stars, 0 issues), very fast, and handles your case easily:

fuzzysort.go('int', ['international', 'splint', 'tinder'])// [{highlighted: '*int*ernational', score: 10}, {highlighted: 'spl*int*', socre: 3003}]

javascript regex pattern-matching string-matching fuzzy-search

Good question! But my thought is that, rather than trying to modify Levenshtein-Demerau, you might be better to try a different algorithm or combine/ weight the results from two algorithms.

It strikes me that exact or close matches to the "starting prefix" are something Levenshtein-Demerau gives no particular weight to -- but your apparent user expectations would.

I searched for "better than Levenshtein" and, among other things, found this:

http://www.joyofdata.de/blog/comparison-of-string-distance-algorithms/

This mentions a number of "string distance" measures. Three which looked particularly relevant to your requirement, would be:

Longest Common Substring distance: Minimum number of symbols that have to be removed in both strings until resulting substrings are identical.
q-gram distance: Sum of absolute differences between N-gram vectors of both strings.
Jaccard distance: 1 minues the quotient of shared N-grams and all observed N-grams.

Maybe you could use a weighted combination (or minimum) of these metrics, with Levenshtein -- common substring, common N-gram or Jaccard will all strongly prefer similar strings -- or perhaps try just using Jaccard?

Depending on the size of your list/ database, these algorithms can be moderately expensive. For a fuzzy search I implemented, I used a configurable number of N-grams as "retrieval keys" from the DB then ran the expensive string-distance measure to sort them in preference order.

I wrote some notes on Fuzzy String Search in SQL. See:

http://literatejava.com/sql/fuzzy-string-search-sql/

javascript regex pattern-matching string-matching fuzzy-search

Here is a technique I have used a few times...It gives pretty good results. Does not do everything you asked for though. Also, this can be expensive if the list is massive.

get_bigrams = (string) ->    s = string.toLowerCase()    v = new Array(s.length - 1)    for i in [0..v.length] by 1        v[i] = s.slice(i, i + 2)    return vstring_similarity = (str1, str2) ->    if str1.length > 0 and str2.length > 0        pairs1 = get_bigrams(str1)        pairs2 = get_bigrams(str2)        union = pairs1.length + pairs2.length        hit_count = 0        for x in pairs1            for y in pairs2                if x is y                    hit_count++        if hit_count > 0            return ((2.0 * hit_count) / union)    return 0.0

Pass two strings to string_similarity which will return a number between 0 and 1.0 depending on how similar they are. This example uses Lo-Dash

Usage Example....

query = 'jenny Jackson'names = ['John Jackson', 'Jack Johnson', 'Jerry Smith', 'Jenny Smith']results = []for name in names    relevance = string_similarity(query, name)    obj = {name: name, relevance: relevance}    results.push(obj)results = _.first(_.sortBy(results, 'relevance').reverse(), 10)console.log results

Also....have a fiddle

Make sure your console is open or you wont see anything :)

CodeHunter

Javascript fuzzy search that makes sense

Recent Posts

How can I color dots in a xy scatterplot according to column value?

How to update a claim in ASP.NET Identity?

What does {0} mean when initializing an object?

Accessing members of items in a JSONArray with Java

How to log SQL statements in Spring Boot?

Powershell Get-WebSite name parameter is ignored

How to detect scroll to bottom of html element

Java synchronized method

How to test controllers with CodeIgniter?

Detect Visual Composer

Matplotlib: Specify format of floats for tick labels

Rails join a list of strings with commas and "and" before the last