Django max similarity (TrigramSimilarity) from ManyToManyField Django max similarity (TrigramSimilarity) from ManyToManyField postgresql postgresql

Django max similarity (TrigramSimilarity) from ManyToManyField


You cannot break up the tags__name (at least I don't know a way).
From your examples, I can assume 2 possible solutions (1st solution is not strictly using Django):


  1. Not everything needs to pass strictly through Django
    We have Python powers, so let's use them:

    Let us compose the query first:

    from difflib import SequenceMatcherfrom django.db.models import Qdef create_query(fulltext):    illustration_names = Illustration.objects.values_list('name', flat=True)    tag_names = Tag.objects.values_list('name', flat=True)    query = []    for name in illustration_names:        score = SequenceMatcher(None, name, fulltext).ratio()        if score == 1:            # Perfect Match for name            return [Q(name=name)]         if score >= THRESHOLD:            query.append(Q(name=name))    for name in tag_names:        score = SequenceMatcher(None, name, fulltext).ratio()        if score == 1:            # Perfect Match for name            return [Q(tags__name=name)]         if score >= THRESHOLD:            query.append(Q(tags__name=name))    return query

    Then to create your queryset:

    from functools import reduce # Needed only in python 3from operator import or_queryset = Illustration.objects.filter(reduce(or_, create_query(fulltext)))

    Decode the above:

    We are checking every Illustration and Tag name against our fulltext and we are composing a query with every name that it's similarity passes the THRESHOLD.

    • SequenceMatcher method compares sequences and returns a ratio 0 < ratio < 1 where 0 indicates No-Match and 1 indicates Perfect-Match. Check this answer for another usage example: Find the similarity percent between two strings (Note: There are other strings comparing modules as well, find one that suits you)
    • Q() Django objects, allow the creation of complex queries (more on the linked docs).
    • With the operator and reduce we transform a list of Q() objects to an OR separated query argument:
      Q(name=name_1) | Q(name=name_2) | ... | Q(tag_name=tag_name_1) | ...

    Note:You need to define an acceptable THRESHOLD.
    As you can imagine this will be a bit slow but it is to be expected when you need to do a "fuzzy" search.


  1. (The Django Way:)
    Use a query with a high similarity threshold and order the queryset by this similarity rate:

    queryset.annotate(    similarity=Greatest(        TrigramSimilarity('name', fulltext),         TrigramSimilarity('tags__name', fulltext)    )).filter(similarity__gte=threshold).order_by('-similarity')

    Decode the above:

    • Greatest() accepts an aggregation (not to be confused with the Django method aggregate) of expressions or of model fields and returns the max item.
    • TrigramSimilarity(word, search) returns a rate between 0 and 1. The closer the rate is to 1, the more similar the word is to search.
    • .filter(similarity__gte=threshold), will filter similarities lower than the threshold.
    • 0 < threshold < 1. You can set the threshold to 0.6 which is pretty high (consider that the default is 0.3). You can play around with that to tune your performance.
    • Finally, order the queryset by the similarity rate in a descending order.


I solved it using only TrigramSimilarity, Max and Greatest.

I populated some data as in your question:

from illustrations.models import Illustration, TagTag.objects.bulk_create([Tag(name=t) for t in ['Animal', 'Brown', 'Animals']])Illustration.objects.bulk_create([Illustration(name=t) for t in ['Dog', 'Cat']])dog=Illustration.objects.get(name='Dog')cat=Illustration.objects.get(name='Cat')animal=Tag.objects.get(name='Animal')brown=Tag.objects.get(name='Brown')animals=Tag.objects.get(name='Animals')dog.tags.add(animal, brown)cat.tags.add(animals)

I imported all necessary functions and initialized fulltext:

from illustrations.models import Illustrationfrom django.contrib.postgres.search import TrigramSimilarityfrom django.db.models.functions import Greatestfrom django.db.models import Maxfulltext = 'Animal'

Then I executed the query:

Illustration.objects.annotate(    max_similarity=Greatest(        Max(TrigramSimilarity('tags__name', fulltext)),        TrigramSimilarity('name', fulltext)    )).values('name', 'max_similarity')

With this results:

<QuerySet [{'name': 'Dog', 'max_similarity': 1.0}, {'name': 'Cat', 'max_similarity': 0.666667}]>

This is the SQL query exceuted from PostgreSQL:

SELECT "illustrations_illustration"."name", GREATEST(MAX(SIMILARITY("illustrations_tag"."name", 'Animal')), SIMILARITY("illustrations_illustration"."name", 'Animal')) AS "max_similarity"FROM "illustrations_illustration"LEFT OUTER JOIN "illustrations_illustration_tags" ON ("illustrations_illustration"."id" = "illustrations_illustration_tags"."illustration_id")LEFT OUTER JOIN "illustrations_tag" ON ("illustrations_illustration_tags"."tag_id" = "illustrations_tag"."id")GROUP BY "illustrations_illustration"."id", SIMILARITY("illustrations_illustration"."name", 'Animal')

You can use the max_similarity annotation to filter or order your results.