Django max similarity (TrigramSimilarity) from ManyToManyField

python django postgresql django-queryset trigram

You cannot break up the tags__name (at least I don't know a way).
From your examples, I can assume 2 possible solutions (1st solution is not strictly using Django):

Not everything needs to pass strictly through Django
We have Python powers, so let's use them:

Let us compose the query first:

from difflib import SequenceMatcherfrom django.db.models import Qdef create_query(fulltext):    illustration_names = Illustration.objects.values_list('name', flat=True)    tag_names = Tag.objects.values_list('name', flat=True)    query = []    for name in illustration_names:        score = SequenceMatcher(None, name, fulltext).ratio()        if score == 1:            # Perfect Match for name            return [Q(name=name)]         if score >= THRESHOLD:            query.append(Q(name=name))    for name in tag_names:        score = SequenceMatcher(None, name, fulltext).ratio()        if score == 1:            # Perfect Match for name            return [Q(tags__name=name)]         if score >= THRESHOLD:            query.append(Q(tags__name=name))    return query

Then to create your queryset:

from functools import reduce # Needed only in python 3from operator import or_queryset = Illustration.objects.filter(reduce(or_, create_query(fulltext)))

Decode the above:

We are checking every Illustration and Tag name against our fulltext and we are composing a query with every name that it's similarity passes the THRESHOLD.

SequenceMatcher method compares sequences and returns a ratio 0 < ratio < 1 where 0 indicates No-Match and 1 indicates Perfect-Match. Check this answer for another usage example: Find the similarity percent between two strings (Note: There are other strings comparing modules as well, find one that suits you)
Q() Django objects, allow the creation of complex queries (more on the linked docs).
With the operator and reduce we transform a list of Q() objects to an OR separated query argument:
Q(name=name_1) | Q(name=name_2) | ... | Q(tag_name=tag_name_1) | ...

Note:You need to define an acceptable THRESHOLD.
As you can imagine this will be a bit slow but it is to be expected when you need to do a "fuzzy" search.

(The Django Way:)
Use a query with a high similarity threshold and order the queryset by this similarity rate:
```
queryset.annotate(    similarity=Greatest(        TrigramSimilarity('name', fulltext),         TrigramSimilarity('tags__name', fulltext)    )).filter(similarity__gte=threshold).order_by('-similarity')
```
Decode the above:
- Greatest() accepts an aggregation (not to be confused with the Django method aggregate) of expressions or of model fields and returns the max item.
- TrigramSimilarity(word, search) returns a rate between 0 and 1. The closer the rate is to 1, the more similar the word is to search.
- .filter(similarity__gte=threshold), will filter similarities lower than the threshold.
- 0 < threshold < 1. You can set the threshold to 0.6 which is pretty high (consider that the default is 0.3). You can play around with that to tune your performance.
- Finally, order the queryset by the similarity rate in a descending order.

python django postgresql django-queryset trigram

I solved it using only TrigramSimilarity, Max and Greatest.

I populated some data as in your question:

from illustrations.models import Illustration, TagTag.objects.bulk_create([Tag(name=t) for t in ['Animal', 'Brown', 'Animals']])Illustration.objects.bulk_create([Illustration(name=t) for t in ['Dog', 'Cat']])dog=Illustration.objects.get(name='Dog')cat=Illustration.objects.get(name='Cat')animal=Tag.objects.get(name='Animal')brown=Tag.objects.get(name='Brown')animals=Tag.objects.get(name='Animals')dog.tags.add(animal, brown)cat.tags.add(animals)

I imported all necessary functions and initialized fulltext:

from illustrations.models import Illustrationfrom django.contrib.postgres.search import TrigramSimilarityfrom django.db.models.functions import Greatestfrom django.db.models import Maxfulltext = 'Animal'

Then I executed the query:

Illustration.objects.annotate(    max_similarity=Greatest(        Max(TrigramSimilarity('tags__name', fulltext)),        TrigramSimilarity('name', fulltext)    )).values('name', 'max_similarity')

With this results:

<QuerySet [{'name': 'Dog', 'max_similarity': 1.0}, {'name': 'Cat', 'max_similarity': 0.666667}]>

This is the SQL query exceuted from PostgreSQL:

SELECT "illustrations_illustration"."name", GREATEST(MAX(SIMILARITY("illustrations_tag"."name", 'Animal')), SIMILARITY("illustrations_illustration"."name", 'Animal')) AS "max_similarity"FROM "illustrations_illustration"LEFT OUTER JOIN "illustrations_illustration_tags" ON ("illustrations_illustration"."id" = "illustrations_illustration_tags"."illustration_id")LEFT OUTER JOIN "illustrations_tag" ON ("illustrations_illustration_tags"."tag_id" = "illustrations_tag"."id")GROUP BY "illustrations_illustration"."id", SIMILARITY("illustrations_illustration"."name", 'Animal')

You can use the max_similarity annotation to filter or order your results.

CodeHunter

Django max similarity (TrigramSimilarity) from ManyToManyField

Recent Posts

How can I color dots in a xy scatterplot according to column value?

How to update a claim in ASP.NET Identity?

What does {0} mean when initializing an object?

Accessing members of items in a JSONArray with Java

How to log SQL statements in Spring Boot?

Powershell Get-WebSite name parameter is ignored

How to detect scroll to bottom of html element

Java synchronized method

How to test controllers with CodeIgniter?

Detect Visual Composer

Matplotlib: Specify format of floats for tick labels

Rails join a list of strings with commas and "and" before the last