
Django Full Text Search Optimization - Postgres


As @knbk already suggested, to improve performance you should read the "Performance" section of the full-text search page in the Django documentation.

"If this approach becomes too slow, you can add a SearchVectorField to your model."

In your code, you can add a search vector field to your model with a related GIN index, plus a custom queryset with a new method that updates the field:

    from django.contrib.postgres.indexes import GinIndex
    from django.contrib.postgres.search import SearchVector, SearchVectorField
    from django.db import models
    from postgres_copy import CopyQuerySet


    class AddressesQuerySet(CopyQuerySet):

        # Recompute the stored vector for every row in the queryset with one UPDATE.
        def update_search_vector(self):
            return self.update(search_vector=SearchVector(
                'number', 'street', 'unit', 'city', 'region', 'postcode'
            ))


    class Addresses(models.Model):
        date_update = models.DateTimeField(auto_now=True, null=True)
        longitude = models.DecimalField(max_digits=9, decimal_places=6, null=True)
        latitude = models.DecimalField(max_digits=9, decimal_places=6, null=True)
        number = models.CharField(max_length=16, null=True, default='')
        street = models.CharField(max_length=60, null=True, default='')
        unit = models.CharField(max_length=50, null=True, default='')
        city = models.CharField(max_length=50, null=True, default='')
        district = models.CharField(max_length=10, null=True, default='')
        region = models.CharField(max_length=5, null=True, default='')
        postcode = models.CharField(max_length=5, null=True, default='')
        addr_id = models.CharField(max_length=20, unique=True)
        addr_hash = models.CharField(max_length=20, unique=True)
        # Precomputed tsvector, filled in by update_search_vector().
        search_vector = SearchVectorField(null=True, editable=False)

        objects = AddressesQuerySet.as_manager()

        class Meta:
            # GIN index so that lookups on search_vector can use an index scan.
            indexes = [
                GinIndex(fields=['search_vector'], name='search_vector_idx')
            ]

You can update your new search vector field using the new queryset method:

    >>> Addresses.objects.update_search_vector()
    UPDATE "addresses_addresses"
    SET "search_vector" = to_tsvector(
      COALESCE("addresses_addresses"."number", '') || ' ' ||
      COALESCE("addresses_addresses"."street", '') || ' ' ||
      COALESCE("addresses_addresses"."unit", '') || ' ' ||
      COALESCE("addresses_addresses"."city", '') || ' ' ||
      COALESCE("addresses_addresses"."region", '') || ' ' ||
      COALESCE("addresses_addresses"."postcode", '')
    )
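The COALESCE calls matter because in SQL concatenating NULL with a string yields NULL, so a single empty column (for example `unit`) would otherwise wipe out the whole document for that row. A minimal pure-Python sketch of the text that ends up inside `to_tsvector()`, assuming a row is represented as a plain dict (`build_document` and `SEARCH_FIELDS` are hypothetical names, not part of Django):

```python
# Hypothetical helper that mimics the SQL generated by SearchVector(...) above:
# COALESCE(col1, '') || ' ' || COALESCE(col2, '') || ...
SEARCH_FIELDS = ('number', 'street', 'unit', 'city', 'region', 'postcode')


def build_document(row, fields=SEARCH_FIELDS):
    """Return the text Postgres would pass to to_tsvector() for this row."""
    parts = []
    for field in fields:
        value = row.get(field)
        # COALESCE(col, ''): NULL (None or a missing key) becomes an empty string.
        parts.append('' if value is None else value)
    # || ' ' ||: the fields are joined with a single space.
    return ' '.join(parts)


row = {'number': '12', 'street': 'North Road', 'unit': None,
       'city': 'Sydney', 'region': 'NSW', 'postcode': '2000'}
print(build_document(row))  # 12 North Road  Sydney NSW 2000
```

The empty `unit` leaves a double space in the document, which is harmless because `to_tsvector()` ignores extra whitespace when splitting it into lexemes.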

If you execute a query and read its EXPLAIN output, you can see that your GIN index is used:

    >>> print(Addresses.objects.filter(search_vector='north').values('id').explain(verbose=True))
    EXPLAIN (VERBOSE true)
    SELECT "addresses_addresses"."id"
    FROM "addresses_addresses"
    WHERE "addresses_addresses"."search_vector" @@ (plainto_tsquery('north')) = true [0.80ms]
    Bitmap Heap Scan on public.addresses_addresses  (cost=12.25..16.52 rows=1 width=4)
      Output: id
      Recheck Cond: (addresses_addresses.search_vector @@ plainto_tsquery('north'::text))
      ->  Bitmap Index Scan on search_vector_idx  (cost=0.00..12.25 rows=1 width=0)
            Index Cond: (addresses_addresses.search_vector @@ plainto_tsquery('north'::text))
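If you want to guard against regressions, for example in a test suite, you can check the plan programmatically, since `QuerySet.explain()` returns the plan as a string. This is a minimal sketch: `uses_index` is a hypothetical helper, and it assumes the default text format of EXPLAIN, where index usage shows up as `Index Scan using <name>`, `Index Only Scan using <name>` or `Bitmap Index Scan on <name>`.

```python
import re


def uses_index(plan_text, index_name):
    """Return True if a text-format EXPLAIN plan scans the given index."""
    name = re.escape(index_name)
    # \b stops "search_vector_idx" from matching "search_vector_idx2".
    pattern = re.compile(
        rf'Index(?: Only)? Scan(?: Backward)? using {name}\b'
        rf'|Bitmap Index Scan on {name}\b'
    )
    return pattern.search(plan_text) is not None


plan = """Bitmap Heap Scan on public.addresses_addresses  (cost=12.25..16.52 rows=1 width=4)
  Output: id
  ->  Bitmap Index Scan on search_vector_idx  (cost=0.00..12.25 rows=1 width=0)"""

print(uses_index(plan, 'search_vector_idx'))  # True
```

In a real test you would pass it something like `Addresses.objects.filter(search_vector='north').explain()`. Keep in mind that on a small test table the planner may legitimately prefer a sequential scan, so such a check is only meaningful on realistic data volumes.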

If you want to go deeper, you can read an article I wrote on the subject:

"Full-Text Search in Django with PostgreSQL"

Update

I tried executing the SQL generated by the Django ORM: http://sqlfiddle.com/#!17/f9aa9/1


You need to create a functional index on the search vector expression. Right now you have an index on the underlying fields, but Postgres still has to compute the search vector for every row before it can filter the results. That's why it's doing a sequential scan.

At the time of writing, Django did not support functional indexes in Meta.indexes (expression-based indexes were only added in Django 3.2), so you need to create the index manually, for example with a RunSQL operation in a migration.

    RunSQL(
        """
        CREATE INDEX ON public_data_au_addresses USING GIN
        (to_tsvector(...))
        """
    )

The to_tsvector() expression in the index has to match the expression used in your query exactly; otherwise the planner cannot use the index. Be sure to read through the Postgres docs for all the details.
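Since the index is only usable when the query repeats the indexed expression, one option is to generate both from a single place. The sketch below builds the forward and reverse SQL for a RunSQL operation and a matching WHERE clause from the same column list. It rests on assumptions: the table name comes from the question, while the index name `addresses_fts_idx`, the `'english'` configuration and all helper names are hypothetical. Note the explicit configuration argument: PostgreSQL only allows the two-argument form of `to_tsvector()` in an expression index, because the one-argument form depends on the `default_text_search_config` setting.

```python
# Hypothetical helpers: table and columns follow the question; the index name
# and the 'english' text search configuration are assumptions.
TABLE = 'public_data_au_addresses'
INDEX = 'addresses_fts_idx'
CONFIG = 'english'
COLUMNS = ('number', 'street', 'unit', 'city', 'region', 'postcode')


def tsvector_expr(columns=COLUMNS, config=CONFIG):
    """Build the to_tsvector() expression shared by the index and the query."""
    # COALESCE keeps one NULL column from turning the whole document into NULL.
    document = " || ' ' || ".join(f"COALESCE(\"{col}\", '')" for col in columns)
    # Two-argument form: required for use in an expression index.
    return f"to_tsvector('{config}', {document})"


def index_sql(table=TABLE, index=INDEX):
    """Return (forward, reverse) SQL, e.g. for RunSQL(sql=..., reverse_sql=...)."""
    forward = f"CREATE INDEX {index} ON {table} USING GIN ({tsvector_expr()});"
    reverse = f"DROP INDEX IF EXISTS {index};"
    return forward, reverse


def search_where_clause():
    """WHERE fragment repeating the indexed expression verbatim.

    The search terms are passed separately as a query parameter (%s).
    """
    return f"{tsvector_expr()} @@ plainto_tsquery('{CONFIG}', %s)"


forward, reverse = index_sql()
print(forward)
print(search_where_clause())
```

In a migration you would pass the pair as `migrations.RunSQL(sql=forward, reverse_sql=reverse)`, and use the WHERE fragment in a raw query with the search terms as parameters. Naming the index makes the reverse operation straightforward.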