Django query filter using large array of ids in Postgres DB


I found a solution building on @erwin-brandstetter's answer, using a custom lookup:

from django.db.models import Lookup
from django.db.models.fields import Field


@Field.register_lookup
class EfficientInLookup(Lookup):
    lookup_name = "ineff"

    def as_sql(self, compiler, connection):
        lhs, lhs_params = self.process_lhs(compiler, connection)
        rhs, rhs_params = self.process_rhs(compiler, connection)
        params = lhs_params + rhs_params
        return "%s IN (SELECT unnest(%s))" % (lhs, rhs), params

This lets you filter like this:

MyModel.objects.filter(id__ineff=<list-of-values>)


The trick is to transform the array into a set.

Instead of (this form is only good for a short array):

SELECT *
FROM   tbl t
WHERE  t.tbl_id = ANY($1);
-- WHERE  t.tbl_id IN($1);  -- equivalent

$1 being the array parameter.

You can still pass an array like you had it, but unnest and join. Like:

SELECT *
FROM   tbl t
JOIN   unnest($1) arr(id) ON arr.id = t.tbl_id;

Or you can keep your query, too, but replace the array with a subquery unnesting it:

SELECT *
FROM   tbl t
WHERE  t.tbl_id = ANY (SELECT unnest($1));

Or:

SELECT *
FROM   tbl t
WHERE  t.tbl_id IN (SELECT unnest($1));

This has the same effect on performance as passing a set with a VALUES expression, but passing the array is typically much simpler.
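From Python, the same idea can be sketched as a small helper that builds the query and sends the whole list as one array parameter. This is only an illustration under a few assumptions: the helper name build_unnest_filter is hypothetical, the ids are assumed to be integers (hence the explicit bigint[] cast, which also keeps an empty list well-typed), and the %s placeholder style assumes a driver such as psycopg2, which adapts a Python list to a Postgres array.

```python
from typing import List, Sequence, Tuple


def build_unnest_filter(
    table: str, column: str, ids: Sequence[int]
) -> Tuple[str, List[List[int]]]:
    """Return (sql, params) that filter `table` by a large list of integer ids.

    The whole list travels as ONE array parameter instead of one
    placeholder per id.
    """
    # Identifiers cannot be bound as query parameters, so this sketch only
    # accepts plain names and rejects anything else.
    for name in (table, column):
        if not name.isidentifier():
            raise ValueError(f"unsafe identifier: {name!r}")

    sql = (
        f"SELECT * FROM {table} t "
        f"WHERE t.{column} IN (SELECT unnest(%s::bigint[]))"
    )
    # A single parameter: the list itself (assumed to be adapted to an
    # array by the driver, as psycopg2 does).
    return sql, [list(ids)]
```

With an open psycopg2 cursor, this would be used roughly as sql, params = build_unnest_filter("tbl", "tbl_id", ids) followed by cursor.execute(sql, params).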


Is this an example of the first thing you're asking?

relation_list = list(ModelA.objects.filter(id__gt=100))
obj_query = ModelB.objects.filter(a_relation__in=relation_list)

That would be an "IN" command because you're first evaluating relation_list by casting it to a list, and then using it in your second query.

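If instead you do the exact same thing without casting to a list (that is, pass the queryset itself to a_relation__in), Django will make only one query, with the first queryset nested as a subquery, and the database can optimize it. So it should be more efficient that way.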
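To make the difference concrete at the SQL level, here is a rough sketch using Python's built-in sqlite3 module as a stand-in for Postgres. The table names model_a and model_b and the function names are hypothetical, and the SQL that Django actually generates will look different; the point is only to contrast "fetch the ids, then send them all back" with "let the database resolve a subquery".

```python
import sqlite3
from typing import List


def setup_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with two hypothetical tables."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE model_a (id INTEGER PRIMARY KEY)")
    conn.execute(
        "CREATE TABLE model_b (id INTEGER PRIMARY KEY, a_relation_id INTEGER)"
    )
    conn.executemany(
        "INSERT INTO model_a (id) VALUES (?)", [(i,) for i in range(1, 201)]
    )
    # Each model_b row points at the model_a row with the same id.
    conn.executemany(
        "INSERT INTO model_b (id, a_relation_id) VALUES (?, ?)",
        [(i, i) for i in range(1, 201)],
    )
    return conn


def two_queries(conn: sqlite3.Connection) -> List[int]:
    """Evaluate the first query in Python, then send every id back."""
    ids = [row[0] for row in conn.execute("SELECT id FROM model_a WHERE id > 100")]
    placeholders = ", ".join("?" for _ in ids)  # one placeholder per id
    sql = (
        f"SELECT id FROM model_b WHERE a_relation_id IN ({placeholders}) "
        "ORDER BY id"
    )
    return [row[0] for row in conn.execute(sql, ids)]


def one_query(conn: sqlite3.Connection) -> List[int]:
    """Let the database resolve the ids itself through a subquery."""
    sql = (
        "SELECT id FROM model_b "
        "WHERE a_relation_id IN (SELECT id FROM model_a WHERE id > 100) "
        "ORDER BY id"
    )
    return [row[0] for row in conn.execute(sql)]
```

Both functions return the same rows; the second one does it in a single round trip and never moves the id list through the application.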

You can always see the SQL command you'll be executing with obj_query.query if you're curious what's happening under the hood.

Hope that answers the question, sorry if it doesn't.