Filtering the Aggregate in the Django ORM Filtering the Aggregate in the Django ORM django django

Filtering the Aggregate in the Django ORM


OK, now that the question includes the model definitions, I submit to you that this should work, unless your version of Django doesn't support some feature I use here (in which case, please let me know!):

Post.objects.filter(thread__in=thread_set, status='active').aggregate(num_posts=Count('id'))

Django allows __in filters to take a QuerySet to decide what the IN clause should look like in SQL, so if you pass thread__in=thread_set, Django will filter the posts so that only those whose thread field points to one of the ids of the threads in your thread_set remain for the aggregate call to see.

This should filter the posts with just one db query with something like WHERE thread_id IN ... inside, rather than with one query per thread, which would indeed be horrid. If anything else happened, this would be a bug in Django...

The result should be at most two queries to establish a Category's postcount -- one to obtain thread_set and another one actually to count the posts. The alternative is to have a thread/post join to be filtered based on Thread's category field and Post's status field, which I wouldn't necessarily expect to be that much faster. (I say 'at most', because I guess they could be fused automatically... Though I don't think this would happen with current Django. Can't check ATM, sorry.)

EDIT: Django's QuerySet API reference says this on __in filters:


IN

In a given list.

Example:

Entry.objects.filter(id__in=[1, 3, 4])

SQL equivalent:

SELECT ... WHERE id IN (1, 3, 4);

You can also use a queryset to dynamically evaluate the list of values instead of providing a list of literal values:

inner_qs = Blog.objects.filter(name__contains='Cheddar')entries = Entry.objects.filter(blog__in=inner_qs)

This queryset will be evaluated as subselect statement:

SELECT ... WHERE blog.id IN (SELECT id FROM ... WHERE NAME LIKE '%Cheddar%')

The above code fragment could also be written as follows:

inner_q = Blog.objects.filter(name__contains='Cheddar').values('pk').queryentries = Entry.objects.filter(blog__in=inner_q)

Changed in Django 1.1: In Django 1.0, only the latter piece of code is valid.

This second form is a bit less readable and unnatural to write, since it accesses the internal query attribute and requires a ValuesQuerySet. If your code doesn't require compatibility with Django 1.0, use the first form, passing in a queryset directly.


So, I guess Django is capable of passing a single query to the db in the case at issue here. If the db's query analyser does a good job, the effect might be very nearly optimal. :-)


Yes. Just do it. This should work as expected:

self.thread_set.filter(active_status=1).aggregate(num_posts=Count('post'))['num_posts']

Any original query returns a QuerySet, so any available methods that return QuerySets can be can be pretty much indefinitely chained together for complex criteria matches. Since aggregate() does not return a QuerySet, you want to make sure that it is last in the chain.


I have been looking into something similar and have not found a great solution. I'm using something like this:

def post_count(self):        return len(Post.objects.filter(someModel = self).filter(active_status = 1))

It's not great, but I don't think that Django allows you to filter based on the secondary model aggregations and annotations. I'll be checking to see if anyone comes up with a better solution.