Select at least one from each category? [sqlite]



The key to the answer is that there are two kinds of questions in the result: for each category, one question that must be constrained to come from that category; and some remaining questions.

First, the constrained questions: we just select one record from each category:

    SELECT id, category_id, question_text, 1 AS constrained, max(random()) AS r
    FROM so_questions
    GROUP BY category_id

(This query relies on a feature introduced in SQLite 3.7.11, which ships with Android Jelly Bean and later: in a query of the form SELECT a, max(b), the value of a is guaranteed to come from the record that has the maximum b value.)
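The guarantee described above can be checked with a small sketch using Python's built-in sqlite3 module. The table t and its three rows are made up for illustration, and the sketch assumes the bundled SQLite library is version 3.7.11 or newer:

```python
import sqlite3

# Hypothetical three-row table, used only to illustrate the guarantee.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (a TEXT, b INTEGER)")
conn.executemany("INSERT INTO t VALUES (?, ?)", [("x", 1), ("y", 5), ("z", 3)])

# With exactly one max() aggregate in the query, the bare column a
# is taken from the row that holds the maximum b.
row = conn.execute("SELECT a, max(b) FROM t").fetchone()
print(sqlite3.sqlite_version, row)
```

Here the row with the largest b is ('y', 5), so a comes back as 'y' rather than as an arbitrary value.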

We also have to get the non-constrained questions (filtering out the duplicates that are already in the constrained set will happen in the next step):

    SELECT id, category_id, question_text, 0 AS constrained, random() AS r
    FROM so_questions

When we combine these two queries with UNION ALL and then group by id, all the duplicates end up in the same group. Selecting max(constrained) then ensures that, for groups that contain duplicates, only the constrained record remains (all the other questions have only one record per group anyway).

Finally, the ORDER BY clause ensures that the constrained questions come first, followed by some random other questions:

    SELECT *, max(constrained)
    FROM (SELECT id, category_id, question_text, 1 AS constrained, max(random()) AS r
          FROM so_questions
          GROUP BY category_id
          UNION ALL
          SELECT id, category_id, question_text, 0 AS constrained, random() AS r
          FROM so_questions)
    GROUP BY id
    ORDER BY constrained DESC, r
    LIMIT 5
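To try the combined query, here is a minimal sketch that runs it from Python against an in-memory database. The sample data (four categories with three questions each) and the helper name make_sample_db are assumptions for illustration; only the so_questions columns come from the queries above:

```python
import sqlite3

def make_sample_db():
    """Build a hypothetical so_questions table: 4 categories x 3 questions."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE so_questions ("
        "id INTEGER PRIMARY KEY, category_id INTEGER, question_text TEXT)"
    )
    rows = [
        (cat, f"Question {n} in category {cat}")
        for cat in range(1, 5)
        for n in range(1, 4)
    ]
    conn.executemany(
        "INSERT INTO so_questions (category_id, question_text) VALUES (?, ?)", rows
    )
    return conn

QUERY = """
SELECT *, max(constrained)
FROM (SELECT id, category_id, question_text, 1 AS constrained, max(random()) AS r
      FROM so_questions
      GROUP BY category_id
      UNION ALL
      SELECT id, category_id, question_text, 0 AS constrained, random() AS r
      FROM so_questions)
GROUP BY id
ORDER BY constrained DESC, r
LIMIT 5
"""

conn = make_sample_db()
picked = conn.execute(QUERY).fetchall()
for row in picked:
    # Columns: id, category_id, question_text, constrained, r, max(constrained)
    print(row[0], row[1], row[2], row[3])
```

With four categories and LIMIT 5, the first four rows should be the constrained picks (one per category) and the fifth a random remaining question.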

For earlier SQLite/Android versions, I haven't found a solution without using a temporary table (because the subquery for the constrained question must be used multiple times, but does not stay constant because of the random()):

    BEGIN TRANSACTION;

    CREATE TEMPORARY TABLE constrained AS
    SELECT (SELECT id
            FROM so_questions
            WHERE category_id = cats.category_id
            ORDER BY random()
            LIMIT 1) AS id
    FROM (SELECT DISTINCT category_id
          FROM so_questions) AS cats;

    SELECT ids.id, category_id, question_text
    FROM (SELECT id
          FROM (SELECT id, 1 AS c
                FROM constrained
                UNION ALL
                SELECT id, 0 AS c
                FROM so_questions
                WHERE id NOT IN (SELECT id FROM constrained))
          ORDER BY c DESC, random()
          LIMIT 5) AS ids
    JOIN so_questions ON ids.id = so_questions.id;

    DROP TABLE constrained;

    COMMIT TRANSACTION;
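The temporary-table variant can be exercised the same way. The following is only a sketch: it runs the statements one by one from Python's sqlite3 module against made-up sample data (four categories, three questions each) and relies on the module's implicit transaction instead of an explicit BEGIN:

```python
import sqlite3

# Hypothetical sample data: 4 categories x 3 questions.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE so_questions ("
    "id INTEGER PRIMARY KEY, category_id INTEGER, question_text TEXT)"
)
conn.executemany(
    "INSERT INTO so_questions (category_id, question_text) VALUES (?, ?)",
    [(cat, f"Question {n} in category {cat}") for cat in range(1, 5) for n in range(1, 4)],
)

# Step 1: fix one random question per category in a temporary table,
# so the random choice stays constant for the rest of the work.
conn.execute("""
CREATE TEMPORARY TABLE constrained AS
SELECT (SELECT id
        FROM so_questions
        WHERE category_id = cats.category_id
        ORDER BY random()
        LIMIT 1) AS id
FROM (SELECT DISTINCT category_id
      FROM so_questions) AS cats
""")

# Step 2: constrained ids first, then random others, five in total.
picked = conn.execute("""
SELECT ids.id, category_id, question_text
FROM (SELECT id
      FROM (SELECT id, 1 AS c
            FROM constrained
            UNION ALL
            SELECT id, 0 AS c
            FROM so_questions
            WHERE id NOT IN (SELECT id FROM constrained))
      ORDER BY c DESC, random()
      LIMIT 5) AS ids
JOIN so_questions ON ids.id = so_questions.id
""").fetchall()

# Step 3: clean up the temporary table and finish the transaction.
conn.execute("DROP TABLE constrained")
conn.commit()

for row in picked:
    print(row)
```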


Basically, what you are looking for is a way to select the top N max values. I spent 3-4 hours this morning searching for it (I still haven't succeeded, so you may need to wait a few more hours).

As a temporary solution, you can use GROUP BY as follows:

String strQuery = "SELECT * FROM so_questions group by category_id;";

This returns one row per category (screenshot of the output omitted).

I will be back with a query that matches your exact requirement.


Since it's SQLite (and thus local): how slow would it be to just query repeatedly until you have 5 answers covering four different categories, dropping the rows with duplicate categories on each iteration?

I think that, if each category is equally represented, it is highly unlikely that you would need more than 3 iterations, which should still take less than a second.
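One possible reading of this idea, sketched in Python with the built-in sqlite3 module: draw random rows repeatedly, keep the first row seen for each category that is not covered yet, and fill the remaining slots with random other questions once every category is covered. The function name pick_questions, the sample data and the exact loop are assumptions for illustration, not something spelled out above:

```python
import sqlite3

def pick_questions(conn, total=5):
    """Return (rows, iterations): `total` rows covering every category."""
    categories = {
        row[0] for row in conn.execute("SELECT DISTINCT category_id FROM so_questions")
    }
    if len(categories) > total:
        raise ValueError("more categories than requested rows")

    chosen = {}  # category_id -> first row drawn for that category
    iterations = 0
    while len(chosen) < len(categories):
        iterations += 1
        batch = conn.execute(
            "SELECT id, category_id, question_text FROM so_questions "
            "ORDER BY random() LIMIT ?",
            (total,),
        ).fetchall()
        for row in batch:
            # Rows whose category is already covered are dropped.
            chosen.setdefault(row[1], row)

    picked = list(chosen.values())
    remaining = total - len(picked)
    if remaining > 0:
        # Fill the free slots with random questions not picked so far.
        ids = [row[0] for row in picked]
        placeholders = ",".join("?" * len(ids))
        picked += conn.execute(
            "SELECT id, category_id, question_text FROM so_questions "
            f"WHERE id NOT IN ({placeholders}) ORDER BY random() LIMIT ?",
            (*ids, remaining),
        ).fetchall()
    return picked, iterations

# Hypothetical sample data: 4 equally represented categories.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE so_questions ("
    "id INTEGER PRIMARY KEY, category_id INTEGER, question_text TEXT)"
)
conn.executemany(
    "INSERT INTO so_questions (category_id, question_text) VALUES (?, ?)",
    [(cat, f"Question {n} in category {cat}") for cat in range(1, 5) for n in range(1, 4)],
)

rows, iterations = pick_questions(conn)
print(f"{len(rows)} rows after {iterations} iteration(s)")
```

The number of iterations varies from run to run, so printing it is a cheap way to check the estimate above against real data.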

It's not algorithmically nice, but to me using random() in a SQL statement isn't algorithmically nice anyway.