Need help with sql query to find things tagged with all specified tags Need help with sql query to find things tagged with all specified tags sql sql

Need help with sql query to find things tagged with all specified tags


Using IN:

SELECT p.*  FROM POSTS p WHERE p.id IN (SELECT tg.post_id                  FROM TAGGINGS tg                  JOIN TAGS t ON t.id = tg.tag_id                 WHERE t.name IN ('Cheese','Wine','Paris','Frace','City','Scenic','Art')              GROUP BY tg.post_id                HAVING COUNT(DISTINCT t.name) = 7)

Using a JOIN

SELECT p.*  FROM POSTS p  JOIN (SELECT tg.post_id          FROM TAGGINGS tg          JOIN TAGS t ON t.id = tg.tag_id         WHERE t.name IN ('Cheese','Wine','Paris','Frace','City','Scenic','Art')      GROUP BY tg.post_id        HAVING COUNT(DISTINCT t.name) = 7) x ON x.post_id = p.id

Using EXISTS

SELECT p.*  FROM POSTS p WHERE EXISTS (SELECT NULL                 FROM TAGGINGS tg                 JOIN TAGS t ON t.id = tg.tag_id                WHERE t.name IN ('Cheese','Wine','Paris','Frace','City','Scenic','Art')                  AND tg.post_id = p.id             GROUP BY tg.post_id               HAVING COUNT(DISTINCT t.name) = 7)

Explanation

The crux of things is that the COUNT(DISTINCT t.name) needs to match the number of tag names to ensure that all those tags are related to the post. Without the DISTINCT, there's a risk that duplicates of one of the names could return a count of 7--so you'd have a false positive.

Performance

Most will tell you the JOIN is optimal, but JOINs also risk duplicating rows in the resultset. EXISTS would be my next choice--no duplicate risk, and generally faster execution but checking the explain plan will ultimately tell you what's best based on your setup and data.


Try this:

Select * From Posts p   Where Not Exists       (Select * From tags t        Where name in            ('Cheese', 'Wine', 'Paris',              'Frace', 'City', 'Scenic', 'Art')           And Not Exists             (Select * From taggings              Where tag_id = t.Tag_Id                And post_Id = p.Post_Id))

Explanation: Asking for a list of those Posts that have had every one of a specified set of tags associated with it is equivilent to asking for those posts where there is no tag in that same specified set, that has not been associated with it. i.e., the sql above.