Nested Query Alternatives in AWS Athena
There may be a better way to do this - I would be curious to see it too! One way I can think of would be to use an outer join. (I'm not exactly sure about how your data is structured, so forgive the contrived example, but I hope it would translate ok.) How about this?
with a as (select * from (values (1,'cookie_n',10,'cookie_2'), (2,'cookie_n',11,'cookie_1'), (3,'cookie_m',12,'cookie_1'), (4,'cookie_m',12,'cookie_1'), (5,'cookie_q',13,'cookie_1'), (6,'cookie_n',13,'cookie_1'), (7,'cookie_m',14,'cookie_3') ) as db_ids(first_party_id, first_party_type, third_party_id, third_party_type) ), b as (select first_party_type from a where third_party_type = 'cookie_2'), c as (select a.third_party_id, b.first_party_type as exclude_first_party_type from a left join b on a.first_party_type = b.first_party_type where a.third_party_type = 'cookie_1')select count(distinct third_party_id) from c where exclude_first_party_type is null;
Hope this helps!
You can use an outer join:
SELECT COUNT(DISTINCT i.third_party_id) AS uniquesFROM db.ids aLEFT JOIN db.ids bON a.first_party_id = b.first_party_id AND b.third_party_id = 'cookie_2'WHERE a.third_party_type = 'cookie_1' AND b.third_party_id is null -- this line means we select only rows where there is no match
You should also use caution when using
NOT IN for subqueries that may return
NULL values since the condition will always be true. Your query is comparing
NULL, which will always be false and so
NOT IN will lead to the condition always being true. Nasty little gotcha.
One way to avoid this is to avoid using
NOT IN or to add a condition to your subquery i.e.
AND third_party_id IS NOT NULL.
See here for a longer explanation.