Nested Query Alternatives in AWS Athena Nested Query Alternatives in AWS Athena sql sql

Nested Query Alternatives in AWS Athena


There may be a better way to do this - I would be curious to see it too! One way I can think of would be to use an outer join. (I'm not exactly sure about how your data is structured, so forgive the contrived example, but I hope it would translate ok.) How about this?

with   a as (select *        from (values        (1,'cookie_n',10,'cookie_2'),        (2,'cookie_n',11,'cookie_1'),       (3,'cookie_m',12,'cookie_1'),       (4,'cookie_m',12,'cookie_1'),       (5,'cookie_q',13,'cookie_1'),       (6,'cookie_n',13,'cookie_1'),       (7,'cookie_m',14,'cookie_3')       ) as db_ids(first_party_id, first_party_type, third_party_id, third_party_type)      ),  b as (select first_party_type         from a where third_party_type = 'cookie_2'),  c as (select a.third_party_id, b.first_party_type as exclude_first_party_type         from a left join b on a.first_party_type = b.first_party_type         where a.third_party_type = 'cookie_1')select count(distinct third_party_id) from c        where exclude_first_party_type is null;

Hope this helps!


You can use an outer join:

SELECT          COUNT(DISTINCT i.third_party_id) AS uniquesFROM          db.ids aLEFT JOIN         db.ids bON       a.first_party_id = b.first_party_id     AND b.third_party_id = 'cookie_2'WHERE         a.third_party_type = 'cookie_1'     AND b.third_party_id is null -- this line means we select only rows where there is no match

You should also use caution when using NOT IN for subqueries that may return NULL values since the condition will always be true. Your query is comparing a.first_party_id to NULL, which will always be false and so NOT IN will lead to the condition always being true. Nasty little gotcha.

One way to avoid this is to avoid using NOT IN or to add a condition to your subquery i.e. AND third_party_id IS NOT NULL.

See here for a longer explanation.