Find max value from column that has a json object with key-value pairs
Presto's UNNEST
performance got improved in Presto 316. However, you don't need UNNEST
in this case.
You can
- convert your JSON to arary of key/value pairs using JSON CAST and
map_entries
reduce
the array to pick the key for highest value- since key/value pairs are represented as anonymous
row
elements, it's very convenient to use positional access torow
elements with subscript operator, (available since Presto 314)
- since key/value pairs are represented as anonymous
Use query like
SELECT id, reduce( -- conver JSON to array of key/value pairs map_entries(CAST(data AS map(varchar, double))), -- initial state for reduce (must be same type as key/value pairs) (CAST(NULL AS varchar), -1e0), -- assuming your values cannot be negative -- reduction function (state, element) -> if(state[2] > element[2], state, element), -- reduce output function state -> state[1] ) AS topFROM (VALUES (1, JSON '{"Item1":7.3, "Item2":1.3, "Item3":9.8}'), (4, JSON '{"Item6":0.9, "Item7":11.2, "Item4":8.1}'), (5, JSON '{}'), (6, NULL)) t(id, data);
Output
id | top----+------- 1 | Item3 4 | Item7 5 | NULL 6 | NULL(4 rows)
Store the values one per row in a child table.
CREATE TABLE child ( id INT NOT NULL, item VARCHAR(6) NOT NULL, value DECIMAL(9,1), PRIMARY KEY (id, item));
You don't have to do a join to find the largest per group, just use a window function:
WITH cte AS ( SELECT id, item, ROW_NUMBER() OVER (PARTITION BY id ORDER BY value DESC) AS rownum FROM mytable)SELECT * FROM cte WHERE rownum = 1;
Solving this with JSON is a bad idea. It makes your table denormalized, it makes the queries harder to design, and I predict it will make the query performance worse.