SQL ranking query to compute ranks and median in sub groups SQL ranking query to compute ranks and median in sub groups sqlite sqlite

SQL ranking query to compute ranks and median in sub groups


I suggest doing the computing in your programming language:

for each group:  for each record_in_group:    append y to array  median of array

But if you are stuck with SQLite, you can order each group by y and select the records in the middle like this http://sqlfiddle.com/#!5/d4c68/55/0:

UPDATE: only bigger "median" value is importand for even nr. of rows, so no avg() is needed:

select groups.gid,  ids.y medianfrom (  -- get middle row number in each group (bigger number if even nr. of rows)  -- note the integer divisions and modulo operator  select round(x) gid,    count(*) / 2 + 1 mid_row_right  from xy_table  group by round(x)) groupsjoin (  -- for each record get equivalent of  -- row_number() over(partition by gid order by y)  select round(a.x) gid,    a.x,    a.y,    count(*) rownr_by_y  from xy_table a  left join xy_table b    on round(a.x) = round (b.x)    and a.y >= b.y  group by a.x) ids on ids.gid = groups.gidwhere ids.rownr_by_y = groups.mid_row_right


OK, this relies on a temporary table:

create temporary table tmp (x float, y float);insert into tmp  select * from xy_table order by round(x), y

But you could potentially create this for a range of data you were interested in. Another way would be to ensure the xy_table had this sort order, instead of just ordering on x. The reason for this is SQLite's lack of row numbering capability.

Then:

select tmp4.x as gid, t.* from (  select tmp1.x,          round((tmp2.y + coalesce(tmp3.y, tmp2.y)) / 2) as y -- <- for larger of the two, change to: (case when tmp2.y > coalesce(tmp3.y, 0) then tmp2.y else tmp3.y end)  from (    select round(x) as x, min(rowid) + (count(*) / 2) as id1,            (case when count(*) % 2 = 0 then min(rowid) + (count(*) / 2) - 1                  else 0 end) as id2    from (        select *, rowid from tmp    ) t    group by round(x)  ) tmp1  join tmp tmp2 on tmp1.id1 = tmp2.rowid  left join tmp tmp3 on tmp1.id2 = tmp3.rowid) tmp4join xy_table t on tmp4.x = round(t.x) and tmp4.y = t.y

If you wanted to treat the median as the larger of the two middle values, which doesn't fit the definition as @Aprillion already pointed out, then you would simply take the larger of the two y values, instead of their average, on the third line of the query.