SQL ranking query to compute ranks and median in sub groups

I suggest doing the computing in your programming language:

for each group:
  for each record_in_group:
    append y to array
  median of array
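
The pseudocode above could be sketched in Python roughly as follows. This is only an illustration: the function name `upper_median_by_group` and the sample rows are made up, and it assumes the rows arrive as `(x, y)` pairs, that groups are formed by `round(x)`, and that the larger middle value is wanted when a group has an even number of rows.

```python
from collections import defaultdict


def upper_median_by_group(rows):
    """Return {gid: median} where gid = round(x).

    For a group with an even number of rows, the larger of the two
    middle values is returned (no averaging).
    """
    groups = defaultdict(list)
    for x, y in rows:
        # Caveat: Python's round() rounds exact halves to the nearest even
        # integer, which differs from SQLite's round() for values like 0.5.
        groups[round(x)].append(y)

    medians = {}
    for gid, ys in groups.items():
        ys.sort()
        # 0-based index len // 2 is the middle element for odd lengths
        # and the upper middle element for even lengths.
        medians[gid] = ys[len(ys) // 2]
    return medians


# Hypothetical sample data: (x, y) pairs.
sample_rows = [(0.9, 5.0), (1.1, 3.0), (1.2, 8.0), (1.4, 1.0),
               (1.8, 2.0), (2.1, 4.0), (2.3, 6.0)]
result = upper_median_by_group(sample_rows)
print(result)
```

If a conventional median is wanted instead, `statistics.median` from the standard library averages the two middle values for even-sized groups.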

But if you are stuck with SQLite, you can order each group by y and select the records in the middle like this http://sqlfiddle.com/#!5/d4c68/55/0:

UPDATE: only the larger "median" value matters for an even number of rows, so no avg() is needed:

select groups.gid,
  ids.y median
from (
  -- get middle row number in each group (bigger number if even nr. of rows)
  -- note the integer divisions and modulo operator
  select round(x) gid,
    count(*) / 2 + 1 mid_row_right
  from xy_table
  group by round(x)
) groups
join (
  -- for each record get equivalent of
  -- row_number() over(partition by gid order by y)
  select round(a.x) gid,
    a.x,
    a.y,
    count(*) rownr_by_y
  from xy_table a
  left join xy_table b
    on round(a.x) = round(b.x)
    and a.y >= b.y
  group by a.x
) ids on ids.gid = groups.gid
where ids.rownr_by_y = groups.mid_row_right
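
To try this outside SQL Fiddle, here is a minimal sketch that runs the same self-join idea through Python's built-in `sqlite3` module on an in-memory database. The sample rows are hypothetical, and the sketch assumes every `x` is unique and `y` values are distinct within a group (the query groups by `a.x`, so duplicates would change the row numbers). The `groups` alias is renamed to `grp` purely as a precaution, since GROUPS is a keyword in newer SQLite versions.

```python
import sqlite3

# Hypothetical sample data: (x, y) pairs with unique x values.
sample_rows = [(0.9, 5.0), (1.1, 3.0), (1.2, 8.0), (1.4, 1.0),
               (1.8, 2.0), (2.1, 4.0), (2.3, 6.0)]

conn = sqlite3.connect(":memory:")
conn.execute("create table xy_table (x float, y float)")
conn.executemany("insert into xy_table values (?, ?)", sample_rows)

query = """
select grp.gid, ids.y as median
from (
  -- middle row number per group (upper one for an even row count)
  select round(x) as gid, count(*) / 2 + 1 as mid_row_right
  from xy_table
  group by round(x)
) as grp
join (
  -- emulate row_number() over (partition by gid order by y)
  -- by counting the rows in the same group with y <= this row's y
  select round(a.x) as gid, a.x, a.y, count(*) as rownr_by_y
  from xy_table a
  left join xy_table b
    on round(a.x) = round(b.x)
    and a.y >= b.y
  group by a.x
) as ids on ids.gid = grp.gid
where ids.rownr_by_y = grp.mid_row_right
order by grp.gid
"""

medians = conn.execute(query).fetchall()
print(medians)
```

Group 1 holds the y values 1, 3, 5, 8 and group 2 holds 2, 4, 6, so the query picks row 3 of the first group and row 2 of the second.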


OK, this relies on a temporary table:

create temporary table tmp (x float, y float);
insert into tmp
  select * from xy_table order by round(x), y

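You could create this table for just the range of data you are interested in. Alternatively, you could ensure that xy_table itself is stored in this sort order (by round(x), then y) instead of being ordered on x alone. The sorted copy is needed because SQLite lacks a row-numbering capability (window functions such as row_number() were only added in SQLite 3.25), so the rowid of the sorted table stands in for the row number.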

Then:

select tmp4.x as gid, t.* from (
  select tmp1.x,
         round((tmp2.y + coalesce(tmp3.y, tmp2.y)) / 2) as y -- <- for larger of the two, change to: max(tmp2.y, coalesce(tmp3.y, tmp2.y))
  from (
    select round(x) as x, min(rowid) + (count(*) / 2) as id1,
           (case when count(*) % 2 = 0 then min(rowid) + (count(*) / 2) - 1
                 else 0 end) as id2
    from (
        select *, rowid from tmp
    ) t
    group by round(x)
  ) tmp1
  join tmp tmp2 on tmp1.id1 = tmp2.rowid
  left join tmp tmp3 on tmp1.id2 = tmp3.rowid
) tmp4
join xy_table t on tmp4.x = round(t.x) and tmp4.y = t.y

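If you want to treat the median as the larger of the two middle values (which does not fit the definition of a median, as @Aprillion already pointed out), simply take the larger of the two y values instead of their average in the round(...) expression of the query. Note that the final join back to xy_table only returns a row when the computed y actually occurs in that group, so an averaged median that matches no row produces no output for that group.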
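
For the "larger of the two" variant, the rowid trick can be reduced to a single lookup, because the sorted temporary table already puts the upper middle row at `min(rowid) + count(*) / 2`. The sketch below runs that simplified form through Python's `sqlite3` module. It is not the full query above: the sample rows are hypothetical, and it assumes that `insert ... select ... order by` into a fresh table assigns rowids in the sorted order, which is how SQLite behaves in practice and what this approach relies on.

```python
import sqlite3

# Hypothetical sample data: (x, y) pairs, deliberately not pre-sorted.
sample_rows = [(2.3, 6.0), (0.9, 5.0), (1.8, 2.0), (1.1, 3.0),
               (1.2, 8.0), (2.1, 4.0), (1.4, 1.0)]

conn = sqlite3.connect(":memory:")
conn.execute("create table xy_table (x float, y float)")
conn.executemany("insert into xy_table values (?, ?)", sample_rows)

# Sorted copy: rowid now acts as a row number ordered by group, then y.
conn.executescript("""
create temporary table tmp (x float, y float);
insert into tmp
  select * from xy_table order by round(x), y;
""")

rows = conn.execute("""
select tmp1.x as gid, tmp2.y as median
from (
  -- rowid of the upper middle row in each group (integer division)
  select round(x) as x, min(rowid) + (count(*) / 2) as id1
  from tmp
  group by round(x)
) tmp1
join tmp tmp2 on tmp1.id1 = tmp2.rowid
order by gid
""").fetchall()
print(rows)
```

Because the picked y always belongs to an existing row, joining this result back to xy_table on `round(x)` and `y` would return the full median record for every group.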
