Trying to find the second largest value in a column (postgres sql) Trying to find the second largest value in a column (postgres sql) postgresql postgresql

Trying to find the second largest value in a column (postgres sql)


Here's another conceptually simple solution that's been running for me in .1 milliseconds on a table of 21 million rows, according to EXPLAIN ANALYZE. It returns nothing in the case where there's only one value.

SELECT a.name, (SELECT word FROM apple ap WHERE ap.name=a.name ORDER BY word ASC OFFSET 1 LIMIT 1) FROM apple a

Note that my table already had existing indices on name, word, and (name, word), which allows me to use ORDER BY like that.


The simplest, albeit inefficient(array can exhaust memory):

select student, (array_agg(grade order by grade desc))[2]from student_gradesgroup by student

The efficient one:

create aggregate two_elements(anyelement)(sfunc = array_limit_two,stype = anyarray,initcond = '{}');create or replace function array_limit_two(anyarray, anyelement) returns anyarrayas $$begin    if array_upper($1,1) = 2 then        return $1;    else        return array_append($1, $2);    end if;end;$$ language 'plpgsql';

Test data:

create table student_grades(student text,grade int);insert into student_grades values ('john',70),('john',80),('john',90),('john',100);insert into student_grades values('paul',20),('paul',10),('paul',50),('paul',30);insert into student_grades values('george',40);

Test code:

-- second largestselect student, coalesce( (two_elements(grade order by grade desc))[2], max(grade) /* min would do too, since it's one element only */ )from student_gradesgroup by student-- second smallestselect student, coalesce( (two_elements(grade order by grade))[2], max(grade) /* min would do too, since it's one element only */ )from student_gradesgroup by student

Output:

q_and_a=# -- second largestq_and_a=# select student, coalesce( (two_elements(grade order by grade desc))[2], max(grade) /* min would do too, since it's one element only */ )q_and_a-# fromq_and_a-# student_gradesq_and_a-# group by student; student | coalesce---------+---------- george  |       40 john    |       90 paul    |       30(3 rows)q_and_a=#q_and_a=# -- second smallestq_and_a=# select student, coalesce( (two_elements(grade order by grade))[2], max(grade) /* min would do too, since it's one element only */ )q_and_a-# fromq_and_a-# student_gradesq_and_a-# group by student; student | coalesce---------+---------- george  |       40 john    |       80 paul    |       20(3 rows)

EDIT@diesel The simplest(and efficient also):

-- second largestselect student, array_min(two_elements(grade order by grade desc))from student_gradesgroup by student;-- second smallestselect student, array_max(two_elements(grade order by grade))from student_gradesgroup by student;

The array_max function:

create or replace function array_min(anyarray) returns anyelementas$$select min(unnested) from( select unnest($1) unnested ) as x$$ language sql;create or replace function array_max(anyarray) returns anyelementas$$select max(unnested) from( select unnest($1) unnested ) as x$$ language sql;

EDIT

Could be the simplest and efficient of all, if only Postgresql would make array_max a built-in function and facilitates LIMIT clause on aggregations :-) LIMIT clause on aggregation is my dream feature on Postgresql

select student, array_max( array_agg(grade order by grade limit 2) )from student_gradesgroup by student;

While that LIMIT on aggregation is not yet available, use this:

-- second largestselect student,     array_min    (        array (                select grade from student_grades                where student = x.student order by grade desc limit 2 )    )from student_grades xgroup by student;-- second smallestselect student,     array_max    (        array (                select grade from student_grades                where student = x.student order by grade limit 2 )    )from student_grades xgroup by student;


This is also brute force, but is guaranteed to only pass the table exactly and only once:

select name,word  from (         select name,word              , row_number() over (partition by name                                        order by word desc)                as rowNum           from apple       ) x where rowNum = 2

This version below may perform better if you have a covering index on (name,word) and there is a high count of word values per name:

with recursive myCte as( select name,max(word) as word      , 1 as rowNum   from apple  group by name  union all select par.name      , (select max(word) as word           from apple           where name = par.name            AND word < par.word        ) as word      , 2 as rowNum   from myCte par  where par.rowNum = 1)select * from myCte where rownum = 2