Improving OFFSET performance in PostgreSQL
You might want a computed index.
Let's create a table:
create table sales(day date, amount real);
And fill it with some random stuff:
insert into sales select current_date + s.a as day, random()*100 as amount from generate_series(1,20) as s(a);
Index it by day, nothing special here:
create index sales_by_day on sales(day);
Create a row position function. There are other approaches, but this one is the simplest. Note that it is declared immutable even though it actually reads the table; that is what allows Postgres to use it in an index:

create or replace function sales_pos (date) returns bigint
    as 'select count(day) from sales where day <= $1;'
language sql immutable;
Check if it works (don't call it like this on large datasets though):
select sales_pos(day), day, amount from sales;

 sales_pos |    day     |  amount
-----------+------------+----------
         1 | 2011-07-08 |  41.6135
         2 | 2011-07-09 |  19.0663
         3 | 2011-07-10 |  12.3715
..................
Now the tricky part: add another index computed on the sales_pos function values:
create index sales_by_pos on sales using btree(sales_pos(day));
Here is how you use it. 5 is your "offset", 10 is the "limit":
select * from sales where sales_pos(day) >= 5 and sales_pos(day) < 5+10;

    day     | amount
------------+---------
 2011-07-12 | 94.3042
 2011-07-13 | 12.9532
 2011-07-14 | 74.7261
...............
It is fast, because when you call it like this, Postgres uses the precalculated values from the index:
explain select * from sales where sales_pos(day) >= 5 and sales_pos(day) < 5+10;

                                QUERY PLAN
--------------------------------------------------------------------------
 Index Scan using sales_by_pos on sales  (cost=0.50..8.77 rows=1 width=8)
   Index Cond: ((sales_pos(day) >= 5) AND (sales_pos(day) < 15))
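One caveat worth adding (not stated in the steps above): sales_pos is declared immutable but really depends on the table's contents, so the positions stored in sales_by_pos go stale whenever rows are inserted, deleted, or get a different day. A minimal sketch of refreshing them after such changes:

```sql
-- The positions in sales_by_pos were computed at index-build time;
-- after modifying the sales table they may no longer match what
-- sales_pos() would return, so rebuild the index:
REINDEX INDEX sales_by_pos;
```

For heavily updated tables this rebuild cost makes the trick impractical; it shines on append-rarely or read-mostly data.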
Hope it helps.
I don't know anything about "counted b-tree indexes", but one thing we've done in our application to help with this is break our queries into two, possibly using a sub-query. My apologies for wasting your time if you're already doing this.
SELECT *
FROM massive_table
WHERE id IN (
    SELECT id
    FROM massive_table
    WHERE ...
    LIMIT 50 OFFSET 500000
);
The advantage here is that, while it still has to compute the proper ordering of everything, it doesn't sort entire rows, only the id column.
Instead of using an OFFSET, a very efficient trick is to use a temporary table:
CREATE TEMPORARY TABLE just_index AS
SELECT ROW_NUMBER() OVER (ORDER BY myID), myID
FROM mytable;
For 10 000 000 rows it takes about 10 s to create. Then, whenever you want to SELECT from or UPDATE your table, you simply run:
SELECT *
FROM mytable
INNER JOIN (
    SELECT just_index.myID
    FROM just_index
    WHERE row_number >= *your offset*
    LIMIT 1000000
) indexes ON mytable.myID = indexes.myID
Filtering mytable through just_index was more efficient (in my case) with an INNER JOIN than with a WHERE myID IN (SELECT ...).

This way you don't have to store the last myID value; you simply replace the offset with a WHERE clause that uses indexes.
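The "store the last myID" alternative alluded to above is keyset pagination. For completeness, a minimal sketch, assuming myID is indexed and :last_seen_myID is a placeholder for the value your application saved from the previous page:

```sql
-- Keyset pagination: instead of an offset, seek past the last myID
-- the previous page ended on, via the index on myID.
SELECT *
FROM mytable
WHERE myID > :last_seen_myID
ORDER BY myID
LIMIT 1000;
```

It requires carrying state between requests, which is exactly the bookkeeping the temporary-table approach avoids, but it stays fast at any depth and never goes stale.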