SQL LIMIT vs. JDBC Statement setMaxRows. Which one is better?
SQL-level LIMIT
To restrict the size of the SQL query result set, you can use the SQL:2008 syntax:
SELECT title
FROM post
ORDER BY created_on DESC
OFFSET 50 ROWS
FETCH NEXT 50 ROWS ONLY
which works on Oracle 12c, SQL Server 2012, PostgreSQL 8.4, and newer versions.
For MySQL, you can use the LIMIT and OFFSET clauses:
SELECT title
FROM post
ORDER BY created_on DESC
LIMIT 50
OFFSET 50
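The two forms differ only in keywords and clause order, so a data access layer that targets both usually picks the syntax per database. Here is a minimal sketch, assuming a hypothetical `PageQueries` helper and `Dialect` enum (neither is part of JDBC), that builds the `post` query for a 1-based page number. Page 2 with 50 rows per page yields the `OFFSET 50` used in the examples above.

```java
// Hypothetical helper (not part of JDBC): builds the paginated post query
// for a 1-based page number in either of the two syntaxes shown above.
final class PageQueries {

    // Hypothetical enum: which pagination syntax the target database accepts.
    enum Dialect { SQL_2008, MYSQL }

    static String pageQuery(Dialect dialect, int page, int pageSize) {
        if (page < 1 || pageSize < 1) {
            throw new IllegalArgumentException("page and pageSize must be >= 1");
        }
        // Page 1 starts at offset 0, page 2 at offset pageSize, and so on.
        int offset = (page - 1) * pageSize;
        String base = "SELECT title FROM post ORDER BY created_on DESC";
        return switch (dialect) {
            // Oracle 12c, SQL Server 2012, PostgreSQL 8.4 and newer
            case SQL_2008 -> base + " OFFSET " + offset + " ROWS FETCH NEXT " + pageSize + " ROWS ONLY";
            // MySQL
            case MYSQL -> base + " LIMIT " + pageSize + " OFFSET " + offset;
        };
    }

    public static void main(String[] args) {
        // Second page of 50 posts
        System.out.println(pageQuery(Dialect.SQL_2008, 2, 50));
        System.out.println(pageQuery(Dialect.MYSQL, 2, 50));
    }
}
```

Because the page values are plain ints, inlining them cannot inject SQL; binding them as PreparedStatement parameters instead would typically let the database reuse a single statement for every page.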
The advantage of SQL-level pagination is that the optimizer can take the row limit into account when it builds the execution plan.
So, if we have an index on the created_on column:
CREATE INDEX idx_post_created_on ON post (created_on DESC)
And we execute the following query that uses the LIMIT clause:
EXPLAIN ANALYZE
SELECT title
FROM post
ORDER BY created_on DESC
LIMIT 50
We can see that the database engine uses the index since the optimizer knows that only 50 records are to be fetched:
Execution plan:
Limit (cost=0.28..25.35 rows=50 width=564) (actual time=0.038..0.051 rows=50 loops=1)
  -> Index Scan using idx_post_created_on on post p (cost=0.28..260.04 rows=518 width=564) (actual time=0.037..0.049 rows=50 loops=1)
Planning time: 1.511 ms
Execution time: 0.148 ms
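The Limit node's cost can be reconstructed from the Index Scan line beneath it. Assuming PostgreSQL's usual costing of a row limit, where the planner interpolates linearly between the child's startup and total cost in proportion to the fraction of its estimated rows that will actually be fetched, the plan's numbers check out:

$$
c_{\text{Limit}} = c_{\text{startup}} + \left(c_{\text{total}} - c_{\text{startup}}\right)\cdot\frac{\text{limit}}{\text{rows}}
= 0.28 + (260.04 - 0.28)\cdot\frac{50}{518}
\approx 0.28 + 25.07 = 25.35
$$

In other words, reading 50 of the estimated 518 rows through the index is costed at roughly a tenth of a full index scan (260.04).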
JDBC Statement maxRows
According to the setMaxRows Javadoc:
If the limit is exceeded, the excess rows are silently dropped.
That's not very reassuring!
So, if we execute the following query on PostgreSQL:
try (PreparedStatement statement = connection.prepareStatement("""
        SELECT title
        FROM post
        ORDER BY created_on DESC
        """)) {
    // The SQL has no LIMIT; only the JDBC driver is told to cap the result at 50 rows
    statement.setMaxRows(50);
    ResultSet resultSet = statement.executeQuery();
    int count = 0;
    while (resultSet.next()) {
        String title = resultSet.getString(1);
        count++;
    }
}
We get the following execution plan in the PostgreSQL log:
Execution plan:
Sort (cost=65.53..66.83 rows=518 width=564) (actual time=4.339..5.473 rows=5000 loops=1)
  Sort Key: created_on DESC
  Sort Method: quicksort Memory: 896kB
  -> Seq Scan on post p (cost=0.00..42.18 rows=518 width=564) (actual time=0.041..1.833 rows=5000 loops=1)
Planning time: 1.840 ms
Execution time: 6.611 ms
Because the database optimizer has no idea that we need to fetch only 50 records, it assumes that every row in the table (5000 in this test) has to be returned. When a query needs to fetch a large number of records, a full-table scan followed by a sort is cheaper than an index scan, so the execution plan does not use the index at all.
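To make that trade-off concrete, here is a simplified sketch that replays the decision using only the cost estimates printed in the two plans above. It is not a model of the real PostgreSQL planner: `PlanChoiceSketch` is hypothetical, it assumes the usual linear interpolation for a limited path, and it uses the Seq Scan total (42.18) as a lower bound for the sorted path, since a sort cannot return its first row before the whole table has been read.

```java
// Hypothetical, simplified replay of the planner's choice using the
// estimates from the two execution plans above (not the real planner).
final class PlanChoiceSketch {

    // Linear interpolation applied to a path whose output is capped by a limit.
    static double limitedCost(double startup, double total, double estimatedRows, double limit) {
        if (limit >= estimatedRows) {
            return total; // the limit removes nothing
        }
        return startup + (total - startup) * limit / estimatedRows;
    }

    // limit <= 0 means the planner is not told about any limit (the setMaxRows case).
    static String choosePath(double limit) {
        double estimatedRows = 518;                      // row estimate in both plans
        double indexStartup = 0.28, indexTotal = 260.04; // Index Scan using idx_post_created_on
        double seqScanTotal = 42.18;                     // Seq Scan on post
        double sortTotal = 66.83;                        // Sort on top of the Seq Scan

        double indexCost = limit > 0
                ? limitedCost(indexStartup, indexTotal, estimatedRows, limit)
                : indexTotal;
        // With a limit, the sorted path still has to read the whole table first,
        // so the Seq Scan total is used as a (conservative) lower bound for it.
        double sortCost = limit > 0 ? seqScanTotal : sortTotal;

        return indexCost < sortCost ? "Index Scan" : "Seq Scan + Sort";
    }

    public static void main(String[] args) {
        System.out.println("No limit known (setMaxRows): " + choosePath(0));
        System.out.println("LIMIT 50: " + choosePath(50));
    }
}
```

Without the limit, the full index scan (260.04) loses to the sort (66.83); with LIMIT 50, the interpolated index cost (about 25.35) beats even the lower bound of the sorted path.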
I ran this test on Oracle, SQL Server, PostgreSQL, and MySQL, and it looks like the Oracle and PostgreSQL optimizers don't use the maxRows setting when generating the execution plan. However, on SQL Server and MySQL, the maxRows JDBC setting is taken into consideration, and the execution plan is equivalent to an SQL query that uses TOP or LIMIT. You can run the tests for yourself, as they are available in my High-Performance Java Persistence GitHub repository.
Conclusion
Although setMaxRows looks like a portable solution for limiting the size of the ResultSet, SQL-level pagination is much more efficient when the database optimizer doesn't use the JDBC maxRows property.
The advantage of setMaxRows is that you can write universal statements that are valid in PostgreSQL, Oracle, MySQL, etc., since Oracle uses the ROWNUM syntax, PostgreSQL uses LIMIT, and SQL Server uses TOP.
Speed-wise, there seems to be no difference.
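To see what that portability saves, here is a minimal sketch of the SQL-level alternative when it has to cover the three syntaxes mentioned. `TopNQueries` and its `Dialect` enum are hypothetical; the ROWNUM variant wraps the sorted query because Oracle assigns ROWNUM before ORDER BY is applied.

```java
// Hypothetical helper: the same "latest N post titles" query in three dialects.
final class TopNQueries {

    enum Dialect { ORACLE_ROWNUM, POSTGRESQL, SQL_SERVER }

    static String latestTitles(Dialect dialect, int n) {
        if (n < 1) {
            throw new IllegalArgumentException("n must be >= 1");
        }
        return switch (dialect) {
            // ROWNUM is assigned before ORDER BY, so the sorted query goes in a subquery
            case ORACLE_ROWNUM -> "SELECT title FROM (SELECT title FROM post ORDER BY created_on DESC) WHERE ROWNUM <= " + n;
            case POSTGRESQL -> "SELECT title FROM post ORDER BY created_on DESC LIMIT " + n;
            case SQL_SERVER -> "SELECT TOP " + n + " title FROM post ORDER BY created_on DESC";
        };
    }

    public static void main(String[] args) {
        for (Dialect dialect : Dialect.values()) {
            System.out.println(dialect + ": " + latestTitles(dialect, 50));
        }
    }
}
```

With setMaxRows, a single `SELECT title FROM post ORDER BY created_on DESC` plus `statement.setMaxRows(50)` replaces all three strings, at the cost of the execution plan issue described above on optimizers that ignore maxRows.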
In most cases, you want to use the LIMIT clause, but at the end of the day both will achieve what you want. This answer is targeted at JDBC and PostgreSQL, but it is applicable to other languages and databases that use a similar model.
The JDBC documentation for Statement.setMaxRows says:
If the limit is exceeded, the excess rows are silently dropped.
That is, the database server may return more rows, but the client will just ignore them. The PostgreSQL JDBC driver enforces the limit on both the client and the server side. For the client side, have a look at the usage of maxRows in AbstractJdbc2ResultSet. For the server side, have a look at maxRows in QueryExecutorImpl.
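The client-side half of that behavior can be pictured with a small sketch. The `MaxRowsCap` class below is hypothetical: it is not the driver's `AbstractJdbc2ResultSet` code, only an illustration of the Javadoc contract. It wraps the rows received from the server and stops returning them once `maxRows` is reached, so any excess rows are silently dropped; 0 means no limit, as with `Statement.setMaxRows`.

```java
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical illustration of client-side maxRows enforcement
// (not the PostgreSQL JDBC driver's actual implementation).
final class MaxRowsCap<T> implements Iterator<T> {
    private final Iterator<T> rowsFromServer;
    private final int maxRows; // 0 means "no limit", as in Statement.setMaxRows
    private int returned;

    MaxRowsCap(Iterator<T> rowsFromServer, int maxRows) {
        if (maxRows < 0) {
            throw new IllegalArgumentException("maxRows must be >= 0");
        }
        this.rowsFromServer = rowsFromServer;
        this.maxRows = maxRows;
    }

    @Override
    public boolean hasNext() {
        // Once the cap is reached, report "no more rows" even if the server sent more.
        return (maxRows == 0 || returned < maxRows) && rowsFromServer.hasNext();
    }

    @Override
    public T next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        returned++;
        return rowsFromServer.next();
    }

    // Counts how many rows the caller actually sees.
    static int drain(Iterator<?> rows) {
        int count = 0;
        while (rows.hasNext()) {
            rows.next();
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        List<String> serverRows = List.of("a", "b", "c", "d", "e");
        System.out.println(drain(new MaxRowsCap<>(serverRows.iterator(), 3)));
    }
}
```

A client-side cap like this only saves work in the application; the server has still planned and possibly sent the full result, which is why the driver also passes the limit to the server.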
On the server side, the PostgreSQL LIMIT documentation says:
The query optimizer takes LIMIT into account when generating a query plan
So as long as the query is sensible, it will load only the data it needs to fulfill the query.