Sqlite subselect much faster than distinct + order by Sqlite subselect much faster than distinct + order by sqlite sqlite

Sqlite subselect much faster than distinct + order by


Your first query orders the records first by inserting all of them into a sorted temporary table, and then implements the DISTINCT by going through them and returning only those that are not identical to the previous one.(This can be seen in the EXPLAIN output shown below; the DISTINCT actually got converted to a GROUP BY, which behaves the same.)

Your second query is, in theory, identical to the first, but SQLite's query optimizer is rather simple and cannot prove that this conversion would be safe (as explained in the subquery flattening documentation).Therefore, it is implemented by doing the DISTINCT first, by inserting only any non-duplicates into a temporary table, and then doing the ORDER BY with a second temporary table.This second step is completely superfluous because the first temp table was already sorted, but this happens to be faster for your data anyway because you have so many duplicates that are never stored in either temp table.

In theory, your first query could be faster, because SQLite has already recognized that the DISTINCT and ORDER BY clauses can be implemented with the same sorted temporary table.In practice, however, SQLite it is not smart enough to remember that the DISTINCT implies that duplicates do not need to be stored in the temp table.(This particular optimization might be added to SQLite if you ask nicely on the mailing list.)


$ sqlite3 mydb sqlite> .explainsqlite> explain SELECT DISTINCT acolumn FROM atable ORDER BY acolumn;addr  opcode         p1    p2    p3    p4             p5  comment      ----  -------------  ----  ----  ----  -------------  --  -------------0     Trace          0     0     0                    00               1     SorterOpen     1     2     0     keyinfo(1,BINARY)  00               2     Integer        0     3     0                    00  clear abort flag3     Integer        0     2     0                    00  indicate accumulator empty4     Null           0     6     6                    00               5     Gosub          5     37    0                    00               6     Goto           0     40    0                    00               7     OpenRead       0     2     0     1              00  atable       8     Rewind         0     14    0                    00               9     Column         0     0     8                    00  atable.acolumn10    Sequence       1     9     0                    00               11    MakeRecord     8     2     10                   00               12    SorterInsert   1     10    0                    00               13    Next           0     9     0                    01               14    Close          0     0     0                    00               15    OpenPseudo     2     10    2                    00               16    SorterSort     1     39    0                    00  GROUP BY sort17    SorterData     1     10    0                    00               18    Column         2     0     7                    20               19    Compare        6     7     1     keyinfo(1,BINARY)  00               20    Jump           21    25    21                   00               21    Move           7     6     0                    00               22    Gosub          4     32    0                    00  output one row23    IfPos          3     39    0                    00  check abort flag24    Gosub          5     37    0                    00  reset accumulator25    Column         2     0     1                    00               26    Integer        1     2     0                    00  indicate data in accumulator27    SorterNext     1     17    0                    00               28    Gosub          4     32    0                    00  output final row29    Goto           0     39    0                    00               30    Integer        1     3     0                    00  set abort flag31    Return         4     0     0                    00               32    IfPos          2     34    0                    00  Groupby result generator entry point33    Return         4     0     0                    00               34    Copy           1     11    0                    00               35    ResultRow      11    1     0                    00               36    Return         4     0     0                    00  end groupby result generator37    Null           0     1     0                    00               38    Return         5     0     0                    00               39    Halt           0     0     0                    00               40    Transaction    0     0     0                    00               41    VerifyCookie   0     2     0                    00               42    TableLock      0     2     0     atable         00               43    Goto           0     7     0                    00               
sqlite> explain SELECT acolumn FROM (SELECT DISTINCT acolumn FROM atable) ORDER BY acolumn;addr  opcode         p1    p2    p3    p4             p5  comment      ----  -------------  ----  ----  ----  -------------  --  -------------0     Trace          0     0     0                    00               1     Goto           0     39    0                    00               2     Goto           0     17    0                    00               3     OpenPseudo     0     3     1                    01  coroutine for sqlite_subquery_DA7480_4     Integer        0     2     0                    01               5     OpenEphemeral  2     0     0     keyinfo(1,BINARY)  08               6     OpenRead       1     2     0     1              00  atable       7     Rewind         1     14    0                    00               8     Column         1     0     3                    00  atable.acolumn9     Found          2     13    3     1              00               10    MakeRecord     3     1     4                    00               11    IdxInsert      2     4     0                    00               12    Yield          1     0     0                    00               13    Next           1     8     0                    01               14    Close          1     0     0                    00               15    Integer        1     2     0                    00               16    Yield          1     0     0                    00  end sqlite_subquery_DA7480_17    SorterOpen     3     3     0     keyinfo(1,BINARY)  00               18    Integer        2     1     0                    00               19    Yield          1     0     0                    00  next row of co-routine sqlite_subquery_DA7480_20    If             2     29    0                    00               21    Column         0     0     5                    00  sqlite_subquery_DA7480_.acolumn22    MakeRecord     5     1     6                    00               23    Column         0     0     7                    00  sqlite_subquery_DA7480_.acolumn24    Sequence       3     8     0                    00               25    Move           6     9     0                    00               26    MakeRecord     7     3     10                   00               27    SorterInsert   3     10    0                    00               28    Goto           0     19    0                    00               29    OpenPseudo     4     6     1                    00               30    OpenPseudo     5     11    3                    00               31    SorterSort     3     37    0                    00               32    SorterData     3     11    0                    00               33    Column         5     2     6                    20               34    Column         4     0     5                    20               35    ResultRow      5     1     0                    00               36    SorterNext     3     32    0                    00               37    Close          4     0     0                    00               38    Halt           0     0     0                    00               39    Transaction    0     0     0                    00               40    VerifyCookie   0     2     0                    00               41    TableLock      0     2     0     atable         00               42    Goto           0     2     0                    00               


Inside most DBMS, SQL statements are translated into relational algebra and then structured in an expression tree.The dbms then uses heuristics to optimise queries. One of the main heuristics is "Perform selection early" (p.46). I suppose the sqlite query planner does this as well, hence the differences in execution time.

Since the result of the subquery is much smaller (~50 rows opposed to 4.5 million), sorting, at the end of the expression tree, happens much faster. (Plain) Selecting isn't a very expensive process, running operations on a multitude of results is indeed.


I believe this must be because the order operation and distinct operations must be implemented more efficiently when separated by the subselect - which is effectively a simpler way to say way alexdeloy is saying.

This experiment is not complete. Please also run the following:

% echo "SELECT acolumn FROM (SELECT DISTINCT acolumn FROM atable ORDER BY acolumn) ;" | time sqlite3 mydb

Tell me if this takes longer than the other two on average and thanks.