Why is the same SQLite query being 30 times slower when fetching only twice as many results? Why is the same SQLite query being 30 times slower when fetching only twice as many results? sqlite sqlite

Why is the same SQLite query being 30 times slower when fetching only twice as many results?


The execution time geometrically proportional to the number of rows in each table rather than arithmetically e.g.

3 tables with 10 rows each => 1,000 comparision3 tables with 10, 10 and 40 rows => 4,000 comparisons3 tables with 20 rows each => 8,000 comparisons

You could probably re-factor the query to avoid some of the joins/cursors - when do you need an answer?

Could you do something like this:

SELECT precursor_id, feature_table_id FROM MSMS_precursorINNER JOIN     (        SELECT mzMin, mzMax, rtMin, rtMax, spectrum_id, feature_table_id, msrun_msrun_id        FROM spectrum        INNER JOIN            (select feature_table_id, mzMin, mzMax, rtMin, rtMax, msrun_msrun_id            from feature            where feature.msrun_msrun_id = 'value'           ) subquery         ON subquery.msrun_msrun_id = spectrum.msrun_msrun_id        WHERE             spectrum.scan_start_time BETWEEN subquery.rtMin AND subquery.rtMax     ) subquery    ON subquery.spectrum_id = MSMS_precursor.spectrum_spectrum_id WHERE     MSMS_precursor.ion_mz BETWEEN subquery.mzMin AND subquery.mzMax 

Using a subquery enables you to reduce the number of comparisons between the tables - you can quickly filter out the unwanted features, then the un-related spectra before searching for suitable precursors.

I don't use SQLLite - but the principle should still apply.

UPDATED : fixed bug in SQL

Notes:

You don't have to worry about the ANDs, you'll only get:

  • features where feature.msrun_msrun_id = 'value'
  • spectra for those features and where spectrum.scan_start_time BETWEENsubquery.rtMin AND subquery.rtMax
  • precursors for those spectrs and where MSMS_precursor.ion_mz BETWEEN subquery.mzMin ANDsubquery.mzMax

UPDATE 18/May:

It's the indexing!!! you have indexes on the search fields, but not on the fields participating in the joins - foreign key indices really boost performance:

CREATE INDEX `fk_msrun_msrun_id_feature` ON `feature` (`msrun_msrun_id` ASC); CREATE INDEX `fk_spectrum_spectrum_id_feature` ON `feature` (`msrun_msrun_id` ASC); CREATE INDEX `fk_spectrum_spectrum_id_MSMS_precursor` ON `MSMS_precursor` (`spectrum_spectrum_id` ASC); 


I suggest you try using an R*Tree index, They're designed for efficient range queries.


I haven't actually used R*Tree much, just read the documentation, but I think you might be using it incorrectly. You may want to try changing your query to use

WHERE convexhull_edges.rtMin <= spectrum.scan_start_time AND convexhull_edges.rtMax >= spectrum.scan_start_time ANDconvexhull_edges.mzMin <= MSMS_precursor.ion_mz AND convexhull_edges.mzMax >= MSMS_precursor.ion_mz

which should be equivalent to your current query, but I think should be faster (You should be picking a range out of the R*Tree, rather than comparing the point to the range)


Consider using covering indices on the tables involved in your query.

You are indeed fetching a limited amount of columns in your select statement and the corresponding inner join and where clauses. By using a covering index with the columns well ordered in it, you should get a very fast query, that is you will remove the scan table in favor of an search table using a covering index.

Try to use those indices on your tables:

CREATE INDEX `fk_covering_feature` ON `feature` (`msrun_msrun_id`,`mzMin`,`mzMax`,`rtMin`,`rtMax`,`feature_table_id`);CREATE INDEX `fk_covering_spectrum` ON  `spectrum` (`msrun_msrun_id`,`scan_start_time`,`spectrum_id`);CREATE INDEX `fk_covering_MSMS_precursor` ON  `MSMS_precursor` (`spectrum_spectrum_id`,`ion_mz`,`precursor_id`);

When going for speed, you should also hint the query planner to understand msrun_msrun_id is a constant to check for both feature and spectrum tables. Add the constant test in your query by putting this additional test at the end of the query (and pass spectrumFeature_InputValues twice):

"AND spectrum.msrun_msrun_id = ?"