Why is the same SQLite query being 30 times slower when fetching only twice as many results?
The execution time geometrically proportional to the number of rows in each table rather than arithmetically e.g.
3 tables with 10 rows each => 1,000 comparision3 tables with 10, 10 and 40 rows => 4,000 comparisons3 tables with 20 rows each => 8,000 comparisons
You could probably re-factor the query to avoid some of the joins/cursors - when do you need an answer?
Could you do something like this:
SELECT precursor_id, feature_table_id FROM MSMS_precursorINNER JOIN ( SELECT mzMin, mzMax, rtMin, rtMax, spectrum_id, feature_table_id, msrun_msrun_id FROM spectrum INNER JOIN (select feature_table_id, mzMin, mzMax, rtMin, rtMax, msrun_msrun_id from feature where feature.msrun_msrun_id = 'value' ) subquery ON subquery.msrun_msrun_id = spectrum.msrun_msrun_id WHERE spectrum.scan_start_time BETWEEN subquery.rtMin AND subquery.rtMax ) subquery ON subquery.spectrum_id = MSMS_precursor.spectrum_spectrum_id WHERE MSMS_precursor.ion_mz BETWEEN subquery.mzMin AND subquery.mzMax
Using a subquery enables you to reduce the number of comparisons between the tables - you can quickly filter out the unwanted features, then the un-related spectra before searching for suitable precursors.
I don't use SQLLite - but the principle should still apply.
UPDATED : fixed bug in SQL
Notes:
You don't have to worry about the ANDs, you'll only get:
- features where feature.msrun_msrun_id = 'value'
- spectra for those features and where spectrum.scan_start_time BETWEENsubquery.rtMin AND subquery.rtMax
- precursors for those spectrs and where MSMS_precursor.ion_mz BETWEEN subquery.mzMin ANDsubquery.mzMax
UPDATE 18/May:
It's the indexing!!! you have indexes on the search fields, but not on the fields participating in the joins - foreign key indices really boost performance:
CREATE INDEX `fk_msrun_msrun_id_feature` ON `feature` (`msrun_msrun_id` ASC); CREATE INDEX `fk_spectrum_spectrum_id_feature` ON `feature` (`msrun_msrun_id` ASC); CREATE INDEX `fk_spectrum_spectrum_id_MSMS_precursor` ON `MSMS_precursor` (`spectrum_spectrum_id` ASC);
I suggest you try using an R*Tree index, They're designed for efficient range queries.
I haven't actually used R*Tree much, just read the documentation, but I think you might be using it incorrectly. You may want to try changing your query to use
WHERE convexhull_edges.rtMin <= spectrum.scan_start_time AND convexhull_edges.rtMax >= spectrum.scan_start_time ANDconvexhull_edges.mzMin <= MSMS_precursor.ion_mz AND convexhull_edges.mzMax >= MSMS_precursor.ion_mz
which should be equivalent to your current query, but I think should be faster (You should be picking a range out of the R*Tree, rather than comparing the point to the range)
Consider using covering indices on the tables involved in your query.
You are indeed fetching a limited amount of columns in your select
statement and the corresponding inner join
and where
clauses. By using a covering index with the columns well ordered in it, you should get a very fast query, that is you will remove the scan table
in favor of an search table
using a covering index
.
Try to use those indices on your tables:
CREATE INDEX `fk_covering_feature` ON `feature` (`msrun_msrun_id`,`mzMin`,`mzMax`,`rtMin`,`rtMax`,`feature_table_id`);CREATE INDEX `fk_covering_spectrum` ON `spectrum` (`msrun_msrun_id`,`scan_start_time`,`spectrum_id`);CREATE INDEX `fk_covering_MSMS_precursor` ON `MSMS_precursor` (`spectrum_spectrum_id`,`ion_mz`,`precursor_id`);
When going for speed, you should also hint the query planner to understand msrun_msrun_id is a constant to check for both feature
and spectrum
tables. Add the constant test in your query by putting this additional test at the end of the query (and pass spectrumFeature_InputValues
twice):
"AND spectrum.msrun_msrun_id = ?"