
SQLite insert speed slows as number of records increases due to an index


If your requirement is to find a particular z_id and the x_ids and y_ids linked to it (as distinct from quickly selecting a range of z_ids), you could look into a non-indexed hash-table nested-relational db. It would let you seek directly to a particular z_id to get its y_ids and x_ids -- without the indexing overhead and the concomitant degraded insert performance as the index grows. To avoid clumping (aka bucket collisions), choose a key-hashing algorithm that puts the greatest weight on the digits of z_id with the greatest variation (right-weighted).
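To illustrate what "right-weighted" could mean in practice, here is a sketch of my own (not taken from any particular product): fold the digits rightmost-first, so the fastest-varying digits end up with the largest weight in the bucket choice:

    def right_weighted_hash(z_id: int, n_buckets: int) -> int:
        h = 0
        for digit in str(z_id)[::-1]:   # rightmost (fastest-varying) digit first
            h = h * 31 + int(digit)     # digits folded in earliest get the largest weight
        return h % n_buckets

Because consecutive z_ids differ in exactly the digits that carry the most weight, they land in widely separated buckets instead of clumping.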

P.S. A database that uses a b-tree may at first appear faster than one that uses linear hashing, say, but insert performance with the linear hash stays level while performance on the b-tree begins to degrade.

P.P.S. To answer @kawing-chiu's question: the core feature relevant here is that such a database relies on so-called "sparse" tables, in which the physical location of a record is determined by a hashing algorithm that takes the record key as input. This permits a seek directly to the record's location in the table without the intermediary of an index. Because there are no indexes to traverse or rebalance, insert times remain constant as the table becomes more densely populated; with a b-tree, by contrast, insert times degrade as the index tree grows. OLTP applications with large numbers of concurrent inserts can benefit from such a sparse-table approach.

The downside is that the records end up scattered across the "tundra" of the sparse table, so gathering large sets of records that have a value in common, such as a postal code, can be slower. The hashed sparse-table approach is optimized to insert and retrieve individual records, and to retrieve networks of related records, not large sets of records that share some field value.
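A deliberately simplified sketch of the idea, using fixed-size records in a plain file (all names and sizes here are invented for illustration): the key alone determines the byte offset, so there is no index to traverse or rebalance, and an insert costs the same whether the table holds ten records or ten million:

    RECORD_SIZE = 64          # fixed slot size, chosen for illustration
    N_SLOTS = 1_000_000       # pre-allocated "sparse" capacity

    def _slot(z_id: int) -> int:
        # Any well-spread key hash works here, e.g. the right-weighted
        # hash sketched above; plain modulo keeps the example short.
        return z_id % N_SLOTS

    def insert_record(f, z_id: int, payload: bytes) -> None:
        # One seek, one write: the hash of the key *is* the location.
        # A real engine would also handle bucket collisions, e.g. with
        # overflow chains.
        f.seek(_slot(z_id) * RECORD_SIZE)
        f.write(payload.ljust(RECORD_SIZE, b"\0"))

    def fetch_record(f, z_id: int) -> bytes:
        f.seek(_slot(z_id) * RECORD_SIZE)
        return f.read(RECORD_SIZE).rstrip(b"\0")

(Use a file opened in "r+b" mode, pre-sized to N_SLOTS * RECORD_SIZE bytes.)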

A nested relational database is one that permits tuples within a column of a row -- for example, a single z_id record could hold the full list of its x_ids and y_ids in one column.


Great question and very interesting follow-up!

I would just like to make a quick remark: you mentioned that breaking the table into smaller subtables/files and attaching them later is not an option because you'll quickly reach the hard limit of 62 attached databases. While this is completely true, I don't think you have considered a mid-way option: sharding the data into several tables while keeping the same, single database (file).
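For example (a minimal sketch using Python's sqlite3 module; the table count and file name are arbitrary), a single connection to a single file can hold hundreds of shard tables, while ATTACH tops out at 62 databases:

    import sqlite3

    conn = sqlite3.connect("single_file.db")   # one file, no ATTACH involved
    for i in range(100):                       # well past the 62-ATTACH ceiling
        conn.execute(f'CREATE TABLE IF NOT EXISTS "test_{i}"('
                     f'"i" integer NOT NULL, "md5" text(32) NOT NULL)')
    conn.commit()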


I did a very crude benchmark just to make sure my suggestion really has an impact on performance.

Schema:

CREATE TABLE IF NOT EXISTS "test_$i"(
    "i" integer NOT NULL,
    "md5" text(32) NOT NULL
);

Data - 2 Million Rows:

  • i = 1..2,000,000
  • md5 = md5 hex digest of i

Each transaction = 50,000 INSERTs.
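For reference, here is the shape of such a benchmark loop as a minimal Python sketch (the original driver isn't shown, so the file and table names are assumptions; this is the single-table, no-index case):

    import hashlib
    import sqlite3
    import time

    conn = sqlite3.connect("bench.db")
    conn.execute('CREATE TABLE IF NOT EXISTS "test_0"('
                 '"i" integer NOT NULL, "md5" text(32) NOT NULL)')

    BATCH = 50_000
    for start in range(0, 2_000_000, BATCH):
        t0 = time.time()
        rows = [(i, hashlib.md5(str(i).encode()).hexdigest())
                for i in range(start, start + BATCH)]
        with conn:  # one transaction per 50,000-row batch
            conn.executemany('INSERT INTO "test_0" VALUES (?, ?)', rows)
        print(f"{start}..{start + BATCH} records inserted "
              f"in {time.time() - t0:.2f} seconds")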


Databases: 1; Tables: 1; Indexes: 0

0..50000 records inserted in 1.87 seconds
50000..100000 records inserted in 1.92 seconds
100000..150000 records inserted in 1.97 seconds
150000..200000 records inserted in 1.99 seconds
200000..250000 records inserted in 2.19 seconds
250000..300000 records inserted in 1.94 seconds
300000..350000 records inserted in 1.94 seconds
350000..400000 records inserted in 1.94 seconds
400000..450000 records inserted in 1.94 seconds
450000..500000 records inserted in 2.50 seconds
500000..550000 records inserted in 1.94 seconds
550000..600000 records inserted in 1.94 seconds
600000..650000 records inserted in 1.93 seconds
650000..700000 records inserted in 1.94 seconds
700000..750000 records inserted in 1.94 seconds
750000..800000 records inserted in 1.94 seconds
800000..850000 records inserted in 1.93 seconds
850000..900000 records inserted in 1.95 seconds
900000..950000 records inserted in 1.94 seconds
950000..1000000 records inserted in 1.94 seconds
1000000..1050000 records inserted in 1.95 seconds
1050000..1100000 records inserted in 1.95 seconds
1100000..1150000 records inserted in 1.95 seconds
1150000..1200000 records inserted in 1.95 seconds
1200000..1250000 records inserted in 1.96 seconds
1250000..1300000 records inserted in 1.98 seconds
1300000..1350000 records inserted in 1.95 seconds
1350000..1400000 records inserted in 1.95 seconds
1400000..1450000 records inserted in 1.95 seconds
1450000..1500000 records inserted in 1.95 seconds
1500000..1550000 records inserted in 1.95 seconds
1550000..1600000 records inserted in 1.95 seconds
1600000..1650000 records inserted in 1.95 seconds
1650000..1700000 records inserted in 1.96 seconds
1700000..1750000 records inserted in 1.95 seconds
1750000..1800000 records inserted in 1.95 seconds
1800000..1850000 records inserted in 1.94 seconds
1850000..1900000 records inserted in 1.95 seconds
1900000..1950000 records inserted in 1.95 seconds
1950000..2000000 records inserted in 1.95 seconds

Database file size: 89.2 MiB.


Databases: 1; Tables: 1; Indexes: 1 (md5)

0..50000 records inserted in 2.90 seconds
50000..100000 records inserted in 11.64 seconds
100000..150000 records inserted in 10.85 seconds
150000..200000 records inserted in 10.62 seconds
200000..250000 records inserted in 11.28 seconds
250000..300000 records inserted in 12.09 seconds
300000..350000 records inserted in 10.60 seconds
350000..400000 records inserted in 12.25 seconds
400000..450000 records inserted in 13.83 seconds
450000..500000 records inserted in 14.48 seconds
500000..550000 records inserted in 11.08 seconds
550000..600000 records inserted in 10.72 seconds
600000..650000 records inserted in 14.99 seconds
650000..700000 records inserted in 10.85 seconds
700000..750000 records inserted in 11.25 seconds
750000..800000 records inserted in 17.68 seconds
800000..850000 records inserted in 14.44 seconds
850000..900000 records inserted in 19.46 seconds
900000..950000 records inserted in 16.41 seconds
950000..1000000 records inserted in 22.41 seconds
1000000..1050000 records inserted in 24.68 seconds
1050000..1100000 records inserted in 28.12 seconds
1100000..1150000 records inserted in 26.85 seconds
1150000..1200000 records inserted in 28.57 seconds
1200000..1250000 records inserted in 29.17 seconds
1250000..1300000 records inserted in 36.99 seconds
1300000..1350000 records inserted in 30.66 seconds
1350000..1400000 records inserted in 32.06 seconds
1400000..1450000 records inserted in 33.14 seconds
1450000..1500000 records inserted in 47.74 seconds
1500000..1550000 records inserted in 34.51 seconds
1550000..1600000 records inserted in 39.16 seconds
1600000..1650000 records inserted in 37.69 seconds
1650000..1700000 records inserted in 37.82 seconds
1700000..1750000 records inserted in 41.43 seconds
1750000..1800000 records inserted in 49.58 seconds
1800000..1850000 records inserted in 44.08 seconds
1850000..1900000 records inserted in 57.17 seconds
1900000..1950000 records inserted in 50.04 seconds
1950000..2000000 records inserted in 42.15 seconds

Database file size: 181.1 MiB.


Databases: 1; Tables: 20 (one per 100,000 records); Indexes: 1 (md5)

0..50000 records inserted in 2.91 seconds
50000..100000 records inserted in 10.30 seconds
100000..150000 records inserted in 10.85 seconds
150000..200000 records inserted in 10.45 seconds
200000..250000 records inserted in 10.11 seconds
250000..300000 records inserted in 11.04 seconds
300000..350000 records inserted in 10.25 seconds
350000..400000 records inserted in 10.36 seconds
400000..450000 records inserted in 11.48 seconds
450000..500000 records inserted in 10.97 seconds
500000..550000 records inserted in 10.86 seconds
550000..600000 records inserted in 10.35 seconds
600000..650000 records inserted in 10.77 seconds
650000..700000 records inserted in 10.62 seconds
700000..750000 records inserted in 10.57 seconds
750000..800000 records inserted in 11.13 seconds
800000..850000 records inserted in 10.44 seconds
850000..900000 records inserted in 10.40 seconds
900000..950000 records inserted in 10.70 seconds
950000..1000000 records inserted in 10.53 seconds
1000000..1050000 records inserted in 10.98 seconds
1050000..1100000 records inserted in 11.56 seconds
1100000..1150000 records inserted in 10.66 seconds
1150000..1200000 records inserted in 10.38 seconds
1200000..1250000 records inserted in 10.24 seconds
1250000..1300000 records inserted in 10.80 seconds
1300000..1350000 records inserted in 10.85 seconds
1350000..1400000 records inserted in 10.46 seconds
1400000..1450000 records inserted in 10.25 seconds
1450000..1500000 records inserted in 10.98 seconds
1500000..1550000 records inserted in 10.15 seconds
1550000..1600000 records inserted in 11.81 seconds
1600000..1650000 records inserted in 10.80 seconds
1650000..1700000 records inserted in 11.06 seconds
1700000..1750000 records inserted in 10.24 seconds
1750000..1800000 records inserted in 10.57 seconds
1800000..1850000 records inserted in 11.54 seconds
1850000..1900000 records inserted in 10.80 seconds
1900000..1950000 records inserted in 11.07 seconds
1950000..2000000 records inserted in 13.27 seconds

Database file size: 180.1 MiB.


As you can see, the insert speed remains pretty much constant if you shard the data into several tables.
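To make the routing concrete, here is a minimal sketch (the helper names are mine; the shard size matches the 20-table run above). Each row is directed to its shard by integer division, so every individual table, and therefore every md5 index, stays small:

    ROWS_PER_TABLE = 100_000   # shard size used in the 20-table run above

    def shard_table(i: int) -> str:
        # The row number alone picks the shard, so each insert only ever
        # touches one small table and its small md5 index.
        return f'test_{i // ROWS_PER_TABLE}'

    def insert_row(conn, i: int, md5: str) -> None:
        conn.execute(f'INSERT INTO "{shard_table(i)}" VALUES (?, ?)', (i, md5))

    def find_by_md5(conn, md5: str, n_tables: int = 20):
        # The flip side of sharding: a lookup by md5 has to probe each
        # shard's index in turn unless you know which shard to hit.
        for t in range(n_tables):
            row = conn.execute(
                f'SELECT "i", "md5" FROM "test_{t}" WHERE "md5" = ?',
                (md5,)).fetchone()
            if row:
                return row
        return None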


Unfortunately, I'd say this is a limitation of large tables in SQLite. It's not designed to operate on large-scale or high-volume datasets. While I understand it may drastically increase project complexity, you're probably better off researching more sophisticated database solutions appropriate to your needs.

From everything you linked, it looks like there is a direct tradeoff between table size and access speed. You can't have both.