Sqlite column with repetitive values
The question is very similar to Add data to many-to-many relation with one SQL command, but it also discusses further aspect - automatic clean-up of unused entities.
Is there a way to let SQLite do everything automatically? ... (without having to maintain anything ourselves)
No. You basically want to insert rows in the base table and referenced table if it doesn't exist yet, while specifying the reference by value rather than its surrogate key. That in fact isn't a straightforward task in other RDBMSes as well. Some of them support:
- Stored procedures
- Multitable inserts
- OUTPUT clause
- RETURNING clause
- Writable (updatable) views
INSTEAD OF
triggers on views
From the above list SQLite only supports INSTEAD OF
triggers. Here's how it applies to your use case (I have adopted table words
from your db<>fiddle mentioned in comments under the question and renamed its column a
to value
):
PRAGMA foreign_keys = ON;CREATE TABLE words( id INTEGER PRIMARY KEY, value TEXT);CREATE UNIQUE INDEX unique_words_value ON words(value);CREATE TABLE data( id INTEGER PRIMARY KEY, word_id INTEGER NOT NULL, FOREIGN KEY (word_id) REFERENCES words(id));CREATE VIEW data_view ASSELECT d.id, w.value FROM data AS d INNER JOIN words AS w on w.id = d.word_id;CREATE TRIGGER data_view_insert INSTEAD OF INSERT ON data_viewBEGIN INSERT OR IGNORE INTO words(value) VALUES (NEW.value); INSERT OR IGNORE INTO data(word_id) VALUES( (SELECT id FROM words WHERE value = NEW.value) );END;INSERT INTO data_view (value) VALUES ('random1'), ('random2'), ('random3'), ('random1'), ('random3'), ('random4');
The INSERT
statement produced this content of table words
:
id | value |
---|---|
1 | random1 |
2 | random2 |
3 | random3 |
4 | random4 |
and data_view
:
id | value |
---|---|
1 | random1 |
2 | random2 |
3 | random3 |
4 | random1 |
5 | random3 |
6 | random4 |
To support deleting row from data
with automatic clean-up of unused values in words
you can add INSTEAD OF DELETE
trigger on data_view
:
CREATE TRIGGER data_view_delete INSTEAD OF DELETE ON data_viewBEGIN DELETE FROM data WHERE id = OLD.id; DELETE FROM words WHERE value = OLD.value AND NOT EXISTS(SELECT 1 FROM data_view WHERE value = OLD.value);END;
Its first statement deletes a row in table data
and the second deletes the referenced value from words
, if it isn't referenced by other rows in table data
. Similarly to INSERT
, you delete rows from table data
indirectly via data_view
:
-- delete one 'random2' and 'random4' value dataDELETE FROM data_view WHERE id IN (2, 4);
That results in words
:
id | value |
---|---|
1 | random1 |
3 | random3 |
4 | random4 |
As you can see the 'random2'
value was deleted, because it was referenced only once in data
, and 'random4'
value was kept, because there was another reference in table data
(data_view
):
id | value |
---|---|
1 | random1 |
3 | random3 |
5 | random3 |
6 | random4 |
Here's db<>fiddle to play with.
Full credit to @PeterWolf for his excellent answer, here is a slightly modified version and ready-to-run code:
import sqlite3, randomVALUES = ["hello world", "it's a shame to store this str many times", "bye bye", "abc"]db = sqlite3.connect('repetitive3.db')db.executescript("""CREATE TABLE words(id INTEGER PRIMARY KEY, value TEXT UNIQUE);CREATE TABLE data(id INTEGER PRIMARY KEY, word_id INTEGER NOT NULL);CREATE VIEW data_view AS SELECT d.id, w.value FROM data AS d INNER JOIN words AS w on w.id = d.word_id;CREATE TRIGGER data_view_insert INSTEAD OF INSERT ON data_viewBEGIN INSERT OR IGNORE INTO words(value) VALUES(NEW.value); INSERT OR IGNORE INTO data(word_id) VALUES((SELECT id FROM words WHERE value = NEW.value));END;""")for i in range(1000*1000): db.execute("INSERT INTO data_view (value) VALUES (?)", (random.choice(VALUES),))print(list(db.execute("SELECT * FROM words")))print(list(db.execute("SELECT * FROM data WHERE id BETWEEN 100 AND 105")))print(list(db.execute("SELECT * FROM data_view WHERE id BETWEEN 100 AND 105")))
Minor modifications (the result is the same: 9 MB only database size)
there is here no index on table
words
, but only theUNIQUE
keyword forvalue
, that ensures thatINSERT OR IGNORE INTO
falls intoIGNORE
if the value already existsno FOREIGN KEY used
Same idea, but without a view:
db.execute('CREATE TABLE words(value TEXT UNIQUE);')db.execute('CREATE TABLE data(id INTEGER PRIMARY KEY, word_id INTEGER NOT NULL);')for i in range(1000*1000): v = random.choice(VALUES) db.execute("INSERT OR IGNORE INTO words(value) VALUES(?);", (v,)) db.execute("INSERT INTO data(word_id) VALUES ((SELECT rowid FROM words WHERE value = ?));", (v,))searched_word = 'hello world'print(list(db.execute("SELECT id, word_id FROM data WHERE id BETWEEN 100 AND 120 AND word_id = (SELECT rowid FROM words WHERE value = ?)", (searched_word,))))