Sqlite column with repetitive values Sqlite column with repetitive values sqlite sqlite

Sqlite column with repetitive values


The question is very similar to Add data to many-to-many relation with one SQL command, but it also discusses further aspect - automatic clean-up of unused entities.

Is there a way to let SQLite do everything automatically? ... (without having to maintain anything ourselves)

No. You basically want to insert rows in the base table and referenced table if it doesn't exist yet, while specifying the reference by value rather than its surrogate key. That in fact isn't a straightforward task in other RDBMSes as well. Some of them support:

From the above list SQLite only supports INSTEAD OF triggers. Here's how it applies to your use case (I have adopted table words from your db<>fiddle mentioned in comments under the question and renamed its column a to value):

PRAGMA foreign_keys = ON;CREATE TABLE words(  id INTEGER PRIMARY KEY,  value TEXT);CREATE UNIQUE INDEX unique_words_value ON words(value);CREATE TABLE data(  id INTEGER PRIMARY KEY,  word_id INTEGER NOT NULL,  FOREIGN KEY (word_id) REFERENCES words(id));CREATE VIEW data_view ASSELECT d.id, w.value FROM data AS d INNER JOIN words AS w on w.id = d.word_id;CREATE TRIGGER data_view_insert INSTEAD OF INSERT ON data_viewBEGIN  INSERT OR IGNORE INTO words(value) VALUES (NEW.value);  INSERT OR IGNORE INTO data(word_id) VALUES(    (SELECT id FROM words WHERE value = NEW.value)  );END;INSERT INTO data_view (value) VALUES  ('random1'),  ('random2'),  ('random3'),  ('random1'),  ('random3'),  ('random4');

The INSERT statement produced this content of table words:

idvalue
1random1
2random2
3random3
4random4

and data_view:

idvalue
1random1
2random2
3random3
4random1
5random3
6random4

To support deleting row from data with automatic clean-up of unused values in words you can add INSTEAD OF DELETE trigger on data_view:

CREATE TRIGGER data_view_delete INSTEAD OF DELETE ON data_viewBEGIN  DELETE FROM data    WHERE id = OLD.id;  DELETE FROM words    WHERE value = OLD.value AND NOT EXISTS(SELECT 1 FROM data_view WHERE value = OLD.value);END;

Its first statement deletes a row in table data and the second deletes the referenced value from words, if it isn't referenced by other rows in table data. Similarly to INSERT, you delete rows from table data indirectly via data_view:

-- delete one 'random2' and 'random4' value dataDELETE FROM data_view WHERE id IN (2, 4);

That results in words:

idvalue
1random1
3random3
4random4

As you can see the 'random2' value was deleted, because it was referenced only once in data, and 'random4' value was kept, because there was another reference in table data (data_view):

idvalue
1random1
3random3
5random3
6random4

Here's db<>fiddle to play with.


Full credit to @PeterWolf for his excellent answer, here is a slightly modified version and ready-to-run code:

import sqlite3, randomVALUES = ["hello world", "it's a shame to store this str many times", "bye bye", "abc"]db = sqlite3.connect('repetitive3.db')db.executescript("""CREATE TABLE words(id INTEGER PRIMARY KEY, value TEXT UNIQUE);CREATE TABLE data(id INTEGER PRIMARY KEY, word_id INTEGER NOT NULL);CREATE VIEW data_view AS SELECT d.id, w.value FROM data AS d INNER JOIN words AS w on w.id = d.word_id;CREATE TRIGGER data_view_insert INSTEAD OF INSERT ON data_viewBEGIN  INSERT OR IGNORE INTO words(value) VALUES(NEW.value);  INSERT OR IGNORE INTO data(word_id) VALUES((SELECT id FROM words WHERE value = NEW.value));END;""")for i in range(1000*1000):    db.execute("INSERT INTO data_view (value) VALUES (?)", (random.choice(VALUES),))print(list(db.execute("SELECT * FROM words")))print(list(db.execute("SELECT * FROM data WHERE id BETWEEN 100 AND 105")))print(list(db.execute("SELECT * FROM data_view WHERE id BETWEEN 100 AND 105")))

Minor modifications (the result is the same: 9 MB only database size)

  • there is here no index on table words, but only the UNIQUE keyword for value, that ensures that INSERT OR IGNORE INTO falls into IGNORE if the value already exists

  • no FOREIGN KEY used


Same idea, but without a view:

db.execute('CREATE TABLE words(value TEXT UNIQUE);')db.execute('CREATE TABLE data(id INTEGER PRIMARY KEY, word_id INTEGER NOT NULL);')for i in range(1000*1000):    v = random.choice(VALUES)    db.execute("INSERT OR IGNORE INTO words(value) VALUES(?);", (v,))    db.execute("INSERT INTO data(word_id) VALUES ((SELECT rowid FROM words WHERE value = ?));", (v,))searched_word = 'hello world'print(list(db.execute("SELECT id, word_id FROM data WHERE id BETWEEN 100 AND 120 AND word_id = (SELECT rowid FROM words WHERE value = ?)", (searched_word,))))