Postgresql - Clean way to insert records if they don't exist, update if they do Postgresql - Clean way to insert records if they don't exist, update if they do database database

Postgresql - Clean way to insert records if they don't exist, update if they do


Ok, I managed to create a testcase. The result is that the update part is always executed, even on a fresh insert. COPY seems to bypass the rule system.[For clarity I have put this into a separate reply]

DROP TABLE pages CASCADE;CREATE TABLE pages    ( url VARCHAR NOT NULL  PRIMARY KEY    , html VARCHAR    , last TIMESTAMP    );INSERT INTO pages(url,html,last) VALUES ('www.example.com://page1' , 'meuk1' , '2001-09-18 23:30:00'::timestamp );CREATE RULE Pages_Upsert AS ON INSERT TO pages  WHERE EXISTS (SELECT 1 from pages P where NEW.url = P.url)     DO INSTEAD (     UPDATE pages SET html=new.html , last = NOW() WHERE url = NEW.url    );INSERT INTO pages(url,html,last) VALUES ('www.example.com://page2' , 'meuk2' , '2002-09-18 23:30:00':: timestamp );INSERT INTO pages(url,html,last) VALUES ('www.example.com://page3' , 'meuk3' , '2003-09-18 23:30:00':: timestamp );INSERT INTO pages(url,html,last) SELECT pp.url || '/added'::text, pp.html || '.html'::text , pp.last + interval '20 years' FROM pages pp;COPY pages(url,html,last) FROM STDIN;www.example.com://pageX     stdin   2000-09-18 23:30:00\.SELECT * FROM pages;

The result:

              url              |    html    |            last            -------------------------------+------------+---------------------------- www.example.com://page1       | meuk1      | 2001-09-18 23:30:00 www.example.com://page2       | meuk2      | 2011-09-18 23:48:30.775373 www.example.com://page3       | meuk3      | 2011-09-18 23:48:30.783758 www.example.com://page1/added | meuk1.html | 2011-09-18 23:48:30.792097 www.example.com://page2/added | meuk2.html | 2011-09-18 23:48:30.792097 www.example.com://page3/added | meuk3.html | 2011-09-18 23:48:30.792097 www.example.com://pageX       | stdin      | 2000-09-18 23:30:00 (7 rows)

UPDATE: Just to prove it can be done:

INSERT INTO pages(url,html,last) VALUES ('www.example.com://page1' , 'meuk1' , '2001-09-18 23:30:00'::timestamp );CREATE VIEW vpages AS (SELECT * from pages);CREATE RULE Pages_Upsert AS ON INSERT TO vpages  DO INSTEAD (     UPDATE pages p0     SET html=NEW.html , last = NOW() WHERE p0.url = NEW.url    ;     INSERT INTO pages (url,html,last)    SELECT NEW.url, NEW.html, NEW.last        WHERE NOT EXISTS ( SELECT * FROM pages p1 WHERE p1.url = NEW.url)    );CREATE RULE Pages_Indate AS ON UPDATE TO vpages  DO INSTEAD (     INSERT INTO pages (url,html,last)    SELECT NEW.url, NEW.html, NEW.last        WHERE NOT EXISTS ( SELECT * FROM pages p1 WHERE p1.url = OLD.url)        ;     UPDATE pages p0     SET html=NEW.html , last = NEW.last WHERE p0.url = NEW.url        ;    );INSERT INTO vpages(url,html,last) VALUES ('www.example.com://page2' , 'meuk2' , '2002-09-18 23:30:00':: timestamp );INSERT INTO vpages(url,html,last) VALUES ('www.example.com://page3' , 'meuk3' , '2003-09-18 23:30:00':: timestamp );INSERT INTO vpages(url,html,last) SELECT pp.url || '/added'::text, pp.html || '.html'::text , pp.last + interval '20 years' FROM vpages pp;UPDATE vpages SET last = last + interval '-10 years' WHERE url = 'www.example.com://page1' ;-- Copy does NOT work on views-- COPY vpages(url,html,last) FROM STDIN;-- www.example.com://pageX    stdin    2000-09-18 23:30:00-- \.SELECT * FROM vpages;

Result:

INSERT 0 1INSERT 0 1INSERT 0 3UPDATE 1              url              |    html    |        last         -------------------------------+------------+--------------------- www.example.com://page2       | meuk2      | 2002-09-18 23:30:00 www.example.com://page3       | meuk3      | 2003-09-18 23:30:00 www.example.com://page1/added | meuk1.html | 2021-09-18 23:30:00 www.example.com://page2/added | meuk2.html | 2022-09-18 23:30:00 www.example.com://page3/added | meuk3.html | 2023-09-18 23:30:00 www.example.com://page1       | meuk1      | 1991-09-18 23:30:00(6 rows)

The view is necessary to prevent the rewrite system to go into recursion.Construction of a DELETE rule is left as an exercise to the reader.


Some good points from someone who should know it or be very near to someone like that ;-)

What are PostgreSQL RULEs good for?

Short story:

  • Do the rules work well with SERIAL and BIGSERIAL ?
  • Do the rules work well with the RETURNING clauses of INSERT and UPDATE ?
  • Do the rules work well with stuff like random()?

All these things boils down to the fact, that the rule system is not row driven but transforms your statements in a way you never imagine.

Do yourself and your team mates a favour and stop using roles for things like that.

Edit: Your problem is well discussed in the PostgreSQL community. Search keywords are: MERGE, UPSERT.


I don't know if this gets too subjective but what I think about your solution is: It's all about semantics. When I do an insert, I expect an insert and not some fancy logic that maybe does an insert but maybe not. Indeed that's what functions are for.

At first I'd try checking for the URL in your program and then choosing whether to insert or update. If that turned out to be too slow, I'd use a function. If you name it like insert_or_update_url, you automatically get some documentation for free. The rewrite rule requires you to have some implicit knowledge and I generally try to avoid that.

On the plus side: If someone copies the data but forgets rules and functions, your solution might break silently (but that may depend on other constraints), but a missing function goes down screaming. Don't get me wrong, I think your solution is very creative and smart. Just a bit too obscure for my taste.