What is the right order of insertion/deletion/modification on dataset?


Doesn't your SQL product support deferred constraint checking?
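As a rough illustration of what deferred checking buys you, here is a minimal sketch using SQLite through Python's sqlite3 module as a stand-in for whatever DBMS is actually in use. The table and column names (parent, child, parent_id) are assumptions made for the example. The foreign key is declared DEFERRABLE INITIALLY DEFERRED, so it is only checked at COMMIT and the statement order inside the transaction stops mattering.

```python
import sqlite3

def deferred_demo():
    # isolation_level=None puts the connection in autocommit mode,
    # so the BEGIN/COMMIT below are the only transaction boundaries.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs by default
    conn.execute("CREATE TABLE parent (id INTEGER PRIMARY KEY)")
    conn.execute(
        "CREATE TABLE child ("
        " id INTEGER PRIMARY KEY,"
        " parent_id INTEGER NOT NULL"
        " REFERENCES parent(id) DEFERRABLE INITIALLY DEFERRED)"
    )
    conn.execute("BEGIN")
    # The child arrives first; its parent does not exist yet.
    conn.execute("INSERT INTO child (id, parent_id) VALUES (1, 1)")
    conn.execute("INSERT INTO parent (id) VALUES (1)")
    conn.execute("COMMIT")  # the FK is checked here, and by now it holds
    rows = conn.execute("SELECT id, parent_id FROM child").fetchall()
    conn.close()
    return rows
```

Whether and how constraints can be deferred varies between products, so check your own DBMS's documentation before relying on this.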

If not, you could try

Delete all child records - delete all parent records - insert all parent records - insert all child records

where any UPDATEs have been split into their constituent DELETEs and INSERTs.

This should work correctly in all cases, but at acceptable speeds probably in none ...

It is also provable that this is the only scheme that can work correctly in all cases, since:

(a) key constraints on the parent dictate that parent DELETEs must precede parent INSERTs,
(b) key constraints on the child dictate that child DELETEs must precede child INSERTs,
(c) the FK dictates that child DELETEs must precede parent DELETEs, and
(d) the FK also dictates that child INSERTs must follow parent INSERTs.

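The given sequence is the only possible one that satisfies these four requirements. It also shows that, in the general case, an unsplit UPDATE to the child cannot be scheduled at all: an UPDATE means a "simultaneous" DELETE plus INSERT, but the DELETE half must come before the parent DELETEs while the INSERT half must come after the parent INSERTs.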
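The four-phase order can be sketched as follows. This is a minimal sketch, assuming SQLite via Python's sqlite3 module and a hypothetical parent/child schema with an ordinary (immediate) foreign key. The helper names make_db and apply_in_safe_order are made up for the example. The child UPDATE from the question (child 1 moves from parent 1 to parent 2) has been split into a DELETE and an INSERT.

```python
import sqlite3

def make_db():
    """Hypothetical schema: two parents, one child attached to parent 1."""
    conn = sqlite3.connect(":memory:", isolation_level=None)  # explicit transactions only
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs by default
    conn.execute("CREATE TABLE parent (id INTEGER PRIMARY KEY)")
    conn.execute(
        "CREATE TABLE child ("
        " id INTEGER PRIMARY KEY,"
        " parent_id INTEGER NOT NULL REFERENCES parent(id))"
    )
    conn.executemany("INSERT INTO parent (id) VALUES (?)", [(1,), (2,)])
    conn.execute("INSERT INTO child (id, parent_id) VALUES (1, 1)")
    return conn

def apply_in_safe_order(conn, child_deletes, parent_deletes, parent_inserts, child_inserts):
    """Apply a change set in the order: child DELETEs, parent DELETEs,
    parent INSERTs, child INSERTs. UPDATEs must already be split."""
    conn.execute("BEGIN")
    try:
        conn.executemany("DELETE FROM child WHERE id = ?", [(i,) for i in child_deletes])
        conn.executemany("DELETE FROM parent WHERE id = ?", [(i,) for i in parent_deletes])
        conn.executemany("INSERT INTO parent (id) VALUES (?)", [(i,) for i in parent_inserts])
        conn.executemany("INSERT INTO child (id, parent_id) VALUES (?, ?)", child_inserts)
        conn.execute("COMMIT")
    except sqlite3.Error:
        conn.execute("ROLLBACK")
        raise

def demo():
    conn = make_db()
    apply_in_safe_order(
        conn,
        child_deletes=[1],        # first half of the split UPDATE
        parent_deletes=[1],
        parent_inserts=[],
        child_inserts=[(1, 2)],   # second half of the split UPDATE
    )
    parents = conn.execute("SELECT id FROM parent ORDER BY id").fetchall()
    children = conn.execute("SELECT id, parent_id FROM child ORDER BY id").fetchall()
    return parents, children
```

Row-by-row deletes and re-inserts like this are exactly why the approach is unlikely to be fast on large change sets.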


You have to take their context into account. MS said

When updating related tables in a dataset, it is important to update in the proper sequence to reduce the chance of violating referential integrity constraints.

in the context of writing client data application software.

Why is it important to reduce the chance of violating referential integrity constraints? Because violating those constraints means

  • more round trips between the dbms and the client, either for the client code to handle the constraint violations, or for the human user to handle the violations,
  • more time taken,
  • more load on the server,
  • more opportunities for human error, and
  • more chances for concurrent updates to change the underlying data (possibly confusing either the application code, the human user, or both).

And why do they consider their procedure the right way? Because it provides a single process that will avoid referential integrity violations in almost all the common cases, and even in a lot of the uncommon ones. For example . . .

  • If the update is a DELETE operation on the referenced table, and if foreign keys in the referencing tables are declared as ON DELETE CASCADE, then the optimal thing is to simply delete the referenced row (the parent row), and let the dbms manage the cascade. (This is also the optimal thing for ON DELETE SET DEFAULT, and for ON DELETE SET NULL.)

  • If the update is a DELETE operation on the referenced table, and if foreign keys in the referencing tables are declared as ON DELETE RESTRICT, then the optimal thing is to delete all the referencing rows (child rows) first, then delete the referenced row.

But, with proper use of transactions, MS's procedure leaves the database in a consistent state regardless. The value is that it's a single, client-side process to code and to maintain, even though it's not optimal in all cases. (That's often the case in software design--choosing a single way that's not optimal in all cases. ActiveRecord leaps to mind.)
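The CASCADE and RESTRICT cases above can be compared side by side. This is a minimal sketch, assuming SQLite via Python's sqlite3 module and a hypothetical parent/child schema. The function names are made up for the example, and other products may report the RESTRICT violation differently.

```python
import sqlite3

def make_db(on_delete):
    """Build a hypothetical schema whose FK uses the given ON DELETE action."""
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs by default
    conn.execute("CREATE TABLE parent (id INTEGER PRIMARY KEY)")
    conn.execute(
        "CREATE TABLE child ("
        " id INTEGER PRIMARY KEY,"
        f" parent_id INTEGER NOT NULL REFERENCES parent(id) ON DELETE {on_delete})"
    )
    conn.execute("INSERT INTO parent (id) VALUES (1)")
    conn.execute("INSERT INTO child (id, parent_id) VALUES (10, 1)")
    return conn

def delete_parent_cascade():
    """With CASCADE, delete the parent and let the DBMS remove the children."""
    conn = make_db("CASCADE")
    conn.execute("DELETE FROM parent WHERE id = 1")
    return conn.execute("SELECT COUNT(*) FROM child").fetchone()[0]

def delete_parent_restrict():
    """With RESTRICT, the parent delete is refused until the children are gone."""
    conn = make_db("RESTRICT")
    try:
        conn.execute("DELETE FROM parent WHERE id = 1")
        blocked = False
    except sqlite3.IntegrityError:
        blocked = True
    conn.execute("DELETE FROM child WHERE parent_id = 1")  # children first
    conn.execute("DELETE FROM parent WHERE id = 1")        # then the parent
    remaining = conn.execute("SELECT COUNT(*) FROM parent").fetchone()[0]
    return blocked, remaining
```

The children-first order also works against the CASCADE schema, which is the point of a single client-side procedure: it costs an extra statement there but does not fail.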

You said

Example: ParentTable has two records, parent1 (Id: 1) and parent2 (Id: 2).

ChildTable has one record, child1 (Id: 1, ParentId: 1).

If we update child1 to have a new parent, parent2, and then we delete parent1:

  1. We have nothing to delete in the child table.
  2. We delete parent1: we break the constraint, because the child is still attached to parent1, unless we update it first.

That's not a referential integrity issue; it's a procedural issue. This problem clearly requires two transactions.

  1. Update the child to have a new parent, then commit. This data must be corrected regardless of what happens to the first parent. Specifically, this data must be corrected even if there are concurrent updates or other constraints that make it either temporarily or permanently impossible to delete the first parent. (This isn't a referential integrity issue, because there's no ON DELETE SET TO NEXT PARENT ID OR MAKE YOUR BEST GUESS clause in SQL foreign key constraints.)

  2. Delete the first parent, then commit. This might require first updating any number of child rows in any number of tables. In a huge organization, I can imagine some deletes like this taking weeks to finish.
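The two-transaction approach might look like the sketch below, assuming SQLite via Python's sqlite3 module and a hypothetical parent/child schema (make_db and reassign_then_delete are names made up for the example). The reassignment is committed on its own, so it survives even when the later delete is refused because some other child still references the old parent.

```python
import sqlite3

def make_db():
    """Hypothetical data: two parents, two children both attached to parent 1."""
    conn = sqlite3.connect(":memory:", isolation_level=None)  # explicit transactions only
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs by default
    conn.execute("CREATE TABLE parent (id INTEGER PRIMARY KEY)")
    conn.execute(
        "CREATE TABLE child ("
        " id INTEGER PRIMARY KEY,"
        " parent_id INTEGER NOT NULL REFERENCES parent(id))"
    )
    conn.executemany("INSERT INTO parent (id) VALUES (?)", [(1,), (2,)])
    conn.executemany("INSERT INTO child (id, parent_id) VALUES (?, ?)", [(1, 1), (2, 1)])
    return conn

def reassign_then_delete(conn, child_id, old_parent, new_parent):
    """Return True if the old parent could be deleted, False if it is still referenced."""
    # Transaction 1: correct the child, then commit regardless of what follows.
    conn.execute("BEGIN")
    conn.execute("UPDATE child SET parent_id = ? WHERE id = ?", (new_parent, child_id))
    conn.execute("COMMIT")

    # Transaction 2: try to delete the old parent; failure leaves transaction 1 intact.
    conn.execute("BEGIN")
    try:
        conn.execute("DELETE FROM parent WHERE id = ?", (old_parent,))
        conn.execute("COMMIT")
        return True
    except sqlite3.IntegrityError:
        conn.execute("ROLLBACK")
        return False
```

In this sample data the first call for child 1 returns False, because child 2 still points at parent 1. A second call for child 2 then lets the delete go through.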


Sounds to me like:

  1. Insert parent2. Child still points to parent1.
  2. Update child to point to parent2. Now nothing references parent1.
  3. Delete parent1.

You'd want to wrap it in a transaction where available.
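Wrapped in a single transaction, the three steps could be sketched like this. It assumes SQLite via Python's sqlite3 module and a hypothetical parent/child schema, and the helper names make_db and replace_parent are made up for the example. If any step fails, the whole change is rolled back, including the newly inserted parent.

```python
import sqlite3

def make_db(children):
    """Hypothetical data: parent 1 only, plus the given (id, parent_id) children."""
    conn = sqlite3.connect(":memory:", isolation_level=None)  # explicit transactions only
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs by default
    conn.execute("CREATE TABLE parent (id INTEGER PRIMARY KEY)")
    conn.execute(
        "CREATE TABLE child ("
        " id INTEGER PRIMARY KEY,"
        " parent_id INTEGER NOT NULL REFERENCES parent(id))"
    )
    conn.execute("INSERT INTO parent (id) VALUES (1)")
    conn.executemany("INSERT INTO child (id, parent_id) VALUES (?, ?)", children)
    return conn

def replace_parent(conn, child_id, old_parent, new_parent):
    conn.execute("BEGIN")
    try:
        # 1. Insert the new parent; the child still points to the old one.
        conn.execute("INSERT INTO parent (id) VALUES (?)", (new_parent,))
        # 2. Repoint the child.
        conn.execute("UPDATE child SET parent_id = ? WHERE id = ?", (new_parent, child_id))
        # 3. Delete the old parent; this fails if any other child still references it.
        conn.execute("DELETE FROM parent WHERE id = ?", (old_parent,))
        conn.execute("COMMIT")
    except sqlite3.Error:
        conn.execute("ROLLBACK")
        raise
```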

Depending on your schema, you could also extend this to:

  1. Update parent1 to indicate that it is locked (or lock it in the DB), thus preventing updates.
  2. Insert parent2
  3. Update child to point to parent2
  4. Delete parent1

This order has the advantage that a join between the parent and child returns a consistent result throughout. When the child is updated, the results of the join "flip" to the new state.

EDIT:

Another option is to move the parent/child references into another table, e.g. "links":

    CREATE TABLE links (
        link_id   INT NOT NULL IDENTITY(1,1) PRIMARY KEY,
        parent_id INT NOT NULL,
        child_id  INT NOT NULL
    );

You may well want foreign key constraints on the parent_id and child_id columns, as well as some appropriate indices. This arrangement allows for very flexible relationships between the parent and child tables - possibly too flexible, but that depends on your application. Now you can do something like:

    UPDATE links
       SET parent_id = @new_parent_id
     WHERE parent_id = @old_parent_id
       AND child_id = @child_id;
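For a runnable version of the links idea, here is a minimal sketch that assumes SQLite via Python's sqlite3 module rather than the SQL Server dialect used above. INTEGER PRIMARY KEY stands in for IDENTITY(1,1), named placeholders stand in for the @ variables, and the parent and child tables, the index, and the sample IDs are all assumptions made for the example.

```python
import sqlite3

def links_demo():
    conn = sqlite3.connect(":memory:", isolation_level=None)  # explicit transactions only
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs by default
    conn.execute("CREATE TABLE parent (id INTEGER PRIMARY KEY)")
    conn.execute("CREATE TABLE child (id INTEGER PRIMARY KEY)")
    conn.execute(
        "CREATE TABLE links ("
        " link_id INTEGER PRIMARY KEY,"  # auto-assigned, like IDENTITY(1,1)
        " parent_id INTEGER NOT NULL REFERENCES parent(id),"
        " child_id INTEGER NOT NULL REFERENCES child(id))"
    )
    conn.execute("CREATE INDEX links_parent_idx ON links (parent_id)")
    conn.executemany("INSERT INTO parent (id) VALUES (?)", [(1,), (2,)])
    conn.execute("INSERT INTO child (id) VALUES (1)")
    conn.execute("INSERT INTO links (parent_id, child_id) VALUES (1, 1)")

    conn.execute("BEGIN")
    # Repoint the link; the child row itself is never touched.
    conn.execute(
        "UPDATE links SET parent_id = :new_parent_id"
        " WHERE parent_id = :old_parent_id AND child_id = :child_id",
        {"new_parent_id": 2, "old_parent_id": 1, "child_id": 1},
    )
    # Nothing references parent 1 any more, so it can be deleted.
    conn.execute("DELETE FROM parent WHERE id = 1")
    conn.execute("COMMIT")

    links = conn.execute("SELECT link_id, parent_id, child_id FROM links").fetchall()
    parents = conn.execute("SELECT id FROM parent").fetchall()
    return links, parents
```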