Copy one column to another for over a billion rows in SQL Server database


I'm going to guess that you are closing in on the 2.1 billion limit of an INT datatype on an artificial key for a column. Yes, that's a pain. It's much easier to fix before the fact than after you've actually hit that limit and production is shut down while you're trying to fix it :)
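For reference, the setup the answers below assume is a new, nullable BIGINT column sitting next to the existing INT key, ready to be backfilled in batches. Something along these lines (the exact DDL is an assumption; the table and column names just match the snippets further down):

    -- Assumed setup: add the new column as nullable, with no default,
    -- so the ALTER itself is a quick metadata-only change.
    ALTER TABLE dbo.test_table ADD bigid BIGINT NULL;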

Anyway, several of the ideas here will work. Let's talk about speed, efficiency, indexes, and log size, though.

Log Growth

The log blew up originally because it was trying to commit all 2 billion rows at once. The suggestions in the other posts for "chunking it up" will work, but that may not totally resolve the log issue.

If the database is in SIMPLE recovery mode, you'll be fine (the log will re-use itself after each batch). If the database is in FULL or BULK_LOGGED recovery mode, you'll have to run log backups frequently while your operation is running so that SQL can re-use the log space. This might mean increasing the frequency of your scheduled backups during this time, or just monitoring log usage while it runs.
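Those frequent log backups are just ordinary BACKUP LOG statements run on a tighter schedule (or between batches); a minimal sketch, where the database name and destination path are placeholders for whatever your environment uses:

    -- Back up the log so the space used by committed batches can be re-used.
    BACKUP LOG YourDatabase
        TO DISK = N'X:\Backups\YourDatabase_log.trn';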

Indexes and Speed

ALL of the WHERE bigid IS NULL answers will slow down as the table is populated, because there is (presumably) no index on the new BIGID field. You could, of course, just add an index on BIGID, but I'm not convinced that is the right answer.
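If you did want to try the index route, a filtered index over just the unpopulated rows would at least keep it small; this is only a sketch of that alternative, it needs SQL Server 2008 or later, and every batch still has to maintain it:

    -- Hypothetical filtered index covering only the rows still waiting to be populated.
    CREATE NONCLUSTERED INDEX IX_test_table_bigid_null
        ON dbo.test_table (id)
        WHERE bigid IS NULL;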

The key (pun intended) is my assumption that the original ID field is probably the primary key, or the clustered index, or both. In that case, let's take advantage of that fact and do a variation on Jess' idea:

    DECLARE @counter BIGINT
    SET @counter = 1
    WHILE @counter < 2000000000 -- or whatever
    BEGIN
        UPDATE test_table SET bigid = id
        WHERE id BETWEEN @counter AND (@counter + 499999) -- BETWEEN is inclusive
        SET @counter = @counter + 500000
    END

This should be extremely fast, because of the existing indexes on ID.

The ISNULL check really wasn't necessary anyway, and neither is the (-1) on the interval. If we happen to touch some rows twice between batches, that's not a big deal.


Use TOP in the UPDATE statement:

    UPDATE TOP (@row_limit) dbo.test_table
    SET bigid = id
    WHERE bigid IS NULL
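To turn that into a complete batch job, you'd normally wrap it in a loop that stops once an iteration touches no rows. The wrapper below is a sketch of that idea rather than part of the original answer (the batch size is arbitrary):

    -- Hypothetical driver loop: update in batches until nothing is left to update.
    DECLARE @row_limit INT
    SET @row_limit = 5000

    WHILE 1 = 1
    BEGIN
        UPDATE TOP (@row_limit) dbo.test_table
        SET bigid = id
        WHERE bigid IS NULL

        IF @@ROWCOUNT = 0 BREAK -- no NULL rows remain, so we're done
    END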


You could try to use something like SET ROWCOUNT and do batch updates:

    SET ROWCOUNT 5000;
    UPDATE dbo.test_table SET bigid = id WHERE bigid IS NULL
    GO

and then repeat this as many times as you need to.
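If you're running this from SSMS or sqlcmd, one low-effort way to get that repetition is the GO <count> form, which re-executes the preceding batch the given number of times. The repeat count below is only an illustration; size it from your row count and batch size:

    SET ROWCOUNT 5000;
    UPDATE dbo.test_table SET bigid = id WHERE bigid IS NULL
    GO 1000 -- SSMS/sqlcmd: re-run the preceding batch 1000 times
    SET ROWCOUNT 0; -- turn the row limit back off when finished
    GO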

This way, you avoid the RBAR ("row-by-agonizing-row") symptoms of cursors and WHILE loops, and yet you don't unnecessarily fill up your transaction log.

Of course, in between runs, you'd have to do backups (especially of your log) to keep its size within reasonable limits.