Excel VBA Performance - 1 million rows - Delete rows containing a value, in less than 1 min
I'm providing the first answer as a reference
Others may find it useful, if there are no other options available
- Fastest way to achieve the result is not to use the Delete operation
- Out of 1 million records it removes 100,000 rows in an average of 33 seconds
.
Sub DeleteRowsWithValuesNewSheet() '100K records 10K to delete 'Test 1: 2.40234375 sec 'Test 2: 2.41796875 sec 'Test 3: 2.40234375 sec '1M records 100K to delete 'Test 1: 32.9140625 sec 'Test 2: 33.1484375 sec 'Test 3: 32.90625 sec Dim oldWs As Worksheet, newWs As Worksheet, rowHeights() As Long Dim wsName As String, t As Double, oldUsedRng As Range FastWB True: t = Timer Set oldWs = Worksheets(1) wsName = oldWs.Name Set oldUsedRng = oldWs.Range("A1", GetMaxCell(oldWs.UsedRange)) If oldUsedRng.Rows.Count > 1 Then 'If sheet is not empty Set newWs = Sheets.Add(After:=oldWs) 'Add new sheet With oldUsedRng .AutoFilter Field:=1, Criteria1:="<>Test String" .Copy 'Copy visible data End With With newWs.Cells .PasteSpecial xlPasteColumnWidths .PasteSpecial xlPasteAll 'Paste data on new sheet .Cells(1, 1).Select 'Deselect paste area .Cells(1, 1).Copy 'Clear Clipboard End With oldWs.Delete 'Delete old sheet newWs.Name = wsName End If FastWB False: InputBox "Duration: ", "Duration", Timer - tEnd Sub
.
At high level:
- It creates a new worksheet, and keeps a reference to the initial sheet
- AutoFilters column 1 on the searched text:
.AutoFilter Field:=1, Criteria1:="<>Test String"
- Copies all (visible) data from initial sheet
- Pastes column widths, formats, and data to the new sheet
- Deletes initial sheet
- Renames the new sheet to the old sheet name
It uses the same helper functions posted in the question
The 99% of the duration is used by the AutoFilter
.
There are a couple limitations I found so far, the first can be addressed:
If there are any hidden rows on the initial sheet, it unhides them
- A separate function is needed to hide them back
- Depending on implementation, it might significantly increase duration
VBA related:
- It changes the Code Name of the sheet; other VBA referring to Sheet1 will be broken (if any)
- It deletes all VBA code associated with the initial sheet (if any)
.
A few notes about using large files like this:
- The binary format (.xlsb) reduce file size dramatically (from 137 Mb to 43 Mb)
Unmanaged Conditional Formatting rules can cause exponential performance issues
- The same for Comments, and Data validation
Reading file or data from network is much slower than working with a locall file
A significant gain in speed can be achieved if the source data do not contain formulas, or if the scenario would allow (or want) the formulas to be converted into hard values during the conditional row deletions.
With the above as a caveat, my solution uses the AdvancedFilter of the range object. It's about twice as fast as DeleteRowsWithValuesNewSheet().
Public Sub ExcelHero() Dim t#, crit As Range, data As Range, ws As Worksheet Dim r&, fc As Range, lc As Range, fr1 As Range, fr2 As Range FastWB True t = Timer Set fc = ActiveSheet.UsedRange.Item(1) Set lc = GetMaxCell Set data = ActiveSheet.Range(fc, lc) Set ws = Sheets.Add With data Set fr1 = data.Worksheet.Range(fc, fc.Offset(, lc.Column)) Set fr2 = ws.Range(ws.Cells(fc.Row, fc.Column), ws.Cells(fc.Row, lc.Column)) With fr2 fr1.Copy .PasteSpecial xlPasteColumnWidths: .PasteSpecial xlPasteAll .Item(1).Select End With Set crit = .Resize(2, 1).Offset(, lc.Column + 1) crit = [{"Column 1";"<>Test String"}] .AdvancedFilter xlFilterCopy, crit, fr2 .Worksheet.Delete End With FastWB False r = ws.UsedRange.Rows.Count Debug.Print "Rows: " & r & ", Duration: " & Timer - t & " seconds"End Sub
On my elderly Dell Inspiron 1564 (Win 7 Office 2007) this:
Sub QuickAndEasy() Dim rng As Range Set rng = Range("AA2:AA1000001") Range("AB1") = Now Application.ScreenUpdating = False With rng .Formula = "=If(A2=""Test String"",0/0,A2)" .Cells.SpecialCells(xlCellTypeFormulas, xlErrors).EntireRow.Delete .Clear End With Application.ScreenUpdating = True Range("AC1") = NowEnd Sub
took about 10 seconds to run. I am assuming that column AA is available.
EDIT#1:
Please note that this code does not set Calculation to Manual. Performance will improve if the Calculation mode is set to Manual after the "helper" column is allowed to calculate.