What is the fastest way to turn every member of an array alphanumeric?
tl;dr - Regular expressions destroy VBA implementations. If this were a code challenge, @brettj or @Slai should win it.
There are a bunch of tricks to make your AlphaNumericOnly
faster.
First, you can get rid of the vast majority of the function calls by treating it as a byte array instead of a string. That removes all of the calls to Mid$
and Asc
. Although these are incredibly fast functions, they still add the overhead pushing onto and popping off of the call stack. That adds up over a couple hundred thousand iterations.
The second optimization is to not use Case x To y
syntax if you can avoid it. The reason has to do with how it compiles - it doesn't compile to a test like Case = Condition >= x And Condition <= y
, it actually creates a loop with an early exit condition like this:
Case = FalseFor i = x To y If Condition = i Then Case = True End IfNext
Again, not a huge performance hit, but it adds up. The third optimization is to order your tests in a way that makes them sort circuit on the most likely hits in your data set. I tailored my examples below for primarily letters, with most of them upper case. You may do better with different ordering. Put it all together and you get something that looks like this:
Public Function ByteAlphaNumeric(source As Variant) As String Dim chars() As Byte Dim outVal() As Byte chars = CStr(source) 'Load the array up. Dim bound As Long bound = UBound(chars) 'Size the outbound array. ReDim outVal(bound) Dim i As Long, pos As Long For i = 0 To bound Step 2 'Wide characters, only care about the ASCII range. Dim temp As Byte temp = chars(i) 'Pointer math isn't free. Cache it. Select Case True 'Order is important here. Case temp > 64 And temp < 91 outVal(pos) = temp pos = pos + 2 'Advance the output pointer. Case temp < 48 Case temp > 122 Case temp > 96 outVal(pos) = temp pos = pos + 2 Case temp < 58 outVal(pos) = temp pos = pos + 2 End Select Next 'This is likely the most expensive operation. ReDim Preserve outVal(pos) 'Trim the output array. ByteAlphaNumeric = outValEnd Function
How does it do? Pretty well:
Public Sub Benchmark() Dim starting As Single, i As Long, dummy As String, sample As Variant sample = GetRandomString starting = Timer For i = 1 To 1000000 dummy = AlphaNumericOnlyOP(sample) Next i Debug.Print "OP's AlphaNumericOnly: ", Timer - starting starting = Timer For i = 1 To 1000000 dummy = AlphaNumericOnlyThunderframe(sample) Next i Debug.Print "ThunderFrame's AlphaNumericOnly: ", Timer - starting starting = Timer For i = 1 To 1000000 dummy = AlphaNumeric(sample) Next i Debug.Print "CallumDA33's AlphaNumeric: ", Timer - starting starting = Timer For i = 1 To 1000000 dummy = ByteAlphaNumeric(sample) Next i Debug.Print "ByteAlphaNumeric: ", Timer - starting Dim cast As String cast = CStr(sample) starting = Timer For i = 1 To 1000000 dummy = ByteAlphaNumericString(cast) Next i Debug.Print "ByteAlphaNumericString: ", Timer - starting Set stripper = Nothing starting = Timer For i = 1 To 1000000 dummy = OptimizedRegex(sample) Next i Debug.Print "OptimizedRegex: ", Timer - startingEnd SubPrivate Function GetRandomString() As Variant Dim chars(30) As Byte, i As Long Randomize For i = 0 To 30 Step 2 chars(i) = Int(96 * Rnd + 32) Next i Dim temp As String temp = chars GetRandomString = CVar(temp)End Function
Results with a 15 character random String
:
OP`s AlphaNumericOnly: 6.565918 ThunderFrame`s AlphaNumericOnly: 3.617188 CallumDA33`s AlphaNumeric: 23.518070 ByteAlphaNumeric: 2.354980
Note, I omitted submissions that weren't trivial to convert to functions. You may notice 2 additional test - the ByteAlphaNumericString
is exactly the same as the ByteAlphaNumeric
function, but it takes a String
as input instead of a Variant
and gets rid of the cast. That's not trivial:
ByteAlphaNumericString: 2.226074
And finally, the elusive OptimizedRegex
function (basically @brettj's code in function form for comparison timing):
Private stripper As RegExp 'Module levelFunction OptimizedRegex(strSource As Variant) As String If stripper Is Nothing Then Set stripper = New RegExp With stripper .Global = True .Pattern = "[^0-9A-Za-z]" End With End If OptimizedRegex = stripper.Replace(strSource, vbNullString)End Function
OptimizedRegex: 1.094727
EDIT: Bonus implementation!
It occurred to me that a hash table lookup might be faster than a Select Case
structure, so I built one with using a Scripting.Dictionary
:
Private hash As Scripting.Dictionary 'Module levelFunction HashLookups(source As Variant) As String Dim chars() As Byte Dim outVal() As Byte chars = CStr(source) Dim bound As Long bound = UBound(chars) ReDim outVal(bound) Dim i As Long, pos As Long With hash For i = 0 To bound Step 2 Dim temp As Byte temp = chars(i) If .Exists(temp) Then outVal(pos) = temp pos = pos + 2 End If Next End With ReDim Preserve outVal(pos) HashLookups = outValEnd FunctionPrivate Sub LoadHashTable() Set hash = New Scripting.Dictionary Dim i As Long For i = 48 To 57 hash.Add i, vbNull Next For i = 65 To 90 hash.Add i, vbNull Next For i = 97 To 122 hash.Add i, vbNull NextEnd Sub'Test code: starting = Timer LoadHashTable For i = 1 To 1000000 dummy = HashLookups(sample) Next i Debug.Print "HashLookups: ", Timer - starting
It turned out to be not too shabby:
HashLookups: 1.655273
Final Version
Woke up and thought I'd try a vector lookup instead of a hash lookup (just fill a byte array of values to keep and use that for tests). This seems reasonable in that it's only a 256 element array - basically a truth table:
Private lookup(255) As Boolean 'Module levelFunction VectorLookup(source As Variant) As String Dim chars() As Byte Dim outVal() As Byte chars = CStr(source) Dim bound As Long bound = UBound(chars) ReDim outVal(bound) Dim i As Long, pos As Long For i = 0 To bound Step 2 Dim temp As Byte temp = chars(i) If lookup(temp) Then outVal(pos) = temp pos = pos + 2 End If Next ReDim Preserve outVal(pos) VectorLookup = outValEnd FunctionPrivate Sub GenerateTable() Dim i As Long For i = 48 To 57 lookup(i) = True Next For i = 65 To 90 lookup(i) = True Next For i = 97 To 122 lookup(i) = True NextEnd Sub
Assuming that the lookup table is only generated once, it's clocking in somewhere around 10-15% faster than any other pure VBA method above.
Not sure if this would be faster because it depends on too many factors, but might be worth testing. Instead of Regex.Replace each value separately, you can get the copied Range text from the clipboard and replace all values at once. Note that \w
matches underscore and Unicode letters too, so being more specific in the regular expression can make it faster.
'[a1:b30000] = [{"ABC123-009",""}]: Dim t As Double: t = Timer ' used for testingDim r As Range, s As StringSet r = ThisWorkbook.Worksheets("Data").UsedRange.Resize(, 1) ' Data!A1:A30000With New MSForms.DataObject ' needs reference to "Microsoft Forms 2.0 Object Library" or use a bit slower late binding - With CreateObject("New:{1C3B4210-F441-11CE-B9EA-00AA006B1A69}") r.Copy .GetFromClipboard Application.CutCopyMode = False s = .GetText .Clear ' optional - clear the clipboard if using Range.PasteSpecial instead of Worksheet.PasteSpecial "Text" With New RegExp ' needs reference to "Microsoft VBScript Regular Expressions 5.5" or use a bit slower late binding - With CreateObject("VBScript.RegExp") .Global = True '.IgnoreCase = False ' .IgnoreCase is False by default .Pattern = "[^0-9A-Za-z\r\n]+" ' because "[^\w\r\n]+" also matches _ and Unicode letters s = .Replace(s, vbNullString) End With .SetText s .PutInClipboardEnd With' about 70% of the time is spent here in pasting the data r(, 2).PasteSpecial 'xlPasteValues ' paste the text from clipboard in B1'Debug.Print Timer - t
I expect this to be slower for less values because of the clipboard overhead, and maybe slower for a lot more values because of the memory needed.
Disabling events didn't seem to make difference in my tests, but might be worth trying.
Note that there is a tiny chance of another application using the clipboard while the macro is using it.
If early binding causes issues from running the same compiled macro on different machines, you can search for macro decompiler or remove the references and switch to late binding.
Credit to ThunderFrame (I'm a sucker for a LHS Mid$
) but I got better performance from the early bound RegExp
with additional small tweaks:
- Use
Value2
rather thanValue
- Declare your loop with long not integer
.ignorecase = True
is redundant
code
Sub Replace2() Dim inputSh As Worksheet Dim inputRng As Range Set inputSh = Sheets("Data") Set inputRng = inputSh.Range("A1:A30000") Dim outputSh As Worksheet Dim outputRng As Range Set outputSh = Sheets("Replace") Set outputRng = outputSh.Range("A1:A30000") Dim time1 As Double, time2 As Double time1 = MicroTimer Dim arr As Variant Dim objRegex As VBScript_RegExp_55.RegExp Dim i As Long Set objRegex = CreateObject("vbscript.regexp") With objRegex .Global = True .Pattern = "[^\w]" End With arr = inputRng.Value2 For i = LBound(arr) To UBound(arr) arr(i, 1) = objRegex.Replace(arr(i, 1), vbNullString) Next i outputRng.Value2 = arr time2 = MicroTimer Debug.Print (time2 - time1) * 1000 End Sub