What is the fastest way to turn every member of an array alphanumeric? What is the fastest way to turn every member of an array alphanumeric? vba vba

What is the fastest way to turn every member of an array alphanumeric?


tl;dr - Regular expressions destroy VBA implementations. If this were a code challenge, @brettj or @Slai should win it.

There are a bunch of tricks to make your AlphaNumericOnly faster.

First, you can get rid of the vast majority of the function calls by treating it as a byte array instead of a string. That removes all of the calls to Mid$ and Asc. Although these are incredibly fast functions, they still add the overhead pushing onto and popping off of the call stack. That adds up over a couple hundred thousand iterations.

The second optimization is to not use Case x To y syntax if you can avoid it. The reason has to do with how it compiles - it doesn't compile to a test like Case = Condition >= x And Condition <= y, it actually creates a loop with an early exit condition like this:

Case = FalseFor i = x To y    If Condition = i Then        Case = True    End IfNext

Again, not a huge performance hit, but it adds up. The third optimization is to order your tests in a way that makes them sort circuit on the most likely hits in your data set. I tailored my examples below for primarily letters, with most of them upper case. You may do better with different ordering. Put it all together and you get something that looks like this:

Public Function ByteAlphaNumeric(source As Variant) As String    Dim chars() As Byte    Dim outVal() As Byte    chars = CStr(source)        'Load the array up.    Dim bound As Long    bound = UBound(chars)       'Size the outbound array.    ReDim outVal(bound)    Dim i As Long, pos As Long    For i = 0 To bound Step 2   'Wide characters, only care about the ASCII range.        Dim temp As Byte        temp = chars(i)         'Pointer math isn't free. Cache it.        Select Case True        'Order is important here.            Case temp > 64 And temp < 91                outVal(pos) = temp                pos = pos + 2   'Advance the output pointer.            Case temp < 48            Case temp > 122            Case temp > 96                outVal(pos) = temp                pos = pos + 2            Case temp < 58                outVal(pos) = temp                pos = pos + 2        End Select    Next    'This is likely the most expensive operation.    ReDim Preserve outVal(pos)  'Trim the output array.    ByteAlphaNumeric = outValEnd Function

How does it do? Pretty well:

Public Sub Benchmark()    Dim starting As Single, i As Long, dummy As String, sample As Variant    sample = GetRandomString    starting = Timer    For i = 1 To 1000000        dummy = AlphaNumericOnlyOP(sample)    Next i    Debug.Print "OP's AlphaNumericOnly: ", Timer - starting    starting = Timer    For i = 1 To 1000000        dummy = AlphaNumericOnlyThunderframe(sample)    Next i    Debug.Print "ThunderFrame's AlphaNumericOnly: ", Timer - starting    starting = Timer    For i = 1 To 1000000        dummy = AlphaNumeric(sample)    Next i    Debug.Print "CallumDA33's AlphaNumeric: ", Timer - starting    starting = Timer    For i = 1 To 1000000        dummy = ByteAlphaNumeric(sample)    Next i    Debug.Print "ByteAlphaNumeric: ", Timer - starting    Dim cast As String    cast = CStr(sample)    starting = Timer    For i = 1 To 1000000        dummy = ByteAlphaNumericString(cast)    Next i    Debug.Print "ByteAlphaNumericString: ", Timer - starting    Set stripper = Nothing    starting = Timer    For i = 1 To 1000000        dummy = OptimizedRegex(sample)    Next i    Debug.Print "OptimizedRegex: ", Timer - startingEnd SubPrivate Function GetRandomString() As Variant    Dim chars(30) As Byte, i As Long    Randomize    For i = 0 To 30 Step 2        chars(i) = Int(96 * Rnd + 32)    Next i    Dim temp As String    temp = chars    GetRandomString = CVar(temp)End Function

Results with a 15 character random String:

OP`s AlphaNumericOnly:                     6.565918 ThunderFrame`s AlphaNumericOnly:           3.617188 CallumDA33`s AlphaNumeric:                23.518070 ByteAlphaNumeric:                          2.354980

Note, I omitted submissions that weren't trivial to convert to functions. You may notice 2 additional test - the ByteAlphaNumericString is exactly the same as the ByteAlphaNumeric function, but it takes a String as input instead of a Variant and gets rid of the cast. That's not trivial:

ByteAlphaNumericString:                    2.226074

And finally, the elusive OptimizedRegex function (basically @brettj's code in function form for comparison timing):

Private stripper As RegExp  'Module levelFunction OptimizedRegex(strSource As Variant) As String    If stripper Is Nothing Then        Set stripper = New RegExp        With stripper            .Global = True            .Pattern = "[^0-9A-Za-z]"        End With    End If    OptimizedRegex = stripper.Replace(strSource, vbNullString)End Function
OptimizedRegex:                            1.094727 

EDIT: Bonus implementation!

It occurred to me that a hash table lookup might be faster than a Select Case structure, so I built one with using a Scripting.Dictionary:

Private hash As Scripting.Dictionary  'Module levelFunction HashLookups(source As Variant) As String    Dim chars() As Byte    Dim outVal() As Byte    chars = CStr(source)    Dim bound As Long    bound = UBound(chars)    ReDim outVal(bound)    Dim i As Long, pos As Long    With hash        For i = 0 To bound Step 2            Dim temp As Byte            temp = chars(i)            If .Exists(temp) Then                outVal(pos) = temp                pos = pos + 2            End If        Next    End With    ReDim Preserve outVal(pos)    HashLookups = outValEnd FunctionPrivate Sub LoadHashTable()    Set hash = New Scripting.Dictionary    Dim i As Long    For i = 48 To 57        hash.Add i, vbNull    Next    For i = 65 To 90        hash.Add i, vbNull    Next    For i = 97 To 122        hash.Add i, vbNull    NextEnd Sub'Test code:    starting = Timer    LoadHashTable    For i = 1 To 1000000        dummy = HashLookups(sample)    Next i    Debug.Print "HashLookups: ", Timer - starting

It turned out to be not too shabby:

HashLookups:                               1.655273

Final Version

Woke up and thought I'd try a vector lookup instead of a hash lookup (just fill a byte array of values to keep and use that for tests). This seems reasonable in that it's only a 256 element array - basically a truth table:

Private lookup(255) As Boolean 'Module levelFunction VectorLookup(source As Variant) As String    Dim chars() As Byte    Dim outVal() As Byte    chars = CStr(source)    Dim bound As Long    bound = UBound(chars)    ReDim outVal(bound)    Dim i As Long, pos As Long    For i = 0 To bound Step 2        Dim temp As Byte        temp = chars(i)        If lookup(temp) Then            outVal(pos) = temp            pos = pos + 2        End If    Next    ReDim Preserve outVal(pos)    VectorLookup = outValEnd FunctionPrivate Sub GenerateTable()    Dim i As Long    For i = 48 To 57        lookup(i) = True    Next    For i = 65 To 90        lookup(i) = True    Next    For i = 97 To 122        lookup(i) = True    NextEnd Sub

Assuming that the lookup table is only generated once, it's clocking in somewhere around 10-15% faster than any other pure VBA method above.


Not sure if this would be faster because it depends on too many factors, but might be worth testing. Instead of Regex.Replace each value separately, you can get the copied Range text from the clipboard and replace all values at once. Note that \w matches underscore and Unicode letters too, so being more specific in the regular expression can make it faster.

'[a1:b30000] = [{"ABC123-009",""}]: Dim t As Double: t = Timer ' used for testingDim r As Range, s As StringSet r = ThisWorkbook.Worksheets("Data").UsedRange.Resize(, 1) ' Data!A1:A30000With New MSForms.DataObject ' needs reference to "Microsoft Forms 2.0 Object Library" or use a bit slower late binding - With CreateObject("New:{1C3B4210-F441-11CE-B9EA-00AA006B1A69}")   r.Copy   .GetFromClipboard    Application.CutCopyMode = False    s = .GetText    .Clear ' optional - clear the clipboard if using Range.PasteSpecial instead of Worksheet.PasteSpecial "Text"    With New RegExp ' needs reference to "Microsoft VBScript Regular Expressions 5.5" or use a bit slower late binding - With CreateObject("VBScript.RegExp")        .Global = True        '.IgnoreCase = False ' .IgnoreCase is False by default        .Pattern = "[^0-9A-Za-z\r\n]+" ' because "[^\w\r\n]+" also matches _ and Unicode letters        s = .Replace(s, vbNullString)    End With    .SetText s    .PutInClipboardEnd With' about 70% of the time is spent here in pasting the data r(, 2).PasteSpecial 'xlPasteValues ' paste the text from clipboard in B1'Debug.Print Timer - t

I expect this to be slower for less values because of the clipboard overhead, and maybe slower for a lot more values because of the memory needed.

Disabling events didn't seem to make difference in my tests, but might be worth trying.

Note that there is a tiny chance of another application using the clipboard while the macro is using it.

If early binding causes issues from running the same compiled macro on different machines, you can search for macro decompiler or remove the references and switch to late binding.


Credit to ThunderFrame (I'm a sucker for a LHS Mid$) but I got better performance from the early bound RegExp with additional small tweaks:

  • Use Value2 rather than Value
  • Declare your loop with long not integer
  • .ignorecase = True is redundant

code

    Sub Replace2()    Dim inputSh As Worksheet    Dim inputRng As Range    Set inputSh = Sheets("Data")    Set inputRng = inputSh.Range("A1:A30000")    Dim outputSh As Worksheet    Dim outputRng As Range    Set outputSh = Sheets("Replace")    Set outputRng = outputSh.Range("A1:A30000")    Dim time1 As Double, time2 As Double    time1 = MicroTimer    Dim arr As Variant    Dim objRegex As VBScript_RegExp_55.RegExp    Dim i As Long    Set objRegex = CreateObject("vbscript.regexp")    With objRegex            .Global = True            .Pattern = "[^\w]"    End With    arr = inputRng.Value2    For i = LBound(arr) To UBound(arr)            arr(i, 1) = objRegex.Replace(arr(i, 1), vbNullString)    Next i    outputRng.Value2 = arr    time2 = MicroTimer    Debug.Print (time2 - time1) * 1000    End Sub