Parse CSV, ignoring commas inside string literals in VBA? Parse CSV, ignoring commas inside string literals in VBA? vba vba

Parse CSV, ignoring commas inside string literals in VBA?


The first way to solve this problem is to look at the structure of the line from the csv file (int,int,"String literal, will have at most one comma", etc).A naive solution would be (Assuming that the line don't have any semicolons)

Function splitLine1(line As String) As String()   Dim temp() As String   'Splits the line in three. The string delimited by " will be at temp(1)   temp = Split(line, Chr(34)) 'chr(34) = "   'Replaces the commas in the numeric fields by semicolons   temp(0) = Replace(temp(0), ",", ";")   temp(2) = Replace(temp(2), ",", ";")   'Joins the temp array with quotes and then splits the result using the semicolons   splitLine1 = Split(Join(temp, Chr(34)), ";")End Function

This function only solves this particular problem.Another way to do the job is using the regular expression object from VBScript.

Function splitLine2(line As String) As String()    Dim regex As Object    Set regex = CreateObject("vbscript.regexp")    regex.IgnoreCase = True    regex.Global = True    'This pattern matches only commas outside quotes    'Pattern = ",(?=([^"]*"[^"]*")*(?![^"]*"))"    regex.Pattern = ",(?=([^" & Chr(34) & "]*" & Chr(34) & "[^" & Chr(34) & "]*" & Chr(34) & ")*(?![^" & Chr(34) & "]*" & Chr(34) & "))"    'regex.replaces will replace the commas outside quotes with semicolons and then the    'Split function will split the result based on the semicollons    splitLine2 = Split(regex.Replace(line, ";"), ";")End Function

This method seems much more cryptic, but does not deppends on the structure of the line

You can read more about regular expressions patterns in VBScript Here


A simple regex for parsing a CSV line, assuming no quotes inside quoted fields, is:

"[^"]*"|[^,]*

Each match will return a field.


@Gimp said...

The current answers do not contain enough detail.

I'm running into the same problem. Looking for more detail in this answer.

To elaborate on @MRAB's answer:

Function ParseCSV(FileName)    Dim Regex       'As VBScript_RegExp_55.RegExp    Dim MatchColl   'As VBScript_RegExp_55.MatchCollection    Dim Match       'As VBScript_RegExp_55.Match    Dim FS          'As Scripting.FileSystemObject    Dim Txt         'As Scripting.TextStream    Dim CSVLine    ReDim ToInsert(0)    Set FS = CreateObject("Scripting.FileSystemObject")    Set Txt = FS.OpenTextFile(FileName, 1, False, -2)    Set Regex = CreateObject("VBScript.RegExp")    Regex.Pattern = """[^""]*""|[^,]*"    '<- MRAB's answer    Regex.Global = True    Do While Not Txt.AtEndOfStream        ReDim ToInsert(0)        CSVLine = Txt.ReadLine        For Each Match In Regex.Execute(CSVLine)            If Match.Length > 0 Then                ReDim Preserve ToInsert(UBound(ToInsert) + 1)                ToInsert(UBound(ToInsert) - 1) = Match.Value            End If        Next        InsertArrayIntoDatabase ToInsert    Loop    Txt.CloseEnd Function

You need to customize the InsertArrayIntoDatabase Sub for your own table. Mine has several text fields named f00, f01, etc...

Sub InsertArrayIntoDatabase(a())    Dim rs As DAO.Recordset    Dim i, n    Set rs = CurrentDb().TableDefs("tbl").OpenRecordset()    rs.AddNew    For i = LBound(a) To UBound(a)        n = "f" & Format(i, "00") 'fields in table are f00, f01, f02, etc..        rs.Fields(n) = a(i)    Next    rs.UpdateEnd Sub

Note that instead of using CurrentDb() in InsertArrayIntoDatabase(), you should really use a global variable that gets set to the value of CurrentDb() before ParseCSV() runs, because running CurrentDb() in a loop is very slow, especially on a very large file.