Parse CSV, ignoring commas inside string literals in VBA?
The first way to solve this problem is to look at the structure of the line from the csv file (int,int,"String literal, will have at most one comma", etc).A naive solution would be (Assuming that the line don't have any semicolons)
Function splitLine1(line As String) As String() Dim temp() As String 'Splits the line in three. The string delimited by " will be at temp(1) temp = Split(line, Chr(34)) 'chr(34) = " 'Replaces the commas in the numeric fields by semicolons temp(0) = Replace(temp(0), ",", ";") temp(2) = Replace(temp(2), ",", ";") 'Joins the temp array with quotes and then splits the result using the semicolons splitLine1 = Split(Join(temp, Chr(34)), ";")End Function
This function only solves this particular problem.Another way to do the job is using the regular expression object from VBScript.
Function splitLine2(line As String) As String() Dim regex As Object Set regex = CreateObject("vbscript.regexp") regex.IgnoreCase = True regex.Global = True 'This pattern matches only commas outside quotes 'Pattern = ",(?=([^"]*"[^"]*")*(?![^"]*"))" regex.Pattern = ",(?=([^" & Chr(34) & "]*" & Chr(34) & "[^" & Chr(34) & "]*" & Chr(34) & ")*(?![^" & Chr(34) & "]*" & Chr(34) & "))" 'regex.replaces will replace the commas outside quotes with semicolons and then the 'Split function will split the result based on the semicollons splitLine2 = Split(regex.Replace(line, ";"), ";")End Function
This method seems much more cryptic, but does not deppends on the structure of the line
You can read more about regular expressions patterns in VBScript Here
@Gimp said...
The current answers do not contain enough detail.
I'm running into the same problem. Looking for more detail in this answer.
To elaborate on @MRAB's answer:
Function ParseCSV(FileName) Dim Regex 'As VBScript_RegExp_55.RegExp Dim MatchColl 'As VBScript_RegExp_55.MatchCollection Dim Match 'As VBScript_RegExp_55.Match Dim FS 'As Scripting.FileSystemObject Dim Txt 'As Scripting.TextStream Dim CSVLine ReDim ToInsert(0) Set FS = CreateObject("Scripting.FileSystemObject") Set Txt = FS.OpenTextFile(FileName, 1, False, -2) Set Regex = CreateObject("VBScript.RegExp") Regex.Pattern = """[^""]*""|[^,]*" '<- MRAB's answer Regex.Global = True Do While Not Txt.AtEndOfStream ReDim ToInsert(0) CSVLine = Txt.ReadLine For Each Match In Regex.Execute(CSVLine) If Match.Length > 0 Then ReDim Preserve ToInsert(UBound(ToInsert) + 1) ToInsert(UBound(ToInsert) - 1) = Match.Value End If Next InsertArrayIntoDatabase ToInsert Loop Txt.CloseEnd Function
You need to customize the InsertArrayIntoDatabase Sub for your own table. Mine has several text fields named f00, f01, etc...
Sub InsertArrayIntoDatabase(a()) Dim rs As DAO.Recordset Dim i, n Set rs = CurrentDb().TableDefs("tbl").OpenRecordset() rs.AddNew For i = LBound(a) To UBound(a) n = "f" & Format(i, "00") 'fields in table are f00, f01, f02, etc.. rs.Fields(n) = a(i) Next rs.UpdateEnd Sub
Note that instead of using CurrentDb()
in InsertArrayIntoDatabase()
, you should really use a global variable that gets set to the value of CurrentDb()
before ParseCSV()
runs, because running CurrentDb()
in a loop is very slow, especially on a very large file.