Can I import a CSV file and automatically infer the delimiter?

The csv module seems to recommend using the csv sniffer for this problem.

They give the following example, which I've adapted for your case.

with open('example.csv', 'rb') as csvfile:  # python 3: 'r',newline=""    dialect = csv.Sniffer().sniff(csvfile.read(1024), delimiters=";,")    csvfile.seek(0)    reader = csv.reader(csvfile, dialect)    # ... process CSV file contents here ...

Let's try it out.

[9:13am][wlynch@watermelon /tmp] cat example #!/usr/bin/env pythonimport csvdef parse(filename):    with open(filename, 'rb') as csvfile:        dialect = csv.Sniffer().sniff(csvfile.read(), delimiters=';,')        csvfile.seek(0)        reader = csv.reader(csvfile, dialect)        for line in reader:            print linedef main():    print 'Comma Version:'    parse('comma_separated.csv')    print    print 'Semicolon Version:'    parse('semicolon_separated.csv')    print    print 'An example from the question (kingdom.csv)'    parse('kingdom.csv')if __name__ == '__main__':    main()

And our sample inputs

[9:13am][wlynch@watermelon /tmp] cat comma_separated.csv test,box,fooround,the,bend[9:13am][wlynch@watermelon /tmp] cat semicolon_separated.csv round;the;bendwho;are;you[9:22am][wlynch@watermelon /tmp] cat kingdom.csv ReleveAnnee;ReleveMois;NoOrdre;TitreRMC;AdopCSRegleVote;AdopCSAbs;AdoptCSContre;NoCELEX;ProposAnnee;ProposChrono;ProposOrigine;NoUniqueAnnee;NoUniqueType;NoUniqueChrono;PropoSplittee;Suite2LecturePE;Council PATH;Notes1999;1;1;1999/83/EC: Council Decision of 18 January 1999 authorising the Kingdom of Denmark to apply or to continue to apply reductions in, or exemptions from, excise duties on certain mineral oils used for specific purposes, in accordance with the procedure provided for in Article 8(4) of Directive 92/81/EEC;U;;;31999D0083;1998;577;COM;NULL;CS;NULL;;;;Propos* are missing on Celex document1999;1;2;1999/81/EC: Council Decision of 18 January 1999 authorising the Kingdom of Spain to apply a measure derogating from Articles 2 and 28a(1) of the Sixth Directive (77/388/EEC) on the harmonisation of the laws of the Member States relating to turnover taxes;U;;;31999D0081;1998;184;COM;NULL;CS;NULL;;;;Propos* are missing on Celex document

And if we execute the example program:

[9:14am][wlynch@watermelon /tmp] ./example Comma Version:['test', 'box', 'foo']['round', 'the', 'bend']Semicolon Version:['round', 'the', 'bend']['who', 'are', 'you']An example from the question (kingdom.csv)['ReleveAnnee', 'ReleveMois', 'NoOrdre', 'TitreRMC', 'AdopCSRegleVote', 'AdopCSAbs', 'AdoptCSContre', 'NoCELEX', 'ProposAnnee', 'ProposChrono', 'ProposOrigine', 'NoUniqueAnnee', 'NoUniqueType', 'NoUniqueChrono', 'PropoSplittee', 'Suite2LecturePE', 'Council PATH', 'Notes']['1999', '1', '1', '1999/83/EC: Council Decision of 18 January 1999 authorising the Kingdom of Denmark to apply or to continue to apply reductions in, or exemptions from, excise duties on certain mineral oils used for specific purposes, in accordance with the procedure provided for in Article 8(4) of Directive 92/81/EEC', 'U', '', '', '31999D0083', '1998', '577', 'COM', 'NULL', 'CS', 'NULL', '', '', '', 'Propos* are missing on Celex document']['1999', '1', '2', '1999/81/EC: Council Decision of 18 January 1999 authorising the Kingdom of Spain to apply a measure derogating from Articles 2 and 28a(1) of the Sixth Directive (77/388/EEC) on the harmonisation of the laws of the Member States relating to turnover taxes', 'U', '', '', '31999D0081', '1998', '184', 'COM', 'NULL', 'CS', 'NULL', '', '', '', 'Propos* are missing on Celex document']

It's also probably worth noting what version of python I'm using.

[9:20am][wlynch@watermelon /tmp] python -VPython 2.7.2

python file csv import delimiter

Given a project that deals with both , (comma) and | (vertical bar) delimited CSV files, which are well formed, I tried the following (as given at https://docs.python.org/2/library/csv.html#csv.Sniffer):

dialect = csv.Sniffer().sniff(csvfile.read(1024), delimiters=',|')

However, on a |-delimited file, the "Could not determine delimiter" exception was returned. It seemed reasonable to speculate that the sniff heuristic might work best if each line has the same number of delimiters (not counting whatever might be enclosed in quotes). So, instead of reading the first 1024 bytes of the file, I tried reading the first two lines in their entirety:

temp_lines = csvfile.readline() + '\n' + csvfile.readline()dialect = csv.Sniffer().sniff(temp_lines, delimiters=',|')

So far, this is working well for me.

python file csv import delimiter

To solve the problem, I have created a function which reads the first line of a file (header) and detects the delimiter.

def detectDelimiter(csvFile):    with open(csvFile, 'r') as myCsvfile:        header=myCsvfile.readline()        if header.find(";")!=-1:            return ";"        if header.find(",")!=-1:            return ","    #default delimiter (MS Office export)    return ";"

CodeHunter

Can I import a CSV file and automatically infer the delimiter?

Recent Posts

How can I color dots in a xy scatterplot according to column value?

How to update a claim in ASP.NET Identity?

What does {0} mean when initializing an object?

Accessing members of items in a JSONArray with Java

How to log SQL statements in Spring Boot?

Powershell Get-WebSite name parameter is ignored

How to detect scroll to bottom of html element

Java synchronized method

How to test controllers with CodeIgniter?

Detect Visual Composer

Matplotlib: Specify format of floats for tick labels

Rails join a list of strings with commas and "and" before the last