Python: skip comment lines marked with # in csv.DictReader
Good question. Python's csv module has no built-in support for comment lines, which are not uncommon at the top of CSV files. Dan Stowell's solution works for the OP's specific case, but it is limited in that # must be the first character of the line. A more generic solution, which also strips comments that follow data on the same line, is:

    import csv

    def decomment(csvfile):
        for row in csvfile:
            # Keep only the text before the first '#' and drop lines that end up empty
            raw = row.split('#')[0].strip()
            if raw:
                yield raw

    with open('dummy.csv', newline='') as csvfile:
        reader = csv.reader(decomment(csvfile))
        for row in reader:
            print(row)

As an example, the following dummy.csv file:

    # comment
    # comment
    a,b,c # comment
    1,2,3
    10,20,30
    # comment

prints

    ['a', 'b', 'c']
    ['1', '2', '3']
    ['10', '20', '30']

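The example above can be reproduced without creating a file. This is a minimal sketch that feeds the same dummy.csv content to `decomment` through `io.StringIO`; the in-memory string `DUMMY` is an assumption standing in for the real file. Because `csv.reader` accepts any iterable of strings, the generator can be passed straight in.

```python
import csv
import io


def decomment(csvfile):
    # Keep only the text before the first '#' and drop lines that end up empty
    for row in csvfile:
        raw = row.split('#')[0].strip()
        if raw:
            yield raw


# Same content as dummy.csv above, held in memory (assumption: no file needed)
DUMMY = """# comment
# comment
a,b,c # comment
1,2,3
10,20,30
# comment
"""

rows = list(csv.reader(decomment(io.StringIO(DUMMY))))
for row in rows:
    print(row)
```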
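Of course, this works just as well with csv.DictReader(). One caveat: the split ignores CSV quoting, so a # inside a quoted field is also treated as the start of a comment.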
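A minimal sketch of the `csv.DictReader()` case, again using `io.StringIO` with the dummy.csv content in place of a real file (an assumption made so the snippet is self-contained). The first line that survives the filter, `a,b,c`, becomes the header, so each remaining row comes back as a dict keyed by those names.

```python
import csv
import io


def decomment(csvfile):
    # Keep only the text before the first '#' and drop lines that end up empty
    for row in csvfile:
        raw = row.split('#')[0].strip()
        if raw:
            yield raw


# Same content as dummy.csv above (assumption: in memory instead of on disk)
DUMMY = """# comment
# comment
a,b,c # comment
1,2,3
10,20,30
# comment
"""

reader = csv.DictReader(decomment(io.StringIO(DUMMY)))
records = list(reader)  # one dict per data row, keyed by the header fields
print(reader.fieldnames)
for record in records:
    print(record)
```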
Another way to read a CSV file with comments is pandas, whose read_csv() accepts a comment parameter. Here's some sample code:

    import pandas as pd

    df = pd.read_csv('test.csv',
                     sep=',',              # field separator
                     comment='#',          # comment character
                     index_col=0,          # number or label of index column
                     skipinitialspace=True,
                     skip_blank_lines=True,
                     on_bad_lines='warn',  # pandas >= 1.3; replaces error_bad_lines=False, warn_bad_lines=True
                     ).sort_index()
    print(df)
    df = df.fillna('no value')  # replace NaN with 'no value'
    print(df)

For this CSV file:

    a,b,c,d,e
    1,,16,,55#,,65##77
    8,77,77,,16#86,18#
    #This is a comment
    13,19,25,28,82

we get this output (first as read, then after the NaN values are replaced):

           b   c     d   e
    a
    1    NaN  16   NaN  55
    8   77.0  77   NaN  16
    13  19.0  25  28.0  82

              b   c         d   e
    a
    1   no value  16  no value  55
    8         77  77  no value  16
    13        19  25        28  82

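A self-contained sketch of the pandas approach, reading the sample above from an in-memory string (`DATA` is an assumption standing in for test.csv). It uses `on_bad_lines='warn'`, which assumes pandas 1.3 or later; that parameter replaced `error_bad_lines`/`warn_bad_lines`, which were removed in pandas 2.0.

```python
import io

import pandas as pd

# In-memory copy of the sample test.csv shown above (assumption)
DATA = """a,b,c,d,e
1,,16,,55#,,65##77
8,77,77,,16#86,18#
#This is a comment
13,19,25,28,82
"""

df = pd.read_csv(
    io.StringIO(DATA),
    comment='#',            # the rest of a line after '#' is ignored
    index_col=0,            # use the first column ('a') as the index
    skipinitialspace=True,  # ignore spaces right after a delimiter
    skip_blank_lines=True,  # skip empty lines rather than reading them as NaN rows
    on_bad_lines='warn',    # warn about and skip malformed rows (pandas >= 1.3)
).sort_index()

filled = df.fillna('no value')  # returns a new DataFrame; df keeps its NaNs
print(df)
print(filled)
```

Returning a new DataFrame from `fillna()` instead of using `inplace=True` keeps the numeric version around, which is handy if you still want to compute on the columns.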