numpy.genfromtxt- ValueError- Line # (got n columns instead of m)

Looks like you've already read the genfromtxt docs about missing values. Do they say anything about the use of delimiters?

I think it can handle missing values in comma-delimited lines like

    'one, 1, 234.4, , ,'
    'two, 3, , 4, 5'

but when the delimiter is the default (whitespace), it can't. One of the first steps after reading a line is

 strings = line.split(delimiter)

It then complains (the ValueError in the title) if len(strings) doesn't match the initial target, i.e. the number of columns established from the first line. Apparently it does not try to guess that you want to pad the line with n-len(strings) missing values.
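A small sketch of the difference, using made-up data (not the question's file) where the second row is missing its last two values. With a comma delimiter the gaps still produce empty fields, which genfromtxt treats as missing and fills with nan; with whitespace splitting the gaps simply vanish, so the column counts disagree and genfromtxt raises.

```python
import io

import numpy as np

# Illustrative data: the second row is missing its last two values.
comma_text = "1, 2, 3, 4\n5, 6, , \n"
space_text = "1 2 3 4\n5 6\n"

# Splitting on commas keeps the gaps as (empty) fields...
comma_counts = [len(line.split(",")) for line in comma_text.splitlines()]
# ...but splitting on whitespace drops them entirely.
space_counts = [len(line.split()) for line in space_text.splitlines()]
print(comma_counts, space_counts)  # [4, 4] [4, 2]

# Comma-delimited: empty fields are treated as missing and become nan.
arr = np.genfromtxt(io.StringIO(comma_text), delimiter=",")
print(arr)

# Whitespace-delimited: row 2 has 2 columns instead of 4, so genfromtxt raises.
try:
    np.genfromtxt(io.StringIO(space_text))
    err_msg = ""
except ValueError as err:
    err_msg = str(err)
print(err_msg)
```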

Options that come to mind:

- Try pandas; it may make more of an effort to guess your intentions.
- Write your own reader. Pandas' parser is compiled, but genfromtxt is plain Python built on numpy: it reads the file line by line, splits and converts the fields, and appends each row's list to a master list, converting that list of lists into an array at the end. Your own reader should be just as efficient.
- Preprocess your file to add the missing values or change the delimiter. genfromtxt accepts anything that feeds it lines, so it works with a list of strings, a file reader that yields modified lines, etc. This may be simplest.
    def foo(astr):
        # split on whitespace, pad short rows to 6 fields, rejoin with commas
        strs = astr.split()
        if len(strs) < 6:
            strs.extend([b' '] * (6 - len(strs)))
        return b','.join(strs)

Simulating with a list of strings (in Py3):

    In [139]: txt=b"""14        HO2       O3        OH        O2        O2
         ...: 15        HO2       HO2       H2O2      O2
         ...: 16        H2O2      OH        HO2       H2O
         ...: 17        O         O         O2
         ...: 18        O         O2        O3
         ...: 19        O         O3        O2        O2""".splitlines()

    In [140]: [foo(l) for l in txt]
    Out[140]: 
    [b'14,HO2,O3,OH,O2,O2',
     b'15,HO2,HO2,H2O2,O2, ',
     b'16,H2O2,OH,HO2,H2O, ',
     b'17,O,O,O2, , ',
     b'18,O,O2,O3, , ',
     b'19,O,O3,O2,O2, ']

    In [141]: np.genfromtxt([foo(l) for l in txt], dtype=None, delimiter=',')
    Out[141]: 
    array([(14, b'HO2', b'O3', b'OH', b'O2', b'O2'),
           (15, b'HO2', b'HO2', b'H2O2', b'O2', b''),
           (16, b'H2O2', b'OH', b'HO2', b'H2O', b''),
           (17, b'O', b'O', b'O2', b' ', b''),
           (18, b'O', b'O2', b'O3', b' ', b''),
           (19, b'O', b'O3', b'O2', b'O2', b'')],
          dtype=[('f0', '<i4'), ('f1', 'S4'), ('f2', 'S3'), ('f3', 'S4'), ('f4', 'S3'), ('f5', 'S2')])
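For the "write your own reader" option, here is a minimal sketch of the approach described (read line by line, split, collect rows in a list, build one array at the end). It assumes whitespace-delimited rows where only trailing fields can be missing; `read_ragged` is a hypothetical name, io.StringIO stands in for an open file, and the fields are left as strings, so any numeric conversion (e.g. `arr[:, 0].astype(int)`) is a separate step.

```python
import io

import numpy as np


def read_ragged(fileobj, ncols, fill=""):
    """Hypothetical reader: split each line on whitespace, pad short rows
    to ncols with `fill`, collect the rows, and build one array at the end."""
    rows = []
    for line in fileobj:
        fields = line.split()
        if not fields:  # skip blank lines
            continue
        if len(fields) > ncols:
            raise ValueError(f"line has {len(fields)} fields, expected at most {ncols}")
        rows.append(fields + [fill] * (ncols - len(fields)))
    return np.array(rows)  # list of equal-length lists -> 2-D string array


src = io.StringIO("14 HO2 O3 OH O2 O2\n15 HO2 HO2 H2O2 O2\n17 O O O2\n")
arr = read_ragged(src, ncols=6)
print(arr.shape)  # (3, 6)
print(arr[2])
```

Padding at the end of each row is only correct when missing values are always trailing; if gaps can occur mid-row, whitespace alone cannot tell you where they are, and the fixed-width approach is the safer choice.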


It looks like your data is nicely aligned in fields of exactly 10 characters. If that is always the case, you can tell genfromtxt the field widths to use by specifying the sequence of field widths in the delimiter argument.

Here's an example.

First, your data file:

    In [20]: !cat reaction.dat
    14        HO2       O3        OH        O2        O2
    15        HO2       HO2       H2O2      O2
    16        H2O2      OH        HO2       H2O
    17        O         O         O2
    18        O         O2        O3
    19        O         O3        O2        O2

For convenience, I'll define the number of fields and the field width here. (In general, it is not necessary that all the fields have the same width.)

    In [21]: numfields = 6

    In [22]: fieldwidth = 10

Tell genfromtxt that the data is in fixed width columns by passing in the argument delimiter=(10, 10, 10, 10, 10, 10):

In [23]: data = genfromtxt('reaction.dat', dtype='S%d' % fieldwidth, delimiter=(fieldwidth,)*numfields)

Here's the result. Note that "missing" fields are empty strings. Also note that non-empty fields include the white space, and the last non-empty field in each row includes the newline character:

    In [24]: data
    Out[24]: 
    array([[b'14        ', b'HO2       ', b'O3        ', b'OH        ',
            b'O2        ', b'O2\n'],
           [b'15        ', b'HO2       ', b'HO2       ', b'H2O2      ',
            b'O2\n', b''],
           [b'16        ', b'H2O2      ', b'OH        ', b'HO2       ',
            b'H2O\n', b''],
           [b'17        ', b'O         ', b'O         ', b'O2\n', b'', b''],
           [b'18        ', b'O         ', b'O2        ', b'O3\n', b'', b''],
           [b'19        ', b'O         ', b'O3        ', b'O2        ',
            b'O2\n', b'']],
          dtype='|S10')

    In [25]: data[1]
    Out[25]: 
    array([b'15        ', b'HO2       ', b'HO2       ', b'H2O2      ', b'O2\n',
           b''],
          dtype='|S10')

We could clean up the strings in a second step, or we can have genfromtxt do it by providing a converter for each field that simply strips the white space from the field:

    In [26]: data = genfromtxt('reaction.dat', dtype='S%d' % fieldwidth, delimiter=(fieldwidth,)*numfields, converters={k: lambda s: s.strip() for k in range(numfields)})

    In [27]: data
    Out[27]: 
    array([[b'14', b'HO2', b'O3', b'OH', b'O2', b'O2'],
           [b'15', b'HO2', b'HO2', b'H2O2', b'O2', b''],
           [b'16', b'H2O2', b'OH', b'HO2', b'H2O', b''],
           [b'17', b'O', b'O', b'O2', b'', b''],
           [b'18', b'O', b'O2', b'O3', b'', b''],
           [b'19', b'O', b'O3', b'O2', b'O2', b'']],
          dtype='|S10')

    In [28]: data[:,0]
    Out[28]: 
    array([b'14', b'15', b'16', b'17', b'18', b'19'],
          dtype='|S10')

    In [29]: data[:,5]
    Out[29]: 
    array([b'O2', b'', b'', b'', b'', b''],
          dtype='|S10')
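A self-contained variant of the same fixed-width idea, as a sketch: it rebuilds the reaction.dat layout in an io.StringIO instead of reading the file, and swaps the per-column strip converters for genfromtxt's own `autostrip=True`, which strips the padding and the trailing newline from every field. It also assumes Python 3 text mode (`encoding="utf-8"`, unicode `U10` dtype) rather than the bytes `S10` dtype used above.

```python
import io

import numpy as np

FIELDWIDTH = 10
NUMFIELDS = 6

# Rebuild the file's layout: every field padded to 10 characters except the last.
rows = [
    ["14", "HO2", "O3", "OH", "O2", "O2"],
    ["15", "HO2", "HO2", "H2O2", "O2"],
    ["16", "H2O2", "OH", "HO2", "H2O"],
    ["17", "O", "O", "O2"],
    ["18", "O", "O2", "O3"],
    ["19", "O", "O3", "O2", "O2"],
]
text = "".join(
    "".join(f.ljust(FIELDWIDTH) for f in r[:-1]) + r[-1] + "\n" for r in rows
)

# A sequence of ints as `delimiter` means fixed-width columns; short lines
# simply yield empty strings for the columns past their end.
data = np.genfromtxt(
    io.StringIO(text),
    dtype="U%d" % FIELDWIDTH,
    delimiter=(FIELDWIDTH,) * NUMFIELDS,
    autostrip=True,
    encoding="utf-8",
)
print(data[:, 0])
print(data[:, 5])
```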
