_csv.Error: field larger than field limit (131072)

python csv

The csv file might contain very huge fields, therefore increase the field_size_limit:

import sysimport csvcsv.field_size_limit(sys.maxsize)

sys.maxsize works for Python 2.x and 3.x. sys.maxint would only work with Python 2.x (SO: what-is-sys-maxint-in-python-3)

Update

As Geoff pointed out, the code above might result in the following error: OverflowError: Python int too large to convert to C long. To circumvent this, you could use the following quick and dirty code (which should work on every system with Python 2 and Python 3):

import sysimport csvmaxInt = sys.maxsizewhile True:    # decrease the maxInt value by factor 10     # as long as the OverflowError occurs.    try:        csv.field_size_limit(maxInt)        break    except OverflowError:        maxInt = int(maxInt/10)

python csv

This could be because your CSV file has embedded single or double quotes. If your CSV file is tab-delimited try opening it as:

c = csv.reader(f, delimiter='\t', quoting=csv.QUOTE_NONE)

python csv

.csv field sizes are controlled via [Python 3.Docs]: csv.field_size_limit([new_limit]) (emphasis is mine):

Returns the current maximum field size allowed by the parser. If new_limit is given, this becomes the new limit.

It is set by default to 131072 or 0x20000 (128k), which should be enough for any decent .csv:

>>> import csv>>>>>>>>> limit0 = csv.field_size_limit()>>> limit0131072>>> "0x{0:016X}".format(limit0)'0x0000000000020000'

However, when dealing with a .csv file (with the correct quoting and delimiter) having (at least) one field longer than this size, the error pops up.
To get rid of the error, the size limit should be increased (to avoid any worries, the maximum possible value is attempted).

Behind the scenes (check [GitHub]: python/cpython - (master) cpython/Modules/_csv.c for implementation details), the variable that holds this value is a C long ([Wikipedia]: C data types), whose size varies depending on CPU architecture and OS (ILP). The classical difference: for a 64bit OS (and Python build), the long type size (in bits) is:

Nix: 64
Win: 32

When attempting to set it, the new value is checked to be in the long boundaries, that's why in some cases another exception pops up (because sys.maxsize is typically 64bit wide - encountered on Win):

>>> import sys, ctypes as ct>>>>>>>>> sys.platform, sys.maxsize, ct.sizeof(ct.c_void_p) * 8, ct.sizeof(ct.c_long) * 8('win32', 9223372036854775807, 64, 32)>>>>>> csv.field_size_limit(sys.maxsize)Traceback (most recent call last):  File "<stdin>", line 1, in <module>OverflowError: Python int too large to convert to C long

To avoid running into this problem, set the (maximum possible) limit (LONG_MAX), using an artifice (thanks to [Python 3.Docs]: ctypes - A foreign function library for Python). It should work on Python 3 and Python 2, on any CPU / OS.

>>> csv.field_size_limit(int(ct.c_ulong(-1).value // 2))131072>>> limit1 = csv.field_size_limit()>>> limit12147483647>>> "0x{0:016X}".format(limit1)'0x000000007FFFFFFF'

64bit Python on a Nix like OS:

>>> import sys, csv, ctypes as ct>>>>>>>>> sys.platform, sys.maxsize, ct.sizeof(ct.c_void_p) * 8, ct.sizeof(ct.c_long) * 8('linux', 9223372036854775807, 64, 64)>>>>>> csv.field_size_limit()131072>>>>>> csv.field_size_limit(int(ct.c_ulong(-1).value // 2))131072>>> limit1 = csv.field_size_limit()>>> limit19223372036854775807>>> "0x{0:016X}".format(limit1)'0x7FFFFFFFFFFFFFFF'

For 32bit Python, things should run smoothly without the artifice (as both sys.maxsize and LONG_MAX are 32bit wide).
If this maximum value is still not enough, then the .csv would need manual intervention in order to be processed from Python.

Check the following resources for more details on:

Playing with C types boundaries from Python: [SO]: Maximum and minimum value of C types integers from Python (@CristiFati's answer)
Python 32bit vs 64bit differences: [SO]: How do I determine if my python shell is executing in 32bit or 64bit mode on OS X? (@CristiFati's answer)

CodeHunter

_csv.Error: field larger than field limit (131072)

Update

Recent Posts

How can I color dots in a xy scatterplot according to column value?

How to update a claim in ASP.NET Identity?

What does {0} mean when initializing an object?

Accessing members of items in a JSONArray with Java

How to log SQL statements in Spring Boot?

Powershell Get-WebSite name parameter is ignored

How to detect scroll to bottom of html element

Java synchronized method

How to test controllers with CodeIgniter?

Detect Visual Composer

Matplotlib: Specify format of floats for tick labels

Rails join a list of strings with commas and "and" before the last