
large amount of data in many text files - how to process?


(3) is not necessarily a bad idea -- Python makes it easy to process "CSV" files (and despite the C standing for Comma, tab as a separator is just as easy to handle), and of course it gets just about as much bandwidth in I/O operations as any other language. As for other recommendations: numpy, besides fast computation (which you may not need, per your statements), provides very handy, flexible multi-dimensional arrays; and the standard library module multiprocessing lets you exploit multiple cores for any task that's easy to parallelize (important since just about every machine these days has multiple cores ;-).
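For instance, here is a minimal sketch of reading one tab-separated file into a numpy array with the standard csv module. The file name, the header row, and the assumption that all data columns are numeric are made up for illustration -- adapt to your actual layout:

    import csv
    import numpy as np

    def load_tsv(path):
        """Read one tab-separated file into a numpy array of floats.
        Assumes a header row and all-numeric data columns (illustrative only)."""
        with open(path, newline="") as f:
            reader = csv.reader(f, delimiter="\t")
            header = next(reader)                  # keep the column names
            rows = [[float(x) for x in row] for row in reader]
        return header, np.array(rows)

    header, data = load_tsv("measurements_001.txt")   # hypothetical file name
    print(data.shape, data.mean(axis=0))              # e.g. per-column means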


Ok, so just to be different, why not R?

  • You seem to know R, so you may get to working code quickly
  • 30 MB per file is not large on a standard workstation with a few GB of RAM
  • the read.csv() variant of read.table() can be very efficient if you specify the types of the columns via the colClasses argument: instead of guesstimating types for conversion, these will be handled efficiently
  • the bottleneck here is I/O from the disk, and that is the same for every language
  • R has multicore to set up parallel processing on machines with multiple cores (similar to Python's multiprocessing, it seems)
  • Should you want to exploit the 'embarrassingly parallel' structure of the problem, R has several packages that are well-suited to data-parallel problems: e.g. snow and foreach can each be deployed on just one machine, or on a set of networked machines (for comparison, the sketch after this list shows the same per-file pattern with Python's multiprocessing)
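Since the answers above draw the parallel to Python's multiprocessing, here is a minimal sketch of that one-worker-per-file, embarrassingly parallel pattern. The glob path and the per-file summary function are hypothetical placeholders for your real analysis:

    import glob
    from multiprocessing import Pool

    def summarize(path):
        """Per-file work: line and field counts here, as a stand-in for the real analysis."""
        lines = 0
        fields = 0
        with open(path) as f:
            for line in f:
                lines += 1
                fields += len(line.rstrip("\n").split("\t"))
        return path, lines, fields

    if __name__ == "__main__":
        paths = glob.glob("data/*.txt")    # hypothetical location of the ~30 MB files
        with Pool() as pool:               # one worker per core by default
            for path, lines, fields in pool.map(summarize, paths):
                print(path, lines, fields)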


Have a look at Disco. It is a lightweight distributed MapReduce engine, written in about 2,000 lines of Erlang, but specifically designed for Python development. It supports not only working on your data, but also storing and replicating it reliably. They've just released version 0.3, which includes an indexing and database layer.
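To give a sense of what a Disco job looks like, here is a sketch along the lines of the word-count example in Disco's tutorial. This assumes the 0.3-era Job/result_iterator API, and the input URL is just a placeholder; exact signatures may differ between versions:

    from disco.core import Job, result_iterator

    def fun_map(line, params):
        # emit (word, 1) for every word in a line of input
        for word in line.split():
            yield word, 1

    def fun_reduce(iter, params):
        # sum the counts for each word; import inside the function,
        # since map/reduce functions are shipped to the worker nodes
        from disco.util import kvgroup
        for word, counts in kvgroup(sorted(iter)):
            yield word, sum(counts)

    if __name__ == "__main__":
        inputs = ["http://example.com/some_text_file.txt"]   # placeholder input
        job = Job().run(input=inputs, map=fun_map, reduce=fun_reduce)
        for word, count in result_iterator(job.wait()):
            print(word, count)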