How to subset a file: select a number of rows or columns

Filtering rows is easy, for example with AWK:

cat largefile | awk 'NR >= 10000  && NR <= 100000 { print }'

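Filtering columns is easier with cut. Tab is its default delimiter, so a tab-separated file needs no -d option (and -d '\t' would not work, because cut expects a single literal character there):

cat largefile | cut -f 10000-100000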

As Rahul Dravid mentioned, cat is not needed here, and as Zsolt Botykai added, you can improve performance by letting awk exit as soon as it has passed the last wanted row:

awk 'NR > 100000 { exit } NR >= 10000 && NR <= 100000' largefile

cut -f 10000-100000 largefile
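To check the early-exit version on something small, here is a minimal sketch. The helper name row_range and the 20-line sample file are made up for illustration; only the awk program itself comes from the answer above.

```shell
# Hypothetical helper: print rows START..END (inclusive) of FILE.
# awk stops reading as soon as it is past END, so the rest of a
# large file is never scanned.
row_range() {
    awk -v start="$1" -v end="$2" 'NR > end { exit } NR >= start' "$3"
}

# Sample data (an assumption for this demo): one number per line, 1..20.
sample=$(mktemp)
seq 1 20 > "$sample"

# Prints the lines 5, 6, 7 and 8.
row_range 5 8 "$sample"
```

Passing the bounds with -v keeps the awk program constant, so the same function works for any range.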

linux unix sed awk cut

Some different solutions:

For row ranges, in sed:

sed -n 10000,100000p somefile.txt

For column ranges in awk:

awk -v f=10000 -v t=100000 '{ for (i=f; i<=t;i++) printf("%s%s", $i,(i==t) ? "\n" : OFS) }' details.txt
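As a rough sanity check, the awk loop can be compared with cut on a tiny tab-separated file. This is only a sketch: the helper names col_range_cut and col_range_awk and the sample data are invented here, and the awk variant sets -F '\t' and OFS explicitly on the assumption that the input is tab-separated (the command above splits on any whitespace instead).

```shell
# Hypothetical helpers: print columns FROM..TO (inclusive) of FILE.
col_range_cut() {
    # Tab is cut's default delimiter.
    cut -f "$1-$2" "$3"
}

col_range_awk() {
    # Same loop as above, but with tab as input and output separator.
    awk -F '\t' -v OFS='\t' -v f="$1" -v t="$2" \
        '{ for (i = f; i <= t; i++) printf("%s%s", $i, (i == t) ? "\n" : OFS) }' "$3"
}

# Sample data (an assumption for this demo): two rows, five columns.
sample=$(mktemp)
printf 'a\tb\tc\td\te\n1\t2\t3\t4\t5\n' > "$sample"

# Both print columns 2 to 4 of each row, separated by tabs.
col_range_cut 2 4 "$sample"
col_range_awk 2 4 "$sample"
```

For plain column ranges cut is usually the simpler choice; the awk form becomes useful when the columns also need to be reordered or transformed.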


For the first problem, selecting a set of rows from a large file, piping tail to head is very simple. Say you want 90000 rows from largefile starting at row 10000. tail -n +10000 outputs everything from row 10000 onward, and head then keeps only the first 90000 of those rows, that is, rows 10000 through 99999. Use head -n 90001 if row 100000 should be included as well.

tail -n +10000 largefile | head -n 90000 -
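The row count passed to head is easy to get wrong by one, so it can help to compute it from the first and last row. A minimal sketch, assuming an inclusive range; the helper name rows_tail_head and the sample file are hypothetical:

```shell
# Hypothetical helper: print rows FIRST..LAST (inclusive) of FILE
# using tail and head. The number of rows is LAST - FIRST + 1.
rows_tail_head() {
    tail -n "+$1" "$3" | head -n "$(( $2 - $1 + 1 ))"
}

# Sample data (an assumption for this demo): one number per line, 1..20.
sample=$(mktemp)
seq 1 20 > "$sample"

# Prints the lines 5, 6, 7 and 8, the same rows as: sed -n '5,8p' "$sample"
rows_tail_head 5 8 "$sample"
```

Because head exits after the requested number of rows, the pipeline does not need to read to the end of the file.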
