how to subset a file - select a numbers of rows or columns
Filtering rows is easy, for example with AWK:
cat largefile | awk 'NR >= 10000 && NR <= 100000 { print }'
Filtering columns is easier with CUT:
cat largefile | cut -d '\t' -f 10000-100000
As Rahul Dravid mentioned, cat
is not a must here, and as Zsolt Botykai added you can improve performance using:
awk 'NR > 100000 { exit } NR >= 10000 && NR <= 100000' largefilecut -d '\t' -f 10000-100000 largefile
For the first problem, selecting a set of rows from a large file, piping tail to head is very simple. You want 90000 rows from largefile starting at row 10000. tail grabs the back end of largefile starting at row 10000 and then head chops off all but the first 90000 rows.
tail -n +10000 largefile | head -n 90000 -