How to extract one column from multiple files, and paste those columns into one file?
Here's one way using awk and a sorted glob of files:
awk '{ a[FNR] = (a[FNR] ? a[FNR] FS : "") $5 } END { for(i=1;i<=FNR;i++) print a[i] }' $(ls -1v *)
Results:
1 8 a
2 9 b
3 10 c
4 11 d
5 12 e
6 13 f
7 14 g
Explanation:
For each line of input of each input file:
- Use the file's line number (FNR) as an array key and append column 5 to the value stored under that key.
(a[FNR] ? a[FNR] FS : "")
is a ternary operation that builds up each array value as a record. It tests whether a non-empty, non-zero value is already stored for the file's line number. If so, it takes the existing value followed by the default field separator (FS, a single space) before appending the fifth column. Otherwise nothing is prepended and the value is simply set to the fifth column.
At the end of the script:
- Use a C-style loop to iterate through the array, printing each of the array's values. The loop runs up to FNR, which at that point holds the line count of the last file read.
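The accumulation described above can also be sketched in Python. This is only an illustration of the idea, not a translation of the awk one-liner: the function name `paste_column` and its parameters are made up for this example, and unlike the awk version it always joins with the separator (even after an empty value) and prints every line number it has seen rather than stopping at the last file's line count.

```python
def paste_column(files, column=5, sep=" "):
    """Join the given column of each input, line by line.

    `files` is an iterable of iterables of lines (open file objects work).
    Hypothetical helper written for this example.
    """
    rows = {}  # line number -> list of values, playing the role of awk's a[FNR]
    for lines in files:
        for line_number, line in enumerate(lines, start=1):
            fields = line.split()  # whitespace splitting, like awk's default FS
            # Use an empty string when the line has fewer fields than `column`.
            value = fields[column - 1] if len(fields) >= column else ""
            rows.setdefault(line_number, []).append(value)
    # Emit the rows in line-number order.
    return [sep.join(rows[n]) for n in sorted(rows)]
```

It could be called as `paste_column(open(name) for name in names)`, where `names` is the list of input files in the order you want the columns to appear.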
# print filenames in sorted order
find -name sample\*.txt | sort |
# extract 5-th column from each file and print it on a single line
xargs -n1 -I{} sh -c '{ cut -s -d " " -f 5 $0 | tr "\n" " "; echo; }' {} |
# transpose
python transpose.py ?
where transpose.py is:
#!/usr/bin/env python
"""Write lines from stdin as columns to stdout."""
import sys
from itertools import izip_longest

missing_value = sys.argv[1] if len(sys.argv) > 1 else '-'
for row in izip_longest(*[column.split() for column in sys.stdin],
                        fillvalue=missing_value):
    print " ".join(row)
Output
1 8 a
2 9 b
3 10 c
4 11 d
5 ? e
6 ? f
? ? g
This assumes the first and second files have fewer lines than the third one (missing values are replaced by '?').
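The transpose script above is written for Python 2 (`izip_longest` and the `print` statement). Under Python 3 the same idea might look like the sketch below. The function name `transpose` is an assumption made for this example; `itertools.zip_longest` is the Python 3 name for `izip_longest`.

```python
from itertools import zip_longest


def transpose(lines, missing_value="-"):
    """Turn each input line into an output column.

    Lines with fewer fields are padded with `missing_value`.
    Hypothetical helper written for this example.
    """
    columns = [line.split() for line in lines]
    return [" ".join(row)
            for row in zip_longest(*columns, fillvalue=missing_value)]
```

To use it as a drop-in filter, iterate over `transpose(sys.stdin, fill)` and print each returned row, taking `fill` from `sys.argv[1]` as the original script does.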