Pipe a lot of files to stdin, extract first columns, then combine those in a new file


Process substitution <(somecommand) doesn't pipe to stdin; it actually opens a pipe on a separate file descriptor, e.g. 63, and passes the path /dev/fd/63 to the command. When this "file" is opened, the kernel* duplicates the file descriptor instead of opening a real file.
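You can see the path it passes by echoing it (a quick sketch; the descriptor number varies by shell and system):

echo <(true)              # prints something like /dev/fd/63
cat <(printf 'hello\n')   # opening that path reads from the pipe: hello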

We can do something similar by opening a bunch of file descriptors and then passing them to the command:

# Start subshell so all files are automatically closed
(
  fds=()
  n=0
  # Open a new fd for each process substitution
  for file in ./*.txt
  do
    exec {fds[n++]}< <(cut -d ' ' -f 1 "$file")
  done
  # fds now contains a list of fds like 12 14;
  # prepend "/dev/fd/" to each of them
  parameters=( "${fds[@]/#//dev/fd/}" )
  paste -d ' ' "${parameters[@]}"
)

{var}< file is bash's syntax for dynamic file descriptor assignment. It works like var=4; exec 4< file, but without hardcoding the 4: bash picks a free file descriptor and stores its number in var. exec opens it in the current shell.
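As a minimal standalone sketch of the same mechanism:

exec {fd}< <(echo hello)   # bash picks a free descriptor, stores its number in fd
read -r line <&"$fd"       # read from the chosen descriptor
echo "$fd: $line"          # e.g. "11: hello"
exec {fd}<&-               # close it again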

* On Linux, FreeBSD, OpenBSD, and XNU/macOS, anyway. This is not POSIX, but neither is <(..)


Given space-delimited input files, and provided ':' is a safe delimiter (i.e. there are no colons in the input), this paste-to-sed one-liner works:

paste -d':' *.txt | sed 's/ [^:]*$//;s/ [^:]*:*/ /g;s/://g'

(POSIX, with no eval, exec, bashisms, subshells, or loops.)
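For example, with two hypothetical three-column input files:

printf '%s\n' 'a1 a2 a3' 'x1 x2 x3' > a.txt
printf '%s\n' 'b1 b2 b3' 'y1 y2 y3' > b.txt
paste -d':' *.txt | sed 's/ [^:]*$//;s/ [^:]*:*/ /g;s/://g'

prints the first columns side by side:

a1 b1
x1 y1

The first sed expression drops the trailing columns of the last file, the second collapses each " rest-of-record:" run to a single space, and the last removes any leftover colons.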


After a closer look, I see that @that-other-guy's answer is awesome, but here is another dirty, dirty way that's roughly the same under the hood.

eval "paste -d' ' "$(find *.txt -printf " <(cut -d' ' -f1 '%f')")