Interleave text files with a given ratio of lines from file1 to file2


Good old paste:

paste -d '\n' fsmall - - - <fbig

SYNOPSIS
    paste [-s] [-d list] file...

OPERANDS
    file
        A pathname of an input file. If - is specified for one or more of the files, the standard input shall be used; the standard input shall be read one line at a time, circularly, for each instance of -.

source: POSIX paste

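This means that each hyphen (-) operand reads one line from standard input, which here is redirected from fbig. Three hyphens therefore mean three lines of fbig for every line of fsmall, and -d '\n' joins them with newlines instead of the default tab.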
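To make the quoted POSIX rule concrete, here is a small Python model of what paste -d '\n' fsmall - - - does. It is a sketch written from the specification, not paste's actual source; the name paste_interleave and the ratio parameter are illustrative. It also models one more rule from the POSIX paste page that the excerpt above omits: when some, but not all, inputs have hit end-of-file, paste behaves as though empty lines were read from the exhausted ones.

```python
import io
from typing import Iterator, List, TextIO


def paste_interleave(small: TextIO, big: TextIO, ratio: int = 3) -> Iterator[str]:
    """Model of `paste -d '\\n' small - - -` with `ratio` hyphens (3 here).

    Every output row takes the next line of `small`, then `ratio` lines of
    `big`, because all '-' operands read the same standard input in turn.
    """
    while True:
        sources: List[TextIO] = [small] + [big] * ratio
        fields = [src.readline() for src in sources]  # '' means end-of-file
        # paste stops only once *every* operand is at end-of-file ...
        if not any(fields):
            return
        # ... otherwise an operand already at EOF contributes an empty line.
        for field in fields:
            yield field.rstrip("\n")


if __name__ == "__main__":
    fsmall = io.StringIO("a\nb\n")
    fbig = io.StringIO("1\n2\n3\n4\n5\n6\n7\n")  # one line more than 2 * 3
    print("\n".join(paste_interleave(fsmall, fbig)))
```

With a seven-line fbig the model's third row is an empty line, then 7, then two more empty lines, which is what the "empty lines at end-of-file" rule implies for this command.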

Good old awk without buffering:

awk -v r=3 '1;{for(i=1;i<=r;++i) if ((getline < "-") > 0) print}' fsmall <fbig

This method mimics the idea of the paste solution: it uses getline to read the lines of fbig from standard input on demand, so the small file does not have to be buffered in memory. It is not very flexible, though, and one should always be careful when using getline [See All about getline].
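Much of the care getline needs is about its return value: at end of input, getline < "-" returns 0 and leaves $0 unchanged, so an unconditional print right after it outputs the previous record again. Below is a sketch of the same streaming idea in Python, generalised to m lines of file1 per n lines of file2, with the end-of-file check made explicit. The names interleave_stream, m and n are illustrative, and passing through whatever remains of the longer file once the other runs out is an assumed policy, not something the question specifies.

```python
import io
from itertools import islice
from typing import Iterable, Iterator


def interleave_stream(file1: Iterable[str], file2: Iterable[str],
                      m: int = 1, n: int = 3) -> Iterator[str]:
    """Yield m lines of file1, then n lines of file2, repeatedly.

    Only the current chunk of each file is held in memory. Once one file is
    exhausted, the remaining lines of the other are passed through as-is.
    """
    it1, it2 = iter(file1), iter(file2)
    while True:
        chunk1 = list(islice(it1, m))  # at most m lines; [] at end-of-file
        chunk2 = list(islice(it2, n))  # at most n lines; [] at end-of-file
        if not chunk1 and not chunk2:
            return
        yield from chunk1
        yield from chunk2


if __name__ == "__main__":
    small = io.StringIO("a\nb\n")
    big = io.StringIO("1\n2\n3\n4\n5\n6\n7\n")
    print("".join(interleave_stream(small, big)), end="")
```

Open file objects can be passed directly, since they iterate line by line with the trailing newline kept.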

Good old awk with buffering:

awk -v r=3 '(NR==FNR){b[FNR]=$0;next}((FNR-1)%r==0){print b[++c]}1' fsmall fbig

This reads the whole small file into memory first, which could lead to memory and performance problems when the "small" file is actually large (see the comment of Tripleee).
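The buffered variant is a two-pass pattern: load one file completely, then stream the other and decide by line number when to splice in a buffered line. A minimal Python sketch of that idea follows; interleave_buffered and r are illustrative names, and the choice to stop inserting once the buffer is used up is an assumption (the awk version would print an empty line there, because b[++c] is then an unset array element).

```python
from typing import Iterable, Iterator


def interleave_buffered(small: Iterable[str], big: Iterable[str],
                        r: int = 3) -> Iterator[str]:
    """Two-pass version: buffer `small`, then stream `big`.

    Before lines 1, r+1, 2r+1, ... of `big`, the next buffered line of
    `small` is emitted. Buffered lines never reached are dropped, as in awk.
    """
    buffered = list(small)  # like awk's b[FNR]=$0 while NR==FNR
    c = 0                   # index of the next buffered line to emit
    for fnr, line in enumerate(big, start=1):
        # (fnr - 1) % r == 0 also works for r == 1, unlike fnr % r == 1
        if (fnr - 1) % r == 0 and c < len(buffered):
            yield buffered[c]
            c += 1
        yield line
```

The memory cost is the list(small) call; everything after it is streamed.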


With GNU sed

sed -e 'R f2' -e 'R f2' -e 'R f2' f1

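where f1 is the smaller file. Each R command reads the next line of the given file (which stays open across commands and cycles) and queues it to be appended to the output after the current line of f1, so three R commands append three lines of f2 after every line of f1.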
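A sketch of those semantics in Python makes the edge cases visible. The GNU sed manual states that when the end of the R file is reached, no line is appended and no error is reported; it follows that lines of f2 left over when f1 ends are simply never read. The function name sed_append_model and the copies parameter are made up for illustration; this models the behaviour, not sed's implementation.

```python
import io
from typing import Iterable, Iterator, TextIO


def sed_append_model(f1: Iterable[str], f2: TextIO, copies: int = 3) -> Iterator[str]:
    """Model of `sed -e 'R f2' -e 'R f2' -e 'R f2' f1` (copies=3).

    Each R queues the next line of f2; the queue is flushed right after the
    current f1 line is auto-printed at the end of the cycle.
    """
    for line in f1:
        yield line                  # auto-print of the pattern space
        for _ in range(copies):
            extra = f2.readline()   # '' once f2 is exhausted
            if extra:               # R appends nothing at end of f2
                yield extra


if __name__ == "__main__":
    f1 = io.StringIO("a\nb\nc\n")
    f2 = io.StringIO("1\n2\n3\n4\n")
    print("".join(sed_append_model(f1, f2)), end="")
```

Unlike the paste model, this never emits empty filler lines, but it silently drops any surplus of f2.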


Repeatedly reopening each input file and seeking to the spot where you last stopped reading is horribly inefficient. To make matters worse, you read the entire input file through to the end each time, just picking out one line or three along the way. You could at least exit as soon as you have printed what you wanted. But hang on.

Here is a simple Python script which does what you are asking for by simply keeping both files open and reading from each as you go.

with open('small_file.txt') as small, open('big_file.txt') as large:
    for line in small:
        print(line, end='')
        for x in range(3):
            print(large.readline(), end='')

If you would like to parametrize the file names, try

import sys

with open(sys.argv[1]) as small, open(sys.argv[2]) as large:
    ...

Output goes to standard output, so if you saved the above as path/to/script.py, you can simply run this at the shell prompt:

python3 path/to/script.py small_file.txt big_file.txt >Output.txt

The use of end='' is a minor hack: each line read from a file already ends with a newline, so this avoids having to pluck it off only for print to add it back.

As an afterthought, you can do much the same thing in a shell script:

while IFS= read -r line; do
    printf '%s\n' "$line"
    for x in 1 2 3; do
        IFS= read -u 3 -r other
        printf '%s\n' "$other"
    done
done <small_file.txt 3<big_file.txt >Output.txt

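but the shell's while read -r loop is inherently much slower. (Also note that read -u is a Bash extension; in a plain POSIX sh, use IFS= read -r other <&3 instead.)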