Interleave text files with a given ratio of lines from file1 to file2
Good old paste:
paste -d '\n' fsmall - - - <fbig
SYNOPSIS
paste [-s] [-d list] file ... file
OPERANDS file: A pathname of an input file. If - is specified for one or more of the files, the standard input shall be used; the standard input shall be read one line at a time, circularly, for each instance of -.
source: POSIX paste
This means that each hyphen reads one line from stdin, which is redirected from fbig in this case. Three hyphens therefore consume three lines of fbig for every line of fsmall.
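To make the circular reading concrete, here is a minimal Python sketch of what the paste command above does with its operands. The function name `paste_rounds` and its parameters are hypothetical, chosen for illustration. It assumes the POSIX behaviour that an exhausted input is treated as supplying empty lines until every input has reached end-of-file.

```python
from typing import Iterable, Iterator


def paste_rounds(small: Iterable[str], big: Iterable[str], hyphens: int = 3) -> Iterator[str]:
    """Emulate `paste -d '\\n' fsmall - - - <fbig` on lines without trailing newlines."""
    small_it, big_it = iter(small), iter(big)
    while True:
        # One round: one line from the small file, then one line per hyphen.
        # All hyphens share the same stream, so they take consecutive lines.
        fields = [next(small_it, None)]
        fields += [next(big_it, None) for _ in range(hyphens)]
        if all(field is None for field in fields):
            return  # every input is at end-of-file
        for field in fields:
            # An exhausted input contributes an empty line.
            yield "" if field is None else field
```

With a two-line small input and a six-line big input this yields the expected 1:3 interleaving. When the inputs do not line up, the padding shows up as empty lines in the output.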
Good old awk without buffering:
awk -v r=3 '1;{for(i=1;i<=r;++i) {getline < "-"; print}}' fsmall <fbig
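This method mimics the idea of the paste solution. It uses getline to avoid buffering the small file. It is not very flexible, and one should always be careful when using getline (see All about getline). For example, once fbig is exhausted, getline fails and leaves $0 unchanged, so print repeats whatever line was read last.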
Good old awk with buffering:
awk -v r=3 '(NR==FNR){b[FNR]=$0;next}(FNR%r==1){print b[++c]}1' fsmall fbig
This buffers the small file in memory, which could lead to memory and performance issues when the small file is really big. (See the comment by Tripleee.)
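The buffering trade-off can be sketched in Python as follows. This is an illustration, not a translation of the awk one-liner: the name `interleave_buffered` is hypothetical, the index is zero-based so that a ratio of 1 also works, and nothing is emitted once the buffered lines run out.

```python
def interleave_buffered(small_lines, big_lines, r=3):
    """Emit one buffered small line before every group of r big lines."""
    buffered = list(small_lines)  # the whole small input is held in memory
    out = []
    for i, line in enumerate(big_lines):
        group = i // r
        # A new group of r big lines starts here: emit the matching small line.
        if i % r == 0 and group < len(buffered):
            out.append(buffered[group])
        out.append(line)
    return out
```

Memory use grows with the size of the small input, which is exactly the concern raised above. The big input is still streamed line by line.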
Repeatedly reopening each input file and seeking to the spot where you last stopped reading is horribly inefficient. Making matters worse, you are reading the entire input file through to the end each time, and just picking out one line or three along the way. You could at least exit as soon as you have printed the stuff you wanted. But hang on.
Here is a simple Python script which does what you are asking for by simply keeping both files open and reading from each as you go.
with open('small_file.txt') as small, open('big_file.txt') as large:
    for line in small:
        print(line, end='')
        for x in range(3):
            print(large.readline(), end='')
If you would like to parametrize the file names, try
import sys
with open(sys.argv[1]) as small, open(sys.argv[2]) as large:
    ...
Output is to standard output, so if you saved the above into path/to/script.py you can simply run this at the shell prompt:
python3 path/to/script.py small_file.txt big_file.txt >Output.txt
The use of end='' is a minor hack to avoid having to pluck off the newline and have print add it back.
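If the ratio should be a parameter as well, one possible variation is a small generator built on itertools.islice. The name `interleave` and the `ratio` argument are assumptions for illustration, not part of the script above. Lines are passed through unchanged, so trailing newlines stay intact.

```python
from itertools import islice


def interleave(small, large, ratio=3):
    """Yield each line of small followed by up to `ratio` lines of large."""
    large = iter(large)  # one shared iterator, so each slice resumes where the last stopped
    for line in small:
        yield line
        # islice stops early, without error, if large runs out of lines.
        yield from islice(large, ratio)
```

Inside the same with block, the output could then be written with sys.stdout.writelines(interleave(small, large, 3)). Like the script above, this stops when the small file ends, so any remaining lines of the big file are not printed.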
As an afterthought, you can do much the same thing in a shell script;
while IFS= read -r line; do
    printf '%s\n' "$line"
    for x in 1 2 3; do
        IFS= read -u 3 -r other
        printf '%s\n' "$other"
    done
done <small_file.txt 3<big_file.txt >Output.txt
but the shell's while read -r loop is inherently much slower.