Interleave text files with a given ratio of lines from file1 to file2

Good old paste:

paste -d '\n' fsmall - - - <fbig

SYNOPSIS paste [-s] [-d list] file ... file
OPERANDS file: A pathname of an input file. If - is specified for one or more of the files, the standard input shall be used; the standard input shall be read one line at a time, circularly, for each instance of -.
(source: POSIX paste)

This means that each hyphen operand reads one line from standard input, which is redirected from fbig here. Three hyphens therefore mean three lines of fbig for every line of fsmall.
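The POSIX description of paste also says that when some, but not all, input files reach end of file, paste behaves as though empty lines were read from the exhausted ones. The following Python model of paste -d '\n' fsmall - - - is only a sketch of those rules as I read them, not GNU paste's implementation, and the function name paste_newline is made up for illustration. It shows what this means for the trick: if the files are not in an exact 1:3 ratio, the shorter side is padded with empty lines instead of the output stopping.

```python
from typing import Iterable, List


def paste_newline(small: Iterable[str], big: Iterable[str], r: int) -> List[str]:
    """Model `paste -d '\\n' small - ... -` with r hyphens under the POSIX rules.

    All r hyphen operands share one standard-input stream, so each output row
    takes one line from `small` and the next r lines from `big`. An operand
    at end of file contributes an empty line, and output stops once a row
    finds every operand exhausted.
    """
    small_it, big_it = iter(small), iter(big)
    out: List[str] = []
    while True:
        row = [next(small_it, None)] + [next(big_it, None) for _ in range(r)]
        if all(field is None for field in row):
            break  # every operand is at end of file: paste stops
        # Exhausted operands behave as though an empty line was read
        out.extend("" if field is None else field for field in row)
    return out
```

For example, two lines of fsmall against five lines of fbig give a1, b1, b2, b3, a2, b4, b5 and then one empty line.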

Good old awk without buffering:

awk -v r=3 '1;{for(i=1;i<=r;++i) if ((getline < "-") > 0) print}' fsmall <fbig

This method mimics the idea of the paste solution. It uses getline to read fbig on demand, so the small file never has to be buffered. This is not very flexible, and one should always be careful when using getline [See All about getline].
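The warning about getline is worth spelling out. When getline hits end of file it returns 0 and leaves $0 unchanged, so a loop that prints after every call without testing the return value repeats the last line of fbig whenever fbig runs short. The Python simulation below models $0 as a variable to show both behaviors; the function name awk_getline_loop and the check_return switch are hypothetical, chosen for illustration.

```python
from typing import Iterable, List


def awk_getline_loop(small: Iterable[str], big: Iterable[str], r: int,
                     check_return: bool) -> List[str]:
    """Model of the unbuffered awk loop; `record` stands in for awk's $0.

    getline returns 0 at end of file and leaves $0 unchanged, so printing
    unconditionally repeats the last record. With check_return=True the
    print is skipped, like `if ((getline < "-") > 0) print`.
    """
    big_it = iter(big)
    out: List[str] = []
    for line in small:
        record = line              # main input loop sets $0 to the fsmall line
        out.append(record)         # the `1` pattern prints it
        for _ in range(r):
            nxt = next(big_it, None)  # getline < "-"
            if nxt is not None:
                record = nxt          # a successful read replaces $0
            elif check_return:
                continue              # end of file: print nothing
            out.append(record)        # print (possibly a stale $0)
    return out
```

With fbig one line short of a full group, the unchecked version prints its last line three times, while the checked version prints it once.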

Good old awk with buffering:

awk -v r=3 '(NR==FNR){b[FNR]=$0;next}((FNR-1)%r==0){print b[++c]}1' fsmall fbig

This buffers the whole small file in memory, which could lead to performance issues when the small file is really big (see the comment by Tripleee).
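For comparison, here is the buffering approach transcribed into Python as a hypothetical buffered_interleave helper, written only for illustration. Because the output is driven by fbig, surplus lines of fsmall are never printed. Once fsmall runs out, awk prints an empty string for the unset array element b[c], and the sketch reproduces that.

```python
from typing import Iterable, List


def buffered_interleave(small: Iterable[str], big: Iterable[str], r: int) -> List[str]:
    """Python version of the buffering awk idea.

    The whole small file is loaded into memory first (awk's b[] array); the
    output is then driven by the big file, with the next buffered line
    emitted before every group of r big-file lines.
    """
    buffered = list(small)  # memory use grows with the size of the small file
    out: List[str] = []
    c = 0
    for i, line in enumerate(big):
        if i % r == 0:
            # awk prints an empty string for an unset b[] element, so do the same
            out.append(buffered[c] if c < len(buffered) else "")
            c += 1
        out.append(line)
    return out
```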


With GNU sed

sed -e 'R f2' -e 'R f2' -e 'R f2' f1

where f1 is the smaller file. Each R command reads one line from the given file, and the lines thus obtained get appended after the current line read from f1. Three R commands therefore append three lines of f2 after every line of f1.
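The sed version generalizes to any ratio r by repeating the R command r times. Below is a small Python helper that builds the argument list; sed_interleave_argv is a hypothetical name, and running the result still requires GNU sed, since R is a GNU extension.

```python
import shlex
from typing import List


def sed_interleave_argv(f1: str, f2: str, r: int) -> List[str]:
    """Return a GNU sed argv that prints each line of f1 followed by r lines of f2.

    Each "-e 'R f2'" queues one line of f2 to be output at the end of the
    current cycle, so repeating it r times gives r lines of f2 per line of f1.
    """
    if r < 1:
        raise ValueError("r must be at least 1")
    argv = ["sed"]
    for _ in range(r):
        argv += ["-e", f"R {f2}"]
    argv.append(f1)
    return argv


if __name__ == "__main__":
    # Show the equivalent shell command for a 1:3 ratio
    print(shlex.join(sed_interleave_argv("f1", "f2", 3)))
```

The list can be passed straight to subprocess.run(sed_interleave_argv('f1', 'f2', 3), check=True), which avoids any shell quoting.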


Repeatedly reopening each input file and seeking to the spot where you last stopped reading is horribly inefficient. Making matters worse, you are reading the entire input file through to the end each time, and just picking out one line or three along the way. You could at least exit as soon as you have printed the stuff you wanted. But hang on.

Here is a simple Python script which does what you are asking for by keeping both files open and reading from each as you go.

with open('small_file.txt') as small, open('big_file.txt') as large:
    for line in small:
        print(line, end='')
        for x in range(3):
            print(large.readline(), end='')

If you would like to parametrize the file names, try

import sys
with open(sys.argv[1]) as small, open(sys.argv[2]) as large:
    ...

Output is to standard output, so if you saved the above into path/to/script.py you can simply run this at the shell prompt:

python3 path/to/script.py small_file.txt big_file.txt >Output.txt

The use of end='' is a minor hack to avoid having to pluck off the newline and have print add it back.
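If you also want the ratio to be a parameter, the same loop can be written with itertools.islice, which reads at most ratio lines from the big file and simply yields fewer at end of file. This is a sketch under my own assumptions: the interleave function and the optional third RATIO argument (defaulting to 3) are illustrative additions, not part of the script above.

```python
import sys
from itertools import islice
from typing import Iterable, Iterator, List


def interleave(small: Iterable[str], large: Iterator[str], ratio: int) -> Iterator[str]:
    """Yield one line of small, then up to `ratio` lines of large, repeatedly.

    Both inputs are consumed lazily, so memory use does not grow with file size.
    `large` must be an iterator (an open file is one); lines of it left over
    after `small` is exhausted are not emitted.
    """
    for line in small:
        yield line
        # islice takes at most `ratio` lines and yields fewer at end of file
        yield from islice(large, ratio)


def main(argv: List[str]) -> int:
    if len(argv) < 3:
        print(f"usage: {argv[0]} SMALL BIG [RATIO]", file=sys.stderr)
        return 2
    ratio = int(argv[3]) if len(argv) > 3 else 3  # hypothetical optional RATIO argument
    with open(argv[1]) as small, open(argv[2]) as large:
        sys.stdout.writelines(interleave(small, large, ratio))
    return 0


if __name__ == "__main__":
    main(sys.argv)
```

Saved as path/to/script.py, it would be run as python3 path/to/script.py small_file.txt big_file.txt 5 >Output.txt for a 1:5 ratio.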

As an afterthought, you can do much the same thing in a shell script:

while IFS= read -r line; do
    printf '%s\n' "$line"
    for x in 1 2 3; do
        IFS= read -u 3 -r other
        printf '%s\n' "$other"
    done
done <small_file.txt 3<big_file.txt >Output.txt

but the shell's while read -r loop is inherently much slower.
