
Bash: Loop Read N lines at a time from CSV


I would use the split command and write a little shell script around it:

#!/bin/bash

input_file=ids.txt
temp_dir=splits
api_limit=10000

# Make sure that there are no leftovers from previous runs
rm -rf "${temp_dir}"

# Create temporary folder for splitting the file
mkdir "${temp_dir}"

# Split the input file based on the api limit
split --lines "${api_limit}" "${input_file}" "${temp_dir}/"

# Iterate through splits and make an api call per split
for split in "${temp_dir}"/* ; do
    jq -Rsn '
        {"id":
          [inputs
            | . / "\n"
            | (.[] | select(length > 0) | . / ";") as $input
            | $input[0]]
        }' "${split}" > api_payload.json

    # now do something ...
    # curl -d @api_payload.json http://...

    rm -f api_payload.json
done

# Clean up
rm -rf "${temp_dir}"
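To illustrate what each per-split payload ends up looking like, here is a small, self-contained check of the jq step alone; the sample lines and the split file name are made up for illustration:

# Hypothetical sample data in the ids.txt format (first field is the id)
mkdir -p splits
printf '%s\n' '42;alice;2020-01-01' '43;bob;2020-01-02' > splits/aa

# Same jq filter as in the loop above: keep the first ";"-separated field
jq -Rsn '
    {"id":
      [inputs
        | . / "\n"
        | (.[] | select(length > 0) | . / ";") as $input
        | $input[0]]
    }' splits/aa
# Produces a payload equivalent to {"id":["42","43"]}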


Here's a simple and efficient solution that, at its core, just uses jq. It takes advantage of the -c command-line option. I've used xargs printf ... for illustration, mainly to show how easy it is to set up a shell pipeline.

< data.txt jq -Rnc '
  def batch($n; stream):
    def b: [limit($n; stream)]
    | select(length > 0)
    | (., b);
    b;

  {id: batch(10000; inputs | select(length>0) | (. / ";")[0])}' | xargs printf "%s\n"
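If each batch should drive an API call rather than just be printed, the xargs printf stage can be swapped for a while-read loop. The sketch below assumes one POST per batch; the endpoint URL and header are placeholders:

< data.txt jq -Rnc '
  def batch($n; stream):
    def b: [limit($n; stream)]
    | select(length > 0)
    | (., b);
    b;
  {id: batch(10000; inputs | select(length>0) | (. / ";")[0])}' |
while IFS= read -r payload; do
    # One POST per 10000-id batch; URL and headers are hypothetical
    curl -s -X POST -H 'Content-Type: application/json' \
         -d "$payload" 'https://api.example.com/bulk'
done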

Parameterizing batch size

It might make sense to set things up so that the batch size is specified outside the jq program. This could be done in numerous ways, e.g. by invoking jq along the lines of:

jq --argjson n 10000 ....

and of course using $n instead of 10000 in the jq program.
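For concreteness, the parameterized version of the pipeline above might look like this (same data.txt as before):

< data.txt jq --argjson n 10000 -Rnc '
  def batch($n; stream):
    def b: [limit($n; stream)]
    | select(length > 0)
    | (., b);
    b;

  {id: batch($n; inputs | select(length>0) | (. / ";")[0])}' | xargs printf "%s\n"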

Why “def b:”?

For efficiency. jq’s TCO (tail recursion optimization) only works for arity-0 filters.
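As a point of comparison, the helper could be written as a direct recursion instead; this sketch should produce the same batches, but the self-call has arity 2, so it would not be eligible for jq's TCO:

# Same output, but the recursive call is not arity-0,
# so jq cannot apply tail-call optimization to it:
def batch($n; stream):
  [limit($n; stream)]
  | select(length > 0)
  | (., batch($n; stream));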

Note on -s

In the Q as originally posted, the command-line options -sn are used in conjunction with inputs. Using -s with inputs defeats the whole purpose of inputs, which is to make it possible to process input in a stream-oriented way (i.e. one line of input or one JSON entity at a time).
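To see the difference concretely, compare the two invocations below on a tiny, made-up input file: without -s, inputs yields one line at a time; with -s, the whole file is read into memory and arrives as a single string:

printf '1;a\n2;b\n' > demo.txt   # hypothetical sample input

# Streaming (no -s): one string per line
< demo.txt jq -Rnc 'inputs'
# "1;a"
# "2;b"

# Slurped (-s): the entire file as one string, all at once
< demo.txt jq -Rsnc 'inputs'
# "1;a\n2;b\n"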