split file on Nth occurrence of delimiter

Using awk you could:

awk '/^\+$/ { delim++ } { file = sprintf("chunk%s.txt", int(delim / 50000)); print >> file; }' < input.txt
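A minimal variant, assuming you would rather keep each boundary + at the end of the chunk it closes. With the command above, it becomes the first line of the next chunk. The fix is to print the line first and count it afterwards. The chunk size is passed with -v n=... (2 here, so the effect is visible on a tiny hypothetical sample). close() is called when switching files, because some awk implementations limit how many output files can be open at once.

```shell
#!/bin/sh
# Hypothetical sample input: four records, each terminated by a "+" line.
printf 'entry 1\n+\nentry 2\n+\nentry 3\n+\nentry 4\n+\n' > input.txt

# n = number of delimiters per chunk (50000 in the real case).
awk -v n=2 '
{
    file = sprintf("chunk%d.txt", int(delim / n))
    if (file != prev) {            # moving on to a new chunk
        if (prev != "") close(prev)
        prev = file
    }
    print > file                   # print before counting, so the "+" stays in this chunk
}
/^\+$/ { delim++ }                 # count the delimiter after it has been written
' < input.txt
```

Here chunk0.txt ends up with entries 1 and 2, each followed by its +, and chunk1.txt gets entries 3 and 4. No empty chunk2.txt is created, because awk only creates a file when something is printed to it.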

Update:

To leave out the delimiter line at each chunk boundary (with the command above, it ends up as the first line of the next chunk), try this:

awk '/^\+$/ { if(++delim % 50000 == 0) { next } } { file = sprintf("chunk%s.txt", int(delim / 50000)); print > file; }' < input.txt

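The next keyword causes awk to stop processing the remaining rules for this record and advance to the next one (line). I also changed the >> to >, since if you run it more than once you probably don't want to append to the old chunk files. Within a single run, > truncates a file only when it is first opened; later prints to it append.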

file unix split chunking

It isn't very hard to do in Perl if you can't find a suitable alternative (and it will perform pretty well):

#!/usr/bin/env perl
use strict;
use warnings;

# Configuration items - could be set by argument handling
my $prefix = "rs.";     # File prefix
my $number = 1;         # First file number
my $width  = 4;         # Number of digits to use in file name
my $rx     = qr/^\+$/;  # Match regex
my $limit  = 3;         # 50,000 in real case
my $quiet  = 0;         # Set to 1 to suppress file names

sub next_file
{
    my $name = sprintf("%s%.*d", $prefix, $width, $number++);
    open my $fh, '>', $name or die "Failed to open $name for writing: $!";
    print "$name\n" unless $quiet;
    return $fh;
}

my $fh = next_file;  # Output file handle
my $counter = 0;     # Match counter

while (<>)
{
    print $fh $_;
    $counter++ if (m/$rx/);
    if ($counter >= $limit)
    {
        close $fh;
        $fh = next_file;
        $counter = 0;
    }
}
close $fh;

That's far from being a one-liner; I'm not sure whether that's a merit or not. The items that should be configured are grouped together, and could be set via command-line options, for example.

You could end up with an empty file. When the number of matches is an exact multiple of $limit, a new file is opened right after the last match and nothing is ever written to it. You could spot that and remove it if necessary. You'd need a second counter: the existing one is a 'match counter', but you'd also need a line counter (lines written to the current file), and if the line counter was zero at the end, you'd remove the last file. You'd also need the file name to be able to remove it... fiddly, but not difficult.

Given the input (basically two copies of your sample data), the output from repsplit.pl (repeat split) was as shown:

$ perl repsplit.pl data
rs.0001
rs.0002
rs.0003
$ cat data
entry 1
some more
+
entry 2
some more
even more
+
entry 3
some more
+
entry 4
some more
+
entry 1
some more
+
entry 2
some more
even more
+
entry 3
some more
+
entry 4
some more
+
$ cat rs.0001
entry 1
some more
+
entry 2
some more
even more
+
entry 3
some more
+
$ cat rs.0002
entry 4
some more
+
entry 1
some more
+
entry 2
some more
even more
+
$ cat rs.0003
entry 3
some more
+
entry 4
some more
+
$
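If you would rather avoid the empty last file than delete it afterwards, there is another option. It is a different technique from the line counter described above and is not part of the original answer: choose the next file name only when there is actually a line to write. Below is a minimal awk sketch of that idea that mimics repsplit.pl's rs.0001 naming. The split_rs function name and the eight-record sample are hypothetical. This input has 8 matches. With a limit of 4, repsplit.pl's logic would leave an empty rs.0003, while this sketch writes only rs.0001 and rs.0002.

```shell
#!/bin/sh
# Hypothetical sample input: eight records, each ending in a "+" line.
printf 'entry %s\nsome more\n+\n' 1 2 3 4 1 2 3 4 > data

# split_rs LIMIT FILE (hypothetical helper): LIMIT "+"-terminated records per rs.NNNN file.
split_rs() {
    awk -v limit="$1" -v prefix="rs." -v width=4 '
    BEGIN { number = 1; fmt = "%s%0" width "d" }
    {
        if (file == "") {             # choose a name only when a line needs writing
            file = sprintf(fmt, prefix, number++)
            print file                # report the name, as repsplit.pl does
        }
        print > file
        if ($0 ~ /^\+$/ && ++count >= limit) {
            close(file)               # chunk complete; the next line starts a new file
            file = ""
            count = 0
        }
    }' "$2"
}

# 8 matches with a limit of 4: exactly two files, no empty rs.0003.
split_rs 4 data
```

The format string is built as "%s%0" width "d" rather than using a * width, because * in awk's printf is not guaranteed to be portable.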


Using perl with + as the input record separator (-053 is the octal code for +), in a concise "one-liner":

If you'd like to write each chunk to newprefix.part.$c, as stated in your comment:

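$ limit=50000 perl -053 -Mautodie -ne '
    # -053 sets the input record separator $/ to "+" (octal 053)
    if ($count++ % $ENV{limit} == 0) {    # every $limit records, start a new part
        close $fh if $fh;
        $c++;
        open $fh, ">", "newprefix.part.$c";
    }
    print $fh $_;                          # the record, including its "+"
    END { close $fh if $fh }
' file.txt
$ ls -l newprefix.part.*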
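To check that a split lost nothing, you can concatenate the parts in numeric order and compare the result with the input. This only makes sense if the parts keep the + delimiters (no -l chomping). A plain cat newprefix.part.* is not enough, because the shell's glob sorts part.10 before part.2. The sketch below therefore walks the part numbers explicitly until the first missing one. The verify_split function name and the hand-made sample parts are hypothetical.

```shell
#!/bin/sh
# verify_split INPUT PREFIX (hypothetical helper): succeed only if PREFIX1,
# PREFIX2, ... (read in numeric order until the first missing number)
# concatenate to exactly INPUT.
verify_split() {
    i=1
    : > reassembled.tmp
    while [ -e "$2$i" ]; do
        cat "$2$i" >> reassembled.tmp
        i=$((i + 1))
    done
    cmp -s "$1" reassembled.tmp       # exit status 0 only if byte-for-byte identical
    status=$?
    rm -f reassembled.tmp
    return "$status"
}

# Hypothetical demo: a two-record input split by hand into two parts.
printf 'entry 1\n+\nentry 2\n+\n' > file.txt
printf 'entry 1\n+' > newprefix.part.1
printf '\nentry 2\n+\n' > newprefix.part.2
verify_split file.txt newprefix.part. && echo "parts reassemble to file.txt"
```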
