Bash alias to automatically detect arbitrarily named file sequences?

This is one way of doing it with awk. The code is pretty unreadable, though:

```bash
#!/bin/bash
# Note: gensub() is a GNU awk (gawk) extension.
ls | awk '
function smprint() {
    if ((a[1] != exA1) || (a[2] != exA2 + 1)) {
        if ((exA1) && (exA1 == exexA1)) print "\t.. " exfile;
        else printf "%s", linesep;
        # use "%s" so that a "%" in a file name is not read as a format
        if ($0 != exfile) printf "%s", $0;
    }
};
BEGIN {
    d = "[0-9]";
    rg = "(.*)(" d d d d ")(.*)";
};
{
    split(gensub(rg, "\\1####\\3\t\\2", "g"), a, "\t");
    # produces e.g.: a[1]="file####.ext" a[2]="0001"
    smprint();
    linesep = "\n";
    exexA1 = exA1; # old old a[1]
    exA1 = a[1];   # old a[1]
    exA2 = a[2];   # old a[2]
    exfile = $0;   # old filename
};
END {
    smprint();
}'
```

Comparing the output of ls and the script above on the same folder:

```
etuardu@subranu:~/Desktop/pippo$ ls
asd1234_0001.tar.bz2    filename_v003_0006.geo  script.sh
asd1234_0002.tar.bz2    filename_v003_0007.geo  testxxtest.0057.exr
asd1234_0003.tar.bz2    filename_v003_0032.geo  testxxtest.0058.exr
filename_v003_0001.geo  filename_v003_0033.geo  testxxtest.0059.exr
filename_v003_0002.geo  filename_v003_0034.geo  testxxtest.0060.exr
filename_v003_0003.geo  filename_v003_0035.geo  testxxtest.0061.exr
filename_v003_0004.geo  filename_v003_0036.geo  testxxtest.0062.exr
filename_v003_0005.geo  other_file              testxxtest.0063.exr
etuardu@subranu:~/Desktop/pippo$ ./script.sh
asd1234_0001.tar.bz2    .. asd1234_0003.tar.bz2
filename_v003_0001.geo  .. filename_v003_0007.geo
filename_v003_0032.geo  .. filename_v003_0036.geo
other_file
script.sh
testxxtest.0057.exr     .. testxxtest.0063.exr
etuardu@subranu:~/Desktop/pippo$
```

If you want to stick to the syntax from your example, you can pipe this output through sed. With some regex magic you get:

```
etuardu@subranu:~/Desktop/pippo$ ./script.sh | sed -r 's/(.*)([0-9]{4})([^\t]+)\t\.\. .*([0-9]{4}).*$/[seq]\1####\3 (\2-\4)/g'
[seq]asd1234_####.tar.bz2 (0001-0003)
[seq]filename_v003_####.geo (0001-0007)
[seq]filename_v003_####.geo (0032-0036)
other_file
script.sh
[seq]testxxtest.####.exr (0057-0063)
etuardu@subranu:~/Desktop/pippo$
```
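The sed substitution takes the last four-digit run in the first name as the start frame, keeps the text around it as the template, and pulls the end frame out of the second name. For anyone who finds the one-liner hard to read, here is the same regular expression and replacement applied with Python 3's `re.sub`; the `to_seq_syntax` helper is hypothetical, and only the pattern and replacement string come from the command above.

```python
import re

# Same pattern as the sed -r command: prefix, 4-digit start frame, suffix,
# then a tab, "..", and the end name whose last 4-digit run is the end frame.
SEQ_LINE = re.compile(r'(.*)([0-9]{4})([^\t]+)\t\.\. .*([0-9]{4}).*$')


def to_seq_syntax(line):
    """Rewrite 'start<TAB>.. stop' as '[seq]prefix####suffix (start-stop)'.

    Lines that are not collapsed sequences come back unchanged.
    """
    return SEQ_LINE.sub(r'[seq]\1####\3 (\2-\4)', line)


if __name__ == "__main__":
    sample = [
        "asd1234_0001.tar.bz2\t.. asd1234_0003.tar.bz2",
        "other_file",
        "testxxtest.0057.exr\t.. testxxtest.0063.exr",
    ]
    for line in sample:
        print(to_seq_syntax(line))
```

Lines such as `other_file` do not match the pattern, so they pass through untouched, exactly as with sed.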

Then you can put it all together in a bash script and define an alias in your ~/.bashrc that calls it.

As a side note, this is a fairly pure bash-ish solution that should run on most *nix systems with GNU awk installed, but the tools it uses are not really suited to the task. You might consider writing this script in a language such as Python to benefit from its readability and its higher-level string manipulation and pattern-matching functions.
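Following that suggestion, here is a rough Python 3 sketch of the same idea. It is an illustration rather than a drop-in replacement: the function names are made up, it assumes (like the awk script) that the frame number is the last four consecutive digits in the name, and it prints the `[seq]...####... (start-end)` syntax directly instead of going through sed.

```python
import os
import re
import sys

# Like the awk regex, the greedy (.*) makes the frame number the last four
# consecutive digits, e.g. "asd1234_0001.tar.bz2" -> "0001".
FRAME = re.compile(r'(.*)(\d{4})(.*)')


def split_frame(name):
    """Return (template, frame); frame is None if there is no 4-digit run."""
    m = FRAME.fullmatch(name)
    if m is None:
        return name, None
    prefix, frame, suffix = m.groups()
    return prefix + '####' + suffix, frame


def collapse(names):
    """Yield one line per run of consecutive names that share a template and
    whose frame numbers go up by exactly 1; lone files keep their own name."""
    runs = []  # each run: [template, first_name, start_frame, stop_frame]
    for name in names:
        template, frame = split_frame(name)
        prev = runs[-1] if runs else None
        if (frame is not None and prev is not None and prev[3] is not None
                and prev[0] == template and int(frame) == int(prev[3]) + 1):
            prev[3] = frame  # the current run continues
        else:
            runs.append([template, name, frame, frame])  # start a new run
    for template, first_name, start, stop in runs:
        if start != stop:
            yield '[seq]{} ({}-{})'.format(template, start, stop)
        else:
            yield first_name


if __name__ == '__main__':
    # Default to the current directory, like the shell version.
    directory = sys.argv[1] if len(sys.argv) > 1 else '.'
    for line in collapse(sorted(os.listdir(directory))):
        print(line)
```

Keeping the frame as a string preserves the zero padding in the output, while `int()` is used only for the "next frame" comparison. As with the shell version, an alias in ~/.bashrc can point at the script.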


I have a Python 2.7 script that solves your problem by solving the more general problem of collapsing consecutive lines that differ only by a sequence number:

```python
import re


def do_compress(old_ints, ints):
    """
    whether the ints of the current entry are the continuation of the previous
    entry
    returns a list of the indexes to compress, or [] or False when the current
    line is not part of an indexed sequence
    """
    return len(old_ints) == len(ints) and \
        [i for i, (o, n) in enumerate(zip(old_ints, ints)) if n - o == 1]


def basic_format(file_start, file_stop):
    return "[seq]{} .. {}".format(file_start, file_stop)


def compress(files, do_compress=do_compress, seq_format=basic_format):
    p = None
    old_ints = ()
    old_indexes = ()
    seq_and_files_list = []
    # list of file names or dictionaries that represent sequences:
    #   {start, stop, start_f, stop_f}

    for f in files:
        ints = ()
        indexes = ()

        m = p is not None and p.match(f)  # False, None, or a valid match
        if m:
            ints = [int(x) for x in m.groups()]
            indexes = do_compress(old_ints, ints)

        # state variations
        if not indexes:  # end of sequence or no current sequence
            p = re.compile(
                r'(\d+)'.join(re.escape(x) for x in re.split(r'\d+', f)) + '$')
            m = p.match(f)
            old_ints = [int(x) for x in m.groups()]
            old_indexes = ()
            seq_and_files_list.append(f)

        elif indexes == old_indexes:  # the sequence continues
            seq_and_files_list[-1]['stop'] = old_ints = ints
            seq_and_files_list[-1]['stop_f'] = f
            old_indexes = indexes

        elif old_indexes == ():  # sequence started on previous filename
            start_f = seq_and_files_list.pop()
            s = {'start': old_ints, 'stop': ints,
                 'start_f': start_f, 'stop_f': f}
            seq_and_files_list.append(s)
            old_ints = ints
            old_indexes = indexes

        else:  # end of sequence, but still matches previous pattern
            old_ints = ints
            old_indexes = ()
            seq_and_files_list.append(f)

    return [isinstance(f, dict) and seq_format(f['start_f'], f['stop_f']) or f
            for f in seq_and_files_list]


if __name__ == "__main__":
    import sys
    if len(sys.argv) == 1:
        import os
        lst = sorted(os.listdir('.'))
    elif sys.argv[1] in ("-h", "--help"):
        print """USAGE: {} [FILE ...]
compress the listing of the current directory, or the content of the files by
collapsing identical lines, except for a sequence number""".format(sys.argv[0])
        sys.exit(0)
    else:
        # read every line of every file given, without the line endings
        lst = [l.rstrip('\r\n') for f in sys.argv[1:] for l in open(f)]
    for x in compress(lst):
        print x
```

Here it is run on your data:

```
bernard $ ./ls_sequence_compression.py given_data
[seq]filename_v003_0001.geo .. filename_v003_0007.geo
[seq]filename_v003_0032.geo .. filename_v003_0036.geo
[seq]testxxtest.0057.exr .. testxxtest.0063.exr
```

It works from the differences between the integers in two consecutive lines whose non-digit text matches. This lets it deal with non-uniform input, including a change in which field is used as the basis for the sequence.
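The two ingredients can be seen in isolation. The snippet below is a simplified Python 3 restatement, not the script itself: the hypothetical helpers `name_pattern` and `incremented_fields` mirror the pattern-building step in `compress()` and the index comparison in `do_compress()` (which returns `False` rather than `[]` when the number of integers differs).

```python
import re


def name_pattern(name):
    """Regex matching `name` with each digit run turned into a capture group;
    the non-digit text has to match literally."""
    pieces = re.split(r'\d+', name)
    return re.compile(r'(\d+)'.join(re.escape(p) for p in pieces) + '$')


def incremented_fields(prev_ints, ints):
    """Indexes of the integers that went up by exactly 1 between two lines."""
    if len(prev_ints) != len(ints):
        return []
    return [i for i, (old, new) in enumerate(zip(prev_ints, ints))
            if new - old == 1]


if __name__ == '__main__':
    p = name_pattern("01 - test9.txt")
    prev = [int(x) for x in p.match("01 - test9.txt").groups()]
    cur = [int(x) for x in p.match("01 - test10.txt").groups()]
    print(prev, cur, incremented_fields(prev, cur))  # [1, 9] [1, 10] [1]
    # A different set of incremented fields ([0, 1] instead of [1]) is what
    # ends the current sequence in the original script.
    print(incremented_fields([1, 10], [2, 11]))  # [0, 1]
    print(p.match("06"))  # None: the non-digit text differs
```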

Here is an example of input:

```
01 - test8.txt
01 - test9.txt
01 - test10.txt
02 - test11.txt
02 - test12.txt
03 - test13.txt
04 - test13.txt
05 - test13.txt
06
07
08
09
10
```

which gives:

```
[seq]01 - test8.txt .. 01 - test10.txt
[seq]02 - test11.txt .. 02 - test12.txt
[seq]03 - test13.txt .. 05 - test13.txt
[seq]06 .. 10
```

Any comments are welcome!

Hah... I nearly forgot: without arguments, this script outputs the collapsed listing of the current directory.

