
Bash alias to automatically detect arbitrarily named file sequences?


This is one way of doing something like that with awk (GNU awk, since it relies on gensub). The code is pretty unreadable, though:

    #!/bin/bash
    ls | awk '
    function smprint() {
        if ((a[1]!=exA1) || (a[2] != exA2+1)) {
            if ((exA1) && (exA1==exexA1)) print "\t.. " exfile;
            else printf linesep;
            if ($0!=exfile) printf $0;
        }
    };
    BEGIN { d="[0-9]"; rg="(.*)(" d d d d ")(.*)"; };
    {
        split(gensub(rg, "\\1####\\3\t\\2", "g"), a, "\t");
        # produces e.g.: a[1]="file####.ext" a[2]="0001"
        smprint();
        linesep="\n";
        exexA1=exA1; # old old a[1]
        exA1=a[1]; # old a[1]
        exA2=a[2]; # old a[2]
        exfile=$0; # old filename
    };
    END {
        smprint();
    }'

Comparing the output of ls and the script above on the same folder:

    etuardu@subranu:~/Desktop/pippo$ ls
    asd1234_0001.tar.bz2    filename_v003_0006.geo  script.sh
    asd1234_0002.tar.bz2    filename_v003_0007.geo  testxxtest.0057.exr
    asd1234_0003.tar.bz2    filename_v003_0032.geo  testxxtest.0058.exr
    filename_v003_0001.geo  filename_v003_0033.geo  testxxtest.0059.exr
    filename_v003_0002.geo  filename_v003_0034.geo  testxxtest.0060.exr
    filename_v003_0003.geo  filename_v003_0035.geo  testxxtest.0061.exr
    filename_v003_0004.geo  filename_v003_0036.geo  testxxtest.0062.exr
    filename_v003_0005.geo  other_file              testxxtest.0063.exr
    etuardu@subranu:~/Desktop/pippo$ ./script.sh
    asd1234_0001.tar.bz2    .. asd1234_0003.tar.bz2
    filename_v003_0001.geo  .. filename_v003_0007.geo
    filename_v003_0032.geo  .. filename_v003_0036.geo
    other_file
    script.sh
    testxxtest.0057.exr .. testxxtest.0063.exr
    etuardu@subranu:~/Desktop/pippo$

If you want to stick to the syntax you used in your example, you can pipe this output to sed. With some regex magic you get:

    etuardu@subranu:~/Desktop/pippo$ ./script.sh | sed -r 's/(.*)([0-9]{4})([^\t]+)\t\.\. .*([0-9]{4}).*$/[seq]\1####\3 (\2-\4)/g'
    [seq]asd1234_####.tar.bz2 (0001-0003)
    [seq]filename_v003_####.geo (0001-0007)
    [seq]filename_v003_####.geo (0032-0036)
    other_file
    script.sh
    [seq]testxxtest.####.exr (0057-0063)
    etuardu@subranu:~/Desktop/pippo$

Then you can put it all together in a bash script and define an alias in your ~/.bashrc to call it.

As a side note, this is a fairly pure shell solution that should run on most *nix systems, but the tools used are not really suited to the task. You may want to write this script in a language such as Python instead, to benefit from its readability and its higher-level string manipulation and pattern matching functions.
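To give an idea of what that could look like, here is a minimal Python 3 sketch. It is not a port of the awk script: it assumes, like the awk regex, that the sequence number is the last run of four digits in the name, and it treats any two or more consecutive numbers as a sequence. The names `collapse_sequences` and `FOUR_DIGITS` are made up for this example.

```python
import re

# Assumption: the sequence number is the last run of four digits in the name,
# mirroring the greedy "(.*)([0-9]{4})(.*)" pattern of the awk script.
FOUR_DIGITS = re.compile(r"(.*)(\d{4})(.*)", re.DOTALL)


def collapse_sequences(names):
    """Collapse consecutively numbered names into '[seq]prefix####suffix (first-last)'."""
    runs = []  # each run is [key, first_number, last_number, original_name]
    for name in sorted(names):
        m = FOUR_DIGITS.fullmatch(name)
        if m is None:
            # No four-digit field: keep the name as a standalone entry.
            runs.append([None, None, None, name])
            continue
        key = (m.group(1), m.group(3))  # text before and after the number
        number = m.group(2)
        last = runs[-1] if runs else None
        if last and last[0] == key and int(last[2]) + 1 == int(number):
            last[2] = number  # same pattern, next number: extend the run
        else:
            runs.append([key, number, number, name])

    out = []
    for key, first, last_number, name in runs:
        if key is None or first == last_number:
            out.append(name)  # lone file: print it unchanged
        else:
            out.append("[seq]{}####{} ({}-{})".format(key[0], key[1], first, last_number))
    return out


if __name__ == "__main__":
    import os
    for line in collapse_sequences(os.listdir(".")):
        print(line)
```

Run without arguments in a directory, it prints one line per sequence or standalone file, in the same `[seq]name_####.ext (first-last)` style as the sed output above.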


I have a Python 2.7 script that solves your problem by solving the more general problem of collapsing runs of lines that differ only by a sequence number:

    import re

    def do_compress(old_ints, ints):
        """
        whether the ints of the current entry is the continuation of the previous
        entry
        returns a list of the indexes to compress, or [] or False when the current
        line is not part of an indexed sequence
        """
        return len(old_ints) == len(ints) and \
            [i for o, n, i in zip(old_ints, ints, xrange(len(ints))) if n - o == 1]

    def basic_format(file_start, file_stop):
        return "[seq]{} .. {}".format(file_start, file_stop)

    def compress(files, do_compress=do_compress, seq_format=basic_format):
        p = None
        old_ints = ()
        old_indexes = ()
        seq_and_files_list = []
            # list of file names or dictionaries that represent sequences:
            #   {start, stop, start_f, stop_f}
        for f in files:
            ints = ()
            indexes = ()
            m = p is not None and p.match(f) # False, None, or a valid match
            if m:
                ints = [int(x) for x in m.groups()]
                indexes = do_compress(old_ints, ints)
            # state variations
            if not indexes: # end of sequence or no current sequence
                # rebuild the pattern: literal text escaped, digit runs captured
                p = re.compile(
                    r'(\d+)'.join(re.escape(x) for x in re.split(r'\d+', f)) + '$')
                m = p.match(f)
                old_ints = [int(x) for x in m.groups()]
                old_indexes = ()
                seq_and_files_list.append(f)
            elif indexes == old_indexes: # the sequence continues
                seq_and_files_list[-1]['stop'] = old_ints = ints
                seq_and_files_list[-1]['stop_f'] = f
                old_indexes = indexes
            elif old_indexes == (): # sequence started on previous filename
                start_f = seq_and_files_list.pop()
                s = {'start': old_ints, 'stop': ints,
                    'start_f': start_f, 'stop_f': f}
                seq_and_files_list.append(s)
                old_ints = ints
                old_indexes = indexes
            else: # end of sequence, but still matches previous pattern
                old_ints = ints
                old_indexes = ()
                seq_and_files_list.append(f)
        return [ isinstance(f, dict) and seq_format(f['start_f'], f['stop_f']) or f
             for f in seq_and_files_list ]

    if __name__ == "__main__":
        import sys
        if len(sys.argv) == 1:
            import os
            lst = sorted(os.listdir('.'))
        elif sys.argv[1] in ("-h", "--help"):
            print """USAGE: {} [FILE ...]
    compress the listing of the current directory, or the content of the files by
    collapsing identical lines, except for a sequence number""".format(sys.argv[0])
            sys.exit(0)
        else:
            # read the lines of every file given on the command line
            lst = [l.rstrip('\r\n') for f in sys.argv[1:] for l in open(f)]
        for x in compress(lst):
            print x

That is, on your data:

    bernard $ ./ls_sequence_compression.py given_data
    [seq]filename_v003_0001.geo .. filename_v003_0007.geo
    [seq]filename_v003_0032.geo .. filename_v003_0036.geo
    [seq]testxxtest.0057.exr .. testxxtest.0063.exr

It relies on the differences between the integers found in two consecutive lines whose non-digit text matches. This lets it cope with non-uniform input, and with changes in which field serves as the basis for the sequence.
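The core comparison can be looked at in isolation. The following is a small Python 3 sketch of that single step, assuming the same regex construction as in the script; the helper names `pattern_for` and `incremented_fields` are made up for this illustration and are not part of the script.

```python
import re


def pattern_for(name):
    """Regex matching names that differ from `name` only in their digit runs."""
    literal_parts = re.split(r"\d+", name)
    # Escape the literal text and put a capturing group where each number was.
    return re.compile(r"(\d+)".join(re.escape(part) for part in literal_parts) + "$")


def incremented_fields(previous, current):
    """Indexes of the integer fields that grew by exactly 1.

    Returns None when the non-digit text of the two names differs.
    """
    pattern = pattern_for(previous)
    m_cur = pattern.match(current)
    if m_cur is None:
        return None
    old = [int(x) for x in pattern.match(previous).groups()]
    new = [int(x) for x in m_cur.groups()]
    return [i for i, (o, n) in enumerate(zip(old, new)) if n - o == 1]


if __name__ == "__main__":
    print(incremented_fields("01 - test8.txt", "01 - test9.txt"))    # second field moved
    print(incremented_fields("01 - test10.txt", "02 - test11.txt"))  # both fields moved
    print(incremented_fields("03 - test13.txt", "04 - test13.txt"))  # first field moved
    print(incremented_fields("other_file", "script.sh"))             # text differs
```

A sequence keeps going as long as this list of indexes is non-empty and stays the same from one pair of lines to the next; when it changes, a new entry starts.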

Here is an example of input:

    01 - test8.txt
    01 - test9.txt
    01 - test10.txt
    02 - test11.txt
    02 - test12.txt
    03 - test13.txt
    04 - test13.txt
    05 - test13.txt
    06
    07
    08
    09
    10

which gives:

    [seq]01 - test8.txt .. 01 - test10.txt
    [seq]02 - test11.txt .. 02 - test12.txt
    [seq]03 - test13.txt .. 05 - test13.txt
    [seq]06 .. 10

Any comments are welcome!

Hah... I nearly forgot: without arguments, this script outputs the collapsed contents of the current directory.