Finding patterns in list
The Code (updated for Python 2 + 3)
Ignoring the "no overlapping" requirement, here's the code I used:
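    import collections

    def pattern(seq):
        storage = {}
        # Try window lengths from 1 up to half the length of the sequence.
        for length in range(1, len(seq) // 2 + 1):
            # Map each start index to the window of this length that begins there.
            valid_strings = {}
            for start in range(0, len(seq) - length + 1):
                valid_strings[start] = tuple(seq[start:start + length])
            candidates = set(valid_strings.values())
            # Fewer distinct windows than windows means at least one window repeats.
            if len(candidates) != len(valid_strings):
                print("Pattern found for " + str(length))
                storage = valid_strings
            else:
                print("No pattern found for " + str(length))
                break
        # Return the windows of the longest repeating length that occur more than once.
        counts = collections.Counter(storage.values())
        return set(v for v, n in counts.items() if n > 1)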
Using that, I found 8 distinct patterns of length 303 in your dataset. The program ran pretty fast, too.
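Since the code above ignores the "no overlapping" requirement, here is a minimal sketch of one way to filter its results afterwards. The helper name `has_non_overlapping_repeat` is hypothetical (it is not part of the code above), and it assumes "non-overlapping" means that at least two occurrences of the pattern share no positions in the list.

```python
def has_non_overlapping_repeat(seq, pat):
    """Return True if pat occurs at least twice in seq without overlapping."""
    pat = tuple(pat)
    n = len(pat)
    if n == 0:
        return False
    # Start index of every occurrence of pat in seq.
    starts = [i for i in range(len(seq) - n + 1)
              if tuple(seq[i:i + n]) == pat]
    # Two occurrences are disjoint when their starts are at least n apart,
    # so comparing the first and last occurrence is enough.
    return len(starts) >= 2 and starts[-1] - starts[0] >= n


data = [1, 2, 1, 2, 1]
print(has_non_overlapping_repeat(data, (1, 2)))     # occurrences at 0 and 2
print(has_non_overlapping_repeat(data, (1, 2, 1)))  # occurrences at 0 and 2 overlap
```

Each tuple returned by the pattern search could be passed through a check like this to keep only the patterns that repeat without overlap.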
Pseudocode Version
    define patterns(sequence):
        list_of_substrings = {}
        for each valid length:  ### i.e. lengths from 1 to half the list's length
            generate a dictionary my_dict of all sub-lists of size length
            if there are repeats:
                list_of_substrings = my_dict
            else:
                return all repeated values in list_of_substrings
        return list_of_substrings  #### returns {} when there are no patterns
I have an answer. It works (without overlapping), but it is for Python 3.
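    def get_pattern(seq):
        # The elements of seq must be strings, because they are joined with chr(1).
        outs = {}
        l = 0     # length of the joined key of the best candidate so far
        r = 0     # number of occurrences of the best candidate so far
        c = None  # best candidate so far (as a joined key)
        # Count every contiguous sub-list, keyed by its elements joined with chr(1).
        for end in range(len(seq) + 1):
            for start in range(end):
                word = chr(1).join(seq[start:end])
                if word not in outs:
                    outs[word] = 1
                else:
                    outs[word] += 1
        # Prefer keys that occur more often, or longer keys that occur more than once.
        for item in outs:
            if outs[item] > r or (len(item) > l and outs[item] > 1):
                l = len(item)
                r = outs[item]
                c = item
        if c is None:  # empty input: nothing to split
            return []
        return c.split(chr(1))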
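The counting step above only works when the elements are strings, because of the `chr(1).join(...)` call. As a rough sketch of an alternative, the same counting could be done with tuples as keys and `collections.Counter`, so that any hashable elements work. The function name `count_sublists` is hypothetical, and this version counts every window, so overlapping occurrences are included.

```python
from collections import Counter


def count_sublists(seq):
    """Count every contiguous sub-list of seq, keyed by a tuple of its elements."""
    counts = Counter()
    for end in range(1, len(seq) + 1):
        for start in range(end):
            counts[tuple(seq[start:end])] += 1
    return counts


counts = count_sublists([1, 2, 1, 2, 3])
# Keep only the sub-lists that occur more than once.
repeated = {sub: n for sub, n in counts.items() if n > 1}
print(repeated)
```

Like the original, this looks at every (start, end) pair, so the work grows quickly with the length of the list.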