How can I tell if a string repeats itself in Python?
Here's a concise solution which avoids regular expressions and slow in-Python loops:
def principal_period(s): i = (s+s).find(s, 1, -1) return None if i == -1 else s[:i]
See the Community Wiki answer started by @davidism for benchmark results. In summary,
David Zhang's solution is the clear winner, outperforming all others by at least 5x for the large example set.
(That answer's words, not mine.)
This is based on the observation that a string is periodic if and only if it is equal to a nontrivial rotation of itself. Kudos to @AleksiTorhamo for realizing that we can then recover the principal period from the index of the first occurrence of
(s+s)[1:-1], and for informing me of the optional
end arguments of Python's
Here's a solution using regular expressions.
import reREPEATER = re.compile(r"(.+?)\1+$")def repeated(s): match = REPEATER.match(s) return match.group(1) if match else None
Iterating over the examples in the question:
examples = [ '0045662100456621004566210045662100456621', '0072992700729927007299270072992700729927', '001443001443001443001443001443001443001443', '037037037037037037037037037037037037037037037', '047619047619047619047619047619047619047619', '002457002457002457002457002457002457002457', '001221001221001221001221001221001221001221', '001230012300123001230012300123001230012300123', '0013947001394700139470013947001394700139470013947', '001001001001001001001001001001001001001001001001001', '001406469760900140646976090014064697609', '004608294930875576036866359447', '00469483568075117370892018779342723', '004739336492890995260663507109', '001508295625942684766214177978883861236802413273', '007518796992481203', '0071942446043165467625899280575539568345323741', '0434782608695652173913', '0344827586206896551724137931', '002481389578163771712158808933', '002932551319648093841642228739', '0035587188612099644128113879', '003484320557491289198606271777', '00115074798619102416570771',]for e in examples: sub = repeated(e) if sub: print("%r: %r" % (e, sub)) else: print("%r does not repeat." % e)
... produces this output:
'0045662100456621004566210045662100456621': '00456621''0072992700729927007299270072992700729927': '00729927''001443001443001443001443001443001443001443': '001443''037037037037037037037037037037037037037037037': '037''047619047619047619047619047619047619047619': '047619''002457002457002457002457002457002457002457': '002457''001221001221001221001221001221001221001221': '001221''001230012300123001230012300123001230012300123': '00123''0013947001394700139470013947001394700139470013947': '0013947''001001001001001001001001001001001001001001001001001': '001''001406469760900140646976090014064697609': '0014064697609''004608294930875576036866359447' does not repeat.'00469483568075117370892018779342723' does not repeat.'004739336492890995260663507109' does not repeat.'001508295625942684766214177978883861236802413273' does not repeat.'007518796992481203' does not repeat.'0071942446043165467625899280575539568345323741' does not repeat.'0434782608695652173913' does not repeat.'0344827586206896551724137931' does not repeat.'002481389578163771712158808933' does not repeat.'002932551319648093841642228739' does not repeat.'0035587188612099644128113879' does not repeat.'003484320557491289198606271777' does not repeat.'00115074798619102416570771' does not repeat.
The regular expression
(.+?)\1+$ is divided into three parts:
(.+?)is a matching group containing at least one (but as few as possible) of any character (because
\1+checks for at least one repetition of the matching group in the first part.
$checks for the end of the string, to ensure that there's no extra, non-repeating content after the repeated substrings (and using
re.match()ensures that there's no non-repeating text before the repeated substrings).
In Python 3.4 and later, you could drop the
$ and use
re.fullmatch() instead, or (in any Python at least as far back as 2.3) go the other way and use
re.search() with the regex
^(.+?)\1+$, all of which are more down to personal taste than anything else.
You can make the observation that for a string to be considered repeating, its length must be divisible by the length of its repeated sequence. Given that, here is a solution that generates divisors of the length from
n / 2 inclusive, divides the original string into substrings with the length of the divisors, and tests the equality of the result set:
from math import sqrt, floordef divquot(n): if n > 1: yield 1, n swapped =  for d in range(2, int(floor(sqrt(n))) + 1): q, r = divmod(n, d) if r == 0: yield d, q swapped.append((q, d)) while swapped: yield swapped.pop()def repeats(s): n = len(s) for d, q in divquot(n): sl = s[0:d] if sl * q == s: return sl return None
EDIT: In Python 3, the
/ operator has changed to do float division by default. To get the
int division from Python 2, you can use the
// operator instead. Thank you to @TigerhawkT3 for bringing this to my attention.
// operator performs integer division in both Python 2 and Python 3, so I've updated the answer to support both versions. The part where we test to see if all the substrings are equal is now a short-circuiting operation using
all and a generator expression.
UPDATE: In response to a change in the original question, the code has now been updated to return the smallest repeating substring if it exists and
None if it does not. @godlygeek has suggested using
divmod to reduce the number of iterations on the
divisors generator, and the code has been updated to match that as well. It now returns all positive divisors of
n in ascending order, exclusive of
Further update for high performance: After multiple tests, I've come to the conclusion that simply testing for string equality has the best performance out of any slicing or iterator solution in Python. Thus, I've taken a leaf out of @TigerhawkT3 's book and updated my solution. It's now over 6x as fast as before, noticably faster than Tigerhawk's solution but slower than David's.