How does str.startswith really work? How does str.startswith really work? python-3.x python-3.x

How does str.startswith really work?


There is technically no reason to accept other sequence types, no. The source code roughly does this:

if isinstance(prefix, tuple):    for substring in prefix:        if not isinstance(substring, str):            raise TypeError(...)        return tailmatch(...)elif not isinstance(prefix, str):    raise TypeError(...)return tailmatch(...)

(where tailmatch(...) does the actual matching work).

So yes, any iterable would do for that for loop. But, all the other string test APIs (as well as isinstance() and issubclass()) that take multiple values also only accept tuples, and this tells you as a user of the API that it is safe to assume that the value won't be mutated. You can't mutate a tuple but the method could in theory mutate the list.

Also note that you usually test for a fixed number of prefixes or suffixes or classes (in the case of isinstance() and issubclass()); the implementation is not suited for a large number of elements. A tuple implies that you have a limited number of elements, while lists can be arbitrarily large.

Next, if any iterable or sequence type would be acceptable, then that would include strings; a single string is also a sequence. Should then a single string argument be treated as separate characters, or as a single prefix?

So in other words, it's a limitation to self-document that the sequence won't be mutated, is consistent with other APIs, it carries an implication of a limited number of items to test against, and removes ambiguity as to how a single string argument should be treated.

Note that this was brought up before on the Python Ideas list; see this thread; Guido van Rossum's main argument there is that you either special case for single strings or for only accepting a tuple. He picked the latter and doesn't see a need to change this.


This has already been suggested on Python-ideas a couple of years back see: str.startswith taking any iterator instead of just tuple and GvR had this to say:

The current behavior is intentional, and the ambiguity of strings themselves being iterables is the main reason. Since startswith() is almost always called with a literal or tuple of literals anyway, I see little need to extend the semantics.

In addition to that, there seemed to be no real motivation as to why to do this.

The current approach keeps things simple and fast, unicode_startswith (and endswith) check for a tuple argument and then for a string one. They then call tailmatch in the appropriate direction. This is, arguably, very easy to understand in its current state, even for strangers to C code.

Adding other cases will only lead to more bloated and complex code for little benefit while also requiring similar changes to any other parts of the unicode object.


On a similar note, here is an excerpt from a talk by core developer, Raymond Hettinger discussing API design choices regarding certain string methods, including recent changes to the str.startswith signature. While he briefly mentions this fact that str.startswith accepts a string or tuple of strings and does not expound, the talk is informative on the decisions and pain points both core developers and contributors have dealt with leading up to the present API.