Understanding generators in Python


Note: this post assumes Python 3.x syntax.

A generator is simply a function which returns an object on which you can call next, such that for every call it returns some value, until it raises a StopIteration exception, signaling that all values have been generated. Such an object is called an iterator.
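To make the definition concrete, here is a minimal sketch of an iterator written by hand as a class, next to a generator function that produces the same values. The names CountUpTo and count_up_to are hypothetical and only chosen for illustration. The class has to implement __iter__ and __next__ and raise StopIteration itself. The generator gets all of that for free.

```python
class CountUpTo:
    """Hypothetical iterator yielding start, start + 1, ..., stop - 1."""

    def __init__(self, start, stop):
        self.current = start
        self.stop = stop

    def __iter__(self):
        # Iterators return themselves from __iter__, so they work in for loops.
        return self

    def __next__(self):
        if self.current >= self.stop:
            # Signal that all values have been produced.
            raise StopIteration
        value = self.current
        self.current += 1
        return value


def count_up_to(start, stop):
    """Hypothetical generator function producing the same values as CountUpTo."""
    n = start
    while n < stop:
        yield n
        n += 1


print(list(CountUpTo(3, 6)))    # [3, 4, 5]
print(list(count_up_to(3, 6)))  # [3, 4, 5]
```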

Normal functions return a single value using return, just like in Java. In Python, however, there is an alternative, called yield. Using yield anywhere in a function makes it a generator. Observe this code:

>>> def myGen(n):
...     yield n
...     yield n + 1
...
>>> g = myGen(6)
>>> next(g)
6
>>> next(g)
7
>>> next(g)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration

As you can see, myGen(n) is a function which yields n and n + 1. Every call to next yields a single value, until all values have been yielded. for loops call next in the background and stop quietly when StopIteration is raised, thus:

>>> for n in myGen(6):
...     print(n)
...
6
7
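A for loop over a generator can be unrolled by hand. The sketch below is a simplified model, not CPython's actual implementation, and manual_for is a hypothetical helper name. It shows that the loop calls iter() once, then calls next repeatedly, and treats StopIteration as the normal end of the loop rather than as an error.

```python
def myGen(n):
    yield n
    yield n + 1


def manual_for(iterable):
    """Roughly what `for x in iterable: print(x)` does behind the scenes."""
    collected = []
    iterator = iter(iterable)   # a generator is its own iterator
    while True:
        try:
            x = next(iterator)  # ask for the next value
        except StopIteration:   # the for loop catches this and simply stops
            break
        print(x)
        collected.append(x)
    return collected


manual_for(myGen(6))  # prints 6, then 7
```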

Likewise there are generator expressions, which provide a means to succinctly describe certain common types of generators:

>>> g = (n for n in range(3, 5))
>>> next(g)
3
>>> next(g)
4
>>> next(g)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration

Note that generator expressions are much like list comprehensions:

>>> lc = [n for n in range(3, 5)]
>>> lc
[3, 4]
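The difference is when the work happens. A list comprehension evaluates every element immediately and keeps them all, while a generator expression produces them one at a time when asked, and can only be consumed once. A small sketch (exact byte counts from sys.getsizeof vary by Python version and platform, so only the comparison is printed):

```python
import sys

# The list comprehension builds every element up front.
squares_list = [n * n for n in range(100_000)]

# The generator expression only stores its current state.
squares_gen = (n * n for n in range(100_000))

print(sys.getsizeof(squares_list) > sys.getsizeof(squares_gen))  # True

# Both can feed a consumer such as sum(); the generator never builds a list.
# When a generator expression is the only argument, its own parentheses can be dropped.
print(sum(n * n for n in range(100_000)) == sum(squares_list))  # True

# Unlike a list, a generator is exhausted after one pass.
gen = (n for n in range(3, 5))
print(list(gen))  # [3, 4]
print(list(gen))  # []
```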

Observe that a generator object is created once, but its code is not run all at once. Only calls to next actually execute (part of) the code. Execution of a generator's code stops as soon as a yield statement is reached, and the yielded value is handed to the caller. The next call to next then resumes execution in the state in which the generator was left after the last yield, with all its local variables intact. This is a fundamental difference from regular functions: those always start execution at the "top" and discard their state upon returning a value.
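The pause-and-resume behaviour is easy to observe by putting print calls between the yield statements, and by keeping a local variable that survives between calls. This is only a demonstration; traced and running_total are hypothetical names.

```python
def traced():
    # Each print shows when this part of the body actually runs.
    print("started")
    yield 1
    print("resumed after first yield")
    yield 2
    print("resumed after second yield; about to finish")


g = traced()                 # nothing is printed yet: creating the generator runs no code
print(next(g))               # prints "started", then 1
print(next(g))               # prints "resumed after first yield", then 2
print(next(g, "exhausted"))  # prints the last message, then "exhausted"


def running_total(values):
    total = 0                # a local variable that survives between next() calls
    for v in values:
        total += v
        yield total


print(list(running_total([1, 2, 3])))  # [1, 3, 6]
```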

There is more to be said about this subject. For example, it is possible to send data back into a generator (reference). But I suggest you do not look into that until you understand the basic concept of a generator.
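For readers who are ready for it, here is a minimal sketch of sending data in, using the generator method send(). The averager function is a hypothetical example. Note that a fresh generator must first be advanced to its first yield with next() (or send(None)) before it can receive a value.

```python
def averager():
    """Hypothetical generator that receives numbers via send()
    and yields the running average after each one."""
    total = 0.0
    count = 0
    average = None
    while True:
        # The value passed to send() becomes the result of this yield expression.
        value = yield average
        total += value
        count += 1
        average = total / count


avg = averager()
next(avg)            # "prime" the generator: run it up to the first yield
print(avg.send(10))  # 10.0
print(avg.send(20))  # 15.0
print(avg.send(30))  # 20.0
```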

Now you may ask: why use generators? There are a couple of good reasons:

  • Certain concepts can be described much more succinctly using generators.
  • Instead of creating a function which returns a list of values, one can write a generator which generates the values on the fly. No list needs to be constructed, so the resulting code is more memory efficient. In this way one can even describe data streams that would simply be too large to fit in memory.
  • Generators allow for a natural way to describe infinite streams. Consider for example the Fibonacci numbers:

    >>> def fib():
    ...     a, b = 0, 1
    ...     while True:
    ...         yield a
    ...         a, b = b, a + b
    ...
    >>> import itertools
    >>> list(itertools.islice(fib(), 10))
    [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]

    This code uses itertools.islice to take a finite number of elements from an infinite stream. You are advised to have a good look at the functions in the itertools module, as they are essential tools for writing advanced generators with great ease.
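The second and third points can be combined in a small sketch: a list-returning function rewritten as a generator (evens_list and evens_gen are hypothetical names), consumed lazily with itertools.islice and itertools.takewhile. Because evens_gen produces values on demand, asking for the first three even numbers below 10**12 performs only a handful of loop iterations.

```python
import itertools


def fib():
    a, b = 0, 1
    while True:
        yield a
        a, b = b, a + b


def evens_list(limit):
    """List-returning version: builds the whole result in memory."""
    result = []
    for n in range(limit):
        if n % 2 == 0:
            result.append(n)
    return result


def evens_gen(limit):
    """Generator version: produces each value only when it is requested."""
    for n in range(limit):
        if n % 2 == 0:
            yield n


print(evens_list(10) == list(evens_gen(10)))  # True

# itertools works directly on (possibly infinite) generators.
print(list(itertools.takewhile(lambda x: x < 100, fib())))
# [0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89]
print(list(itertools.islice(evens_gen(10**12), 3)))  # [0, 2, 4]
```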


   About Python 2: in the examples above, next is a built-in function that calls the method __next__ on the given object. In Python 2 that method is called next rather than __next__, and the built-in next() only exists from Python 2.6 onwards, so in Python <=2.5 one writes o.next() instead of next(o). In Python 2.6 and 2.7, next(o) calls o.next() for you, so you need not use the following form there:

>>> g = (n for n in range(3, 5))
>>> g.next()
3


A generator is effectively a function that returns (data) before it is finished, but it pauses at that point, and you can resume the function at that point.

>>> def myGenerator():
...     yield 'These'
...     yield 'words'
...     yield 'come'
...     yield 'one'
...     yield 'at'
...     yield 'a'
...     yield 'time'
...
>>> myGeneratorInstance = myGenerator()
>>> next(myGeneratorInstance)
'These'
>>> next(myGeneratorInstance)
'words'

and so on. One benefit of generators is that, because they handle data one piece at a time, you can process large amounts of data; with lists, excessive memory requirements could become a problem. Generators, just like lists, are iterable, so they can be used in the same ways:

>>> # A fresh instance: myGeneratorInstance has already yielded 'These' and 'words'.
>>> for word in myGenerator():
...     print(word)
...
These
words
come
one
at
a
time
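The memory argument applies to any data that arrives piece by piece. A minimal sketch, assuming the data can be read from a file-like object in fixed-size chunks; read_in_chunks is a hypothetical helper, and io.StringIO stands in for a large file on disk. (A real file object opened with open() already iterates over its lines lazily in the same spirit.)

```python
import io


def read_in_chunks(file_obj, chunk_size=4):
    """Hypothetical helper: yield successive chunks of a file-like object
    without reading the whole thing into memory."""
    while True:
        chunk = file_obj.read(chunk_size)
        if not chunk:  # read() returns an empty string at end of file
            return     # ends the generator; the caller's next() raises StopIteration
        yield chunk


# io.StringIO stands in for a large file on disk.
fake_file = io.StringIO("These words come one at a time")
for chunk in read_in_chunks(fake_file, chunk_size=6):
    print(repr(chunk))
```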

Note that generators provide another way to deal with infinite sequences, for example:

>>> from time import gmtime, strftime
>>> def myGen():
...     while True:
...         yield strftime("%a, %d %b %Y %H:%M:%S +0000", gmtime())
...
>>> myGeneratorInstance = myGen()
>>> next(myGeneratorInstance)
'Thu, 28 Jun 2001 14:17:15 +0000'
>>> next(myGeneratorInstance)
'Thu, 28 Jun 2001 14:18:02 +0000'

The generator encapsulates an infinite loop, but this isn't a problem because each value is only computed when you ask for it.
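The consumer decides how much of an infinite generator to use. A minimal sketch, reusing the timestamp format from the example above (timestamps is a hypothetical name), that takes only three values with itertools.islice. itertools.count is another ready-made infinite iterator that combines naturally with generator expressions.

```python
import itertools
from time import gmtime, strftime


def timestamps():
    # An infinite loop is fine: each value is computed only when requested.
    while True:
        yield strftime("%a, %d %b %Y %H:%M:%S +0000", gmtime())


# Take only three values from the infinite stream.
for i, stamp in enumerate(itertools.islice(timestamps(), 3), start=1):
    print(i, stamp)

# An infinite generator expression built on itertools.count().
evens = (n for n in itertools.count() if n % 2 == 0)
print(list(itertools.islice(evens, 5)))  # [0, 2, 4, 6, 8]
```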


First of all, the term generator originally was somewhat ill-defined in Python, leading to lots of confusion. You probably mean iterators and iterables (see here). Then in Python there are also generator functions (which return a generator object), generator objects (which are iterators) and generator expressions (which are evaluated to a generator object).

According to the glossary entry for generator it seems that the official terminology is now that generator is short for "generator function". In the past the documentation defined the terms inconsistently, but fortunately this has been fixed.

It might still be a good idea to be precise and avoid the term "generator" without further specification.
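The distinctions drawn above can be checked at runtime with the standard inspect and types modules. A short sketch; the names gen_function, gen_object and gen_expr are illustrative only.

```python
import inspect
import types


def gen_function():  # a generator function
    yield 1


gen_object = gen_function()       # calling it returns a generator object
gen_expr = (x for x in range(3))  # a generator expression also evaluates to one

print(inspect.isgeneratorfunction(gen_function))  # True
print(inspect.isgenerator(gen_object))            # True
print(isinstance(gen_expr, types.GeneratorType))  # True

# Generator objects are iterators: iter() returns the object itself.
print(iter(gen_object) is gen_object)             # True

# A list is iterable but not an iterator: iter() returns a new object.
numbers = [1, 2, 3]
print(iter(numbers) is numbers)                   # False
```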