zeitbach.com

Iterables in Python

Python’s collections.abc module comes with several abstract base classes (ABCs) that define interfaces and, starting with Python 3.9 (PEP 585), can be used directly as generic type hints. This post introduces four iterable ABCs and gives some advice on how to use them in type signatures.

Iterable

Iterable requires an __iter__ method that returns an Iterator. That’s it. Here is the CPython code.

class Iterable(metaclass=ABCMeta):
    __slots__ = ()

    @abstractmethod
    def __iter__(self):
        while False:
            yield None

    @classmethod
    def __subclasshook__(cls, C):
        if cls is Iterable:
            return _check_methods(C, "__iter__")
        return NotImplemented

    __class_getitem__ = classmethod(GenericAlias)

If you use a for loop to iterate over an object, it will implicitly call __iter__ to obtain an Iterator, on which it repeatedly calls __next__ until a StopIteration exception is raised (see next section).
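To make the implicit calls visible, here is a minimal sketch of roughly what a for loop does, written out by hand. The function name manual_for is made up for illustration; iter() and next() are the built-ins that call __iter__ and __next__.

```python
def manual_for(iterable):
    """Collect items the way a for loop would, with the protocol spelled out."""
    collected = []
    iterator = iter(iterable)      # calls iterable.__iter__()
    while True:
        try:
            item = next(iterator)  # calls iterator.__next__()
        except StopIteration:
            break                  # a for loop swallows this exception and ends
        collected.append(item)
    return collected


print(manual_for("abc"))  # ['a', 'b', 'c']
```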

__subclasshook__ is used to determine whether C is a subclass of Iterable. As you can see, it only checks whether C has a method __iter__.

from collections.abc import Iterable

class A:
    def __iter__(self): ...

issubclass(A, Iterable)  # True

That __iter__ returns an Iterator is a convention; it is not enforced technically. For any class other than Iterable itself, __subclasshook__ returns NotImplemented, so subclasses of Iterable do not inherit this structural check and fall back to the regular subclass test.
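Both points can be checked in a short experiment. The classes Broken and Strict below are hypothetical and exist only for this sketch: Broken has an __iter__ that returns a non-iterator, and Strict is an ordinary subclass of Iterable.

```python
from collections.abc import Iterable


class Broken:
    # Has __iter__, but violates the convention by not returning an iterator.
    def __iter__(self):
        return 42


class Strict(Iterable):
    # A regular subclass of Iterable with a valid __iter__.
    def __iter__(self):
        return iter(())


# Only the presence of __iter__ is checked, so this is True.
structural_ok = issubclass(Broken, Iterable)

# The hook returns NotImplemented for Strict, so no structural check happens.
inherited_check = issubclass(Broken, Strict)

# The broken return value only surfaces when iteration is attempted.
iter_error = None
try:
    iter(Broken())
except TypeError as exc:
    iter_error = exc
```

Here structural_ok is True, inherited_check is False, and iter_error holds the TypeError that iter() raises for the non-iterator return value.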

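__class_getitem__ allows the class to be specialized with type arguments when used as a type hint (PEP 560). When it is defined as an ordinary function in a class body, it is implicitly a class method and needs no @classmethod decorator; the explicit classmethod(...) wrapper above is there because GenericAlias is a class rather than a function. Thanks to __class_getitem__, a function that computes the sum of an iterable of integers may have the following signature.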

def sum(integers: Iterable[int]) -> int: ...

All the other types discussed here inherit this behavior from Iterable.
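As a small illustration (assuming Python 3.9 or later), subscripting the ABC produces an alias object that records the origin class and the type arguments. The function is named total here, a hypothetical name chosen to avoid shadowing the built-in sum.

```python
from collections.abc import Iterable


def total(integers: Iterable[int]) -> int:
    """Add up any iterable of integers in a single pass."""
    result = 0
    for n in integers:
        result += n
    return result


alias = Iterable[int]
print(alias.__origin__ is Iterable)  # True
print(alias.__args__)                # (<class 'int'>,)
print(total([1, 2, 3]))              # 6
```

The type argument is only metadata for type checkers; nothing verifies at runtime that the items really are integers.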

Iterator

To implement an Iterator, you need to provide a __next__ method that returns the next item of the collection you’re iterating over, or raises a StopIteration exception when no item is left. To do so, the iterator has to keep track of the current iteration state.

An Iterator is also a special kind of Iterable whose __iter__ method always returns self. Thus, once the iterator is exhausted, it becomes useless. Every Iterator is an Iterable, but not every Iterable is an Iterator.
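A minimal sketch of a hand-written iterator follows. Countdown is a hypothetical class; it subclasses the Iterator ABC, which supplies the __iter__ that returns self, so only __next__ has to be written.

```python
from collections.abc import Iterator


class Countdown(Iterator):
    """Hypothetical iterator that counts down from start to 1."""

    def __init__(self, start: int) -> None:
        self.current = start  # the iteration state

    def __next__(self) -> int:
        if self.current <= 0:
            raise StopIteration
        value = self.current
        self.current -= 1
        return value


countdown = Countdown(3)
first_pass = list(countdown)   # [3, 2, 1]
second_pass = list(countdown)  # []: the same, now exhausted, iterator is reused
```

The second pass is empty because iter(countdown) hands back the very same object, whose state has already reached the end.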

Generator

A Generator is a special kind of Iterator. It’s an object that gives you a next item via __next__ and otherwise just returns itself via __iter__. Once exhausted, it won’t give you any more items. If you define a generator function with yield and without return for the sole purpose of iteration, your intent is communicated most clearly by annotating the return type as Iterator.

from collections.abc import Iterator

def fib() -> Iterator[int]:
    a, b = 0, 1
    while True:
        yield b
        a, b = b, a + b

However, this is not the whole story. Generators can have a separate return type, and since PEP 342 they can be used as coroutines, which is why the Generator ABC requires you to implement send and throw if you’re not using generator functions. If a generator is used this way, annotate the return type as Generator[yield_type, send_type, return_type], see PEP 484 (note: typing.Generator is a deprecated alias since Python 3.9; use collections.abc.Generator instead). This would be a topic for another post, but we can ignore it as long as we only yield and iterate with __next__.
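For completeness, here is a small sketch of the three-parameter annotation. The generator function running_total is a made-up example: it yields ints, accepts ints via send, and returns a str when it receives a negative number.

```python
from collections.abc import Generator


def running_total() -> Generator[int, int, str]:
    """Yield the running total of sent values; a negative value ends the generator."""
    total = 0
    while True:
        received = yield total
        if received < 0:
            return f"final total: {total}"
        total += received


gen = running_total()
first = next(gen)      # prime the generator; it yields the initial total 0
second = gen.send(5)   # 5
third = gen.send(10)   # 15

try:
    gen.send(-1)
except StopIteration as stop:
    summary = stop.value  # the return value travels on the StopIteration exception
```

After this runs, summary is "final total: 15".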

Sequence

The Sequence class requires you to implement __getitem__, which returns the ith item of the sequence, and __len__, which returns the length of the sequence. If you subclass Sequence you get __iter__ and some other useful methods for free, and since you get __iter__, your class will be recognized as an Iterable as well.
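The following sketch shows how little is needed. Squares is a hypothetical class that implements only __len__ and __getitem__ (for non-negative integer indices; slices and negative indices are left out to keep it short) and inherits the rest from Sequence.

```python
from collections.abc import Iterable, Sequence


class Squares(Sequence):
    """Hypothetical read-only sequence of the first n square numbers."""

    def __init__(self, n: int) -> None:
        self._n = n

    def __len__(self) -> int:
        return self._n

    def __getitem__(self, index: int) -> int:
        if not 0 <= index < self._n:
            # The inherited __iter__ relies on IndexError to know when to stop.
            raise IndexError(index)
        return index * index


squares = Squares(4)
print(list(squares))            # [0, 1, 4, 9]  (inherited __iter__)
print(9 in squares)             # True          (inherited __contains__)
print(squares.index(4))         # 2             (inherited index)
print(list(reversed(squares)))  # [9, 4, 1, 0]  (inherited __reversed__)
print(isinstance(squares, Iterable))  # True
```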

Type hint advice

As a rule of thumb for designing library interfaces, accept the most generic type and return the most specific type possible.

If your method or function needs to iterate over its input at most once, use Iterable. If you need iteration and random access, use Sequence. If you need to iterate over the same data multiple times, you’re out of luck with the ABCs in collections.abc. An Iterable may or may not give you a fresh iterator each time you call __iter__; there are no guarantees. A Sequence does give you a new iterator over all the data on every __iter__ call (as long as nobody overrode the default implementation), but it also requires a __getitem__ method, which is unnecessarily restrictive. Consider accepting a concrete type, or a union of concrete types such as list or tuple, that can be iterated over repeatedly.
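The multiple-iteration pitfall is easy to reproduce. In this sketch, spread_naive and spread are hypothetical helpers that each make two passes over their input; only the annotation differs.

```python
from collections.abc import Iterable
from typing import Union


def spread_naive(numbers: Iterable[float]) -> float:
    # Two passes: if numbers is an iterator, max() exhausts it and min() sees nothing.
    return max(numbers) - min(numbers)


def spread(numbers: Union[list[float], tuple[float, ...]]) -> float:
    # Lists and tuples hand out a fresh iterator for every pass.
    return max(numbers) - min(numbers)


print(spread_naive([3, 1, 4]))  # 3: works, because a list can be re-iterated
print(spread((3, 1, 4)))        # 3

naive_error = None
try:
    spread_naive(iter([3, 1, 4]))  # a valid Iterable according to the annotation
except ValueError as exc:
    naive_error = exc  # min() was handed an already exhausted iterator
```

A type checker accepts the failing call to spread_naive, since an iterator is an Iterable, whereas it would flag the same argument passed to spread.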

You can use Iterator as the return type for simple generator functions, as mentioned above, but otherwise you’ll often want to type-hint the concrete class that is returned. (Possible exception: you offer a stable public API and can’t afford to be nailed down on a specific type.)

What if you don’t follow this advice? Say your methods only accept lists instead of iterables. Users of your code who have tuples or iterators need to convert them to lists every time they interact with your API.[1] And if you annotate Iterable as the return type, but actually always return lists? People can’t rely on any list properties like random access or mutability. You need to decide whether placing such restrictions on users of your code is necessary.


  1. Or sprinkle the code with type: ignore to silence the linter and hope for the best.