Iterables in Python
Python’s collections
module comes with several abstract base classes (ABC) that define interfaces and can be used as type hints starting from Python 3.9 (PEP 585). This post introduces four iterable ABCs, and gives some advice about how to use them in type signatures.
Iterable
Iterable
requires an __iter__
method that returns an Iterator
. That’s it. Here is the CPython code.
class Iterable(metaclass=ABCMeta):
= ()
__slots__
@abstractmethod
def __iter__(self):
while False:
yield None
@classmethod
def __subclasshook__(cls, C):
if cls is Iterable:
return _check_methods(C, "__iter__")
return NotImplemented
= classmethod(GenericAlias) __class_getitem__
If you use a for
loop to iterate over an object, it will implicitly call __iter__
to obtain an Iterator
, on which it repeatedly calls __next__
until a StopIteration
exception is raised (see next section).
__subclasshook__
is used to determine whether C
is a subclass of Iterable
. As you can see, it only checks whether C
has a method __iter__
.
class A:
def __iter__(): ...
issubclass(A, Iterable) # True
The method’s return type Iterator
is a convention that is not enforced technically. __subclasshook__
returns NotImplemented
if called on anything other than Iterable
to prevent inheritance.
__class_getitem__
allows the class to be specialized with type arguments when used as a type hint (PEP 560). It’s a class method by default and doesn’t require the @classmethod
decorator. Thanks to __class_getitem__
, a function that computes the sum of an iterable of integers may have the following signature.
def sum(integers: Iterable[int]) -> int: ...
All the other types discussed here inherit this behavior from Iterable
.
Iterator
To implement an Iterator
you need to provide a __next__
method that returns an item of the collection you’re iterating over or raises a StopIteration
exception when no item is left. For this the iterator needs to keep track of the current iteration state.
An Iterator
is also a special kind of Iterable
whose __iter__
method always returns self
. Thus, once the iterator is exhausted, it becomes useless. Every Iterator
is an Iterable
, but not every Iterable
is an Iterator
.
Generator
A Generator
is a special kind of Iterator
. It’s an object that gives you a next item via __next__
and otherwise just returns itself via __iter__
. Once exhausted, it won’t give you any more items. If you define a generator function with yield
and without return
for the sole purpose of iteration, your intent is communicated most clearly by annotating the return type as Iterator
.
def fib() -> Iterator[int]:
= 0, 1
a, b while True:
yield b
= b, a+b a, b
However, this is not the whole story. Generators can have a separate return type, and since PEP 342, they can be used as coroutines, which is why they require you to implement send
and throw
if you’re not using generator functions. If used this way, you need to annotate the return type as Generator[yield_type, send_type, return_type]
, see PEP 484 (note: importing from typing
is now deprecated, use collections.abc.Generator
instead). This would be a topic for another post, but we can ignore it as long as we only yield and iterate with __next__
.
Sequence
The Sequence
class requires you to implement __getitem__
, which returns the ith item of the sequence, and __len__
, which returns the length of the sequence. If you subclass Sequence
you get __iter__
and some other useful methods for free, and since you get __iter__
, your class will be recognized as an Iterable
as well.
Type hint advice
As a rule of thumb for designing library interfaces, accept the most generic type and return the most specific type possible.
If your method or function needs some input to iterate over at most once, use Iterable
. If you need iteration and random access, use Sequence
. If you need to iterate over the same data multiple times, you’re out of luck with the ABCs of the collections module: an Iterable
might give you a new iterator each time you call __iter__
or not, no guarantees; a Sequence
is guaranteed to give you a new iterator over all the data when calling __iter__
(as long as nobody overrode the default implementation), but it requires a __getitem__
method, which is unnecessarily restrictive. Consider accepting a concrete type or a union of concrete types like List
or Tuple
that can be iterated over repeatedly.
You can use Iterator
as the return type for simple generator functions, as mentioned above, but otherwise you’ll often want to type-hint the concrete class that is returned. (Possible exception: you offer a stable public API and can’t afford to be nailed down on a specific type.)
What if you don’t follow this advice? Say, your methods only accept lists instead of iterables. Users of your code that have tuples or iterators need to convert them to lists everytime they interact with your API.1 And if you annotate Iterable
as return type, but actually always return lists? People can’t rely on any list properties like random access or mutability. You need to decide whether placing such restrictions on users of your code is necessary.
Or sprinkle the code with
type: ignore
to silence the linter and hope for the best.↩︎