Python Iterators and Generators
Iterators and generators are Python's tools for working through sequences of data lazily. They produce items on demand rather than loading all data into memory at once, which is particularly useful for large datasets.
Iterators
An iterator is an object that represents a data stream. It implements the iterator protocol, which includes two methods: __iter__() and __next__().
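As an illustration of the protocol, here is a minimal sketch of a hand-written iterator class. The name Countdown and its behavior are hypothetical, chosen only to show where __iter__() and __next__() fit.

```python
class Countdown:
    """Hypothetical iterator that counts down from start to 1."""

    def __init__(self, start):
        self.current = start

    def __iter__(self):
        # An iterator returns itself from __iter__().
        return self

    def __next__(self):
        # Signal the end of the stream once the count reaches zero.
        if self.current <= 0:
            raise StopIteration
        value = self.current
        self.current -= 1
        return value


countdown_values = list(Countdown(3))
print(countdown_values)  # [3, 2, 1]
```

Because __iter__() returns the object itself, an instance can be passed directly to list() or used in a for loop.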
- Iterable: Any object that can be traversed by a for loop is an iterable, such as lists, tuples, and strings. When you use the iter() function on an iterable, you get an iterator.
- Iterator: An iterator object is responsible for returning one element at a time in a loop. It uses the next() function to get the next value. When no more data is available, next() raises a StopIteration exception, which the for loop catches automatically to end the loop.
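The difference between the two can be seen by calling iter() and next() by hand. This is a small sketch; the variable names are illustrative only.

```python
letters = ["a", "b", "c"]       # a list is an iterable
letters_iter = iter(letters)    # iter() returns an iterator over it

first = next(letters_iter)      # "a"
second = next(letters_iter)     # "b"
third = next(letters_iter)      # "c"

# A fourth call finds no more data and raises StopIteration.
try:
    next(letters_iter)
    raised = False
except StopIteration:
    raised = True

print(first, second, third, raised)  # a b c True
```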
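How a for loop works internally: it calls iter() on the iterable to obtain an iterator, then calls next() on that iterator repeatedly until StopIteration is raised.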
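Roughly speaking, a for loop can be rewritten as a while loop around next(). The helper manual_for below is a hypothetical name used only for this sketch; it collects the items in the order a for loop would visit them.

```python
def manual_for(iterable):
    """Collect items the way a for loop would visit them."""
    visited = []
    iterator = iter(iterable)      # step 1: get an iterator
    while True:
        try:
            item = next(iterator)  # step 2: ask for the next item
        except StopIteration:
            break                  # step 3: stop when the data runs out
        visited.append(item)       # the loop body would run here
    return visited


print(manual_for("abc"))  # ['a', 'b', 'c']
```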
Iterators are "one-time use". Once an iterator is exhausted, it cannot be reset or reused; to traverse the data again, call iter() on the original iterable to get a fresh iterator.
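A short sketch of this behavior, using a plain list as the underlying iterable:

```python
numbers = [1, 2, 3]
numbers_iter = iter(numbers)

first_pass = list(numbers_iter)    # consumes the iterator: [1, 2, 3]
second_pass = list(numbers_iter)   # already exhausted: []

# The list itself is unaffected, so a new iterator starts from the beginning.
fresh_pass = list(iter(numbers))   # [1, 2, 3]

print(first_pass, second_pass, fresh_pass)
```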
Generators
Generators are a special type of iterator, but you don't need to manually create classes to implement __iter__() and __next__() methods. There are two main ways to create generators:
1. Generator Functions
A generator function is a function that uses the yield keyword. When you call a generator function, its body doesn't run immediately; instead, the call returns a generator object (which is also an iterator).
Each time next() is called on a generator, the function executes until it reaches a yield statement. The yield hands a value back to the caller, and then the function pauses execution, preserving its current state (including local variables and instruction pointer). When next() is called again, the function resumes execution from where it left off. When the function finally returns, the generator raises StopIteration.
Example:
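A minimal sketch of such a function, using a hypothetical generator named count_up_to, could look like this:

```python
def count_up_to(limit):
    """Hypothetical generator yielding 1, 2, ..., limit."""
    current = 1
    while current <= limit:
        yield current    # pause here and hand back the value
        current += 1     # resume here on the next call to next()


counter = count_up_to(3)   # no code in the body has run yet
print(next(counter))       # 1
print(next(counter))       # 2

remaining = list(counter)  # drains whatever is left
print(remaining)           # [3]
```

The local variable current keeps its value between calls, which is why the second next() returns 2 rather than starting over.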
2. Generator Expressions
Generator expressions provide a more concise way to create generators. Their syntax is similar to list comprehensions, but they use parentheses () instead of square brackets [].
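For comparison, here is a small sketch that builds the same squares with a list comprehension and with a generator expression:

```python
squares_list = [n * n for n in range(5)]   # built in full right away
squares_gen = (n * n for n in range(5))    # values computed only on demand

print(squares_list)       # [0, 1, 4, 9, 16]

total = sum(squares_gen)  # sum() pulls the values one at a time
print(total)              # 30
```

Like any iterator, squares_gen is exhausted once sum() has consumed it.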
Why Use Generators?
- Memory Efficiency: Generators generate values on demand and do not store the entire sequence in memory at once. This makes it possible to process very large (or even infinite) data streams.
- Code Simplicity: Generator functions and expressions are usually much shorter than an equivalent hand-written iterator class, because Python supplies __iter__() and __next__() for you.
- Readability: They clearly express the logic of how data is generated in sequence.
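The memory point can be illustrated with a rough sketch. The generator naturals is a hypothetical example of an infinite stream, and the sizes reported by sys.getsizeof() are CPython implementation details, so treat the comparison as indicative rather than exact.

```python
import sys
from itertools import islice


def naturals():
    """Hypothetical infinite generator: 0, 1, 2, ..."""
    n = 0
    while True:
        yield n
        n += 1


# islice() takes only the first five values, so the infinite loop never runs away.
first_five = list(islice(naturals(), 5))
print(first_five)  # [0, 1, 2, 3, 4]

# The list holds references to every element; the generator holds only its paused state.
big_list = [n for n in range(100_000)]
big_gen = (n for n in range(100_000))
print(sys.getsizeof(big_list) > sys.getsizeof(big_gen))  # True
```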