Python Iterators and Generators
Iterators and generators are Python's tools for processing sequences of data lazily. They produce values on demand rather than loading all data into memory at once, which is particularly useful when processing large datasets.
Iterators
An iterator is an object that represents a data stream. It implements the iterator protocol, which includes two methods: __iter__() and __next__().
- Iterable: Any object that can be traversed by a for loop is an iterable, such as lists, tuples, and strings. When you call the iter() function on an iterable, you get an iterator.
- Iterator: An iterator object returns one element at a time. Calling next() on it returns the next value. When no more data is available, next() raises a StopIteration exception, which the for loop catches automatically to end the loop.
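To make the protocol concrete, here is a minimal sketch of a hand-written iterator and of roughly what a for loop does with it. The names Countdown and manual_for are hypothetical, chosen only for illustration; the real for loop is implemented by the interpreter, not by a helper like this.

```python
class Countdown:
    """Hypothetical iterator that counts down from `start` to 1."""

    def __init__(self, start):
        self.current = start

    def __iter__(self):
        # An iterator returns itself from __iter__().
        return self

    def __next__(self):
        if self.current <= 0:
            # Signal that the stream is exhausted.
            raise StopIteration
        value = self.current
        self.current -= 1
        return value


def manual_for(iterable):
    """Hypothetical helper: call iter(), then next() until StopIteration."""
    results = []
    iterator = iter(iterable)
    while True:
        try:
            item = next(iterator)
        except StopIteration:
            break  # a for loop swallows this exception and ends
        results.append(item)
    return results


print(manual_for(Countdown(3)))  # [3, 2, 1]
print(list(Countdown(3)))        # [3, 2, 1]
```

Because Countdown defines both methods, it works anywhere Python expects an iterable, including list() and an ordinary for loop.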
How a for loop works internally:
my_list = [1, 2, 3]
# 1. for loop first calls iter() to get the iterator
my_iterator = iter(my_list)
# 2. Then the loop calls next() to get each element
print(next(my_iterator)) # Output: 1
print(next(my_iterator)) # Output: 2
print(next(my_iterator)) # Output: 3
# 3. When elements are exhausted, raises StopIteration
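# print(next(my_iterator))  # raises StopIteration
Iterators are "one-time use". Once you've exhausted an iterator, it's depleted and cannot be rewound or reused. To traverse the data again, call iter() on the original iterable to get a new iterator.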
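The single-use behavior is easy to observe. The following is a small sketch using only built-in functions; the variable names are arbitrary.

```python
numbers = [1, 2, 3]           # a list is an iterable, not an iterator
iterator = iter(numbers)

first_pass = list(iterator)   # consumes every element
second_pass = list(iterator)  # nothing left to consume

print(first_pass)   # [1, 2, 3]
print(second_pass)  # []

# next() accepts a default to return instead of raising StopIteration.
print(next(iterator, "exhausted"))  # exhausted

# Asking the iterable for a new iterator starts over from the beginning.
fresh = iter(numbers)
print(next(fresh))  # 1
```

The list itself is untouched throughout; only the iterator object keeps track of a position, and that position only moves forward.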
Generators
A generator is a special kind of iterator that Python builds for you, so you don't need to write a class that implements the __iter__() and __next__() methods by hand. There are two main ways to create generators:
1. Generator Functions
A generator function is a function that uses the yield keyword. When you call a generator function, its body doesn't run immediately; instead, the call returns a generator object (which is also an iterator).
Each time next() is called on a generator, the function executes until it reaches a yield statement. The yield "yields" a value, and then the function pauses execution, preserving its current state (including local variables and instruction pointer). When next() is called again, the function resumes execution from where it left off.
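The pause-and-resume behavior described above can be observed with a generator that keeps a local running total. This is a sketch only: running_total and the log list are hypothetical names, and the list exists just to record when the function body actually runs.

```python
log = []  # hypothetical list used to record when the body actually runs


def running_total(values):
    """Hypothetical generator: yield the cumulative sum after each value."""
    log.append("started")
    total = 0  # local state that survives between next() calls
    for value in values:
        total += value
        yield total
    log.append("finished")


gen = running_total([10, 20, 30])
print(log)        # [] : calling the function has not run any of its body

print(next(gen))  # 10
print(log)        # ['started']
print(next(gen))  # 30 : total was remembered from the previous call
print(next(gen))  # 60
print(next(gen, "done"))  # done : the body ran to its end, so no more values
print(log)        # ['started', 'finished']
```

Note that the code after the last yield still runs, but only when a further next() call resumes the generator.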
Example:
def simple_generator():
    print("Generator started")
    yield 1
    print("Generator resumed")
    yield 2
    print("Generator resumed again")
    yield 3
    print("Generator finished")
# Create generator object
gen = simple_generator()
# Get values one by one
print(next(gen))
# Output:
# Generator started
# 1
print(next(gen))
# Output:
# Generator resumed
# 2
# Can also iterate with for loop
for value in simple_generator():
    print(f"For loop got: {value}")
2. Generator Expressions
Generator expressions provide a more concise way to create generators. Their syntax is similar to list comprehensions, but they use parentheses () instead of square brackets [].
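Generator expressions are often passed straight to a function that consumes an iterable. The sketch below uses only built-ins; the word list is made-up sample data.

```python
words = ["iterator", "generator", "yield"]  # made-up sample data

# When a generator expression is the only argument to a call,
# the call's own parentheses are enough.
total_length = sum(len(word) for word in words)
print(total_length)  # 22

# any() stops pulling values as soon as it sees a true one.
has_long_word = any(len(word) > 8 for word in words)
print(has_long_word)  # True

# Generator expressions can carry a filter, like list comprehensions.
short_words = (word.upper() for word in words if len(word) <= 8)
print(list(short_words))  # ['ITERATOR', 'YIELD']
```

In each case no intermediate list is built; the consuming function pulls one value at a time.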
# List comprehension: immediately creates a complete list, occupies memory
my_list = [x*x for x in range(5)]
print(my_list) # Output: [0, 1, 4, 9, 16]
# Generator expression: creates a generator object, generates values on demand, saves memory
my_generator = (x*x for x in range(5))
print(my_generator) # Output: <generator object <genexpr> at ...>
# Iterate over generator to get values
for value in my_generator:
    print(value)
# Output: 0, 1, 4, 9, 16
Why Use Generators?
- Memory Efficiency: Generators generate values on demand and do not store the entire sequence in memory at once. This makes it possible to process very large (or even infinite) data streams.
- Code Simplicity: Generator functions and expressions are much simpler than manually creating iterator classes.
- Readability: They clearly express the logic of how data is generated in sequence.
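The memory and infinite-stream points can be illustrated with a rough sketch. It assumes CPython, where sys.getsizeof reports the size of the container object only (not the integers it refers to), so the exact numbers vary by version and platform. The naturals function is a hypothetical example.

```python
import sys
from itertools import islice

n = 100_000
squares_list = [x * x for x in range(n)]
squares_gen = (x * x for x in range(n))

# The list's size grows with n; the generator object stays small
# because it has not produced any values yet.
print(sys.getsizeof(squares_list) > sys.getsizeof(squares_gen))  # True


def naturals():
    """Hypothetical infinite generator: 0, 1, 2, ..."""
    current = 0
    while True:
        yield current
        current += 1


# islice takes a finite slice of an endless stream without materializing it.
evens = (x for x in naturals() if x % 2 == 0)
first_five_even = list(islice(evens, 5))
print(first_five_even)  # [0, 2, 4, 6, 8]
```

Calling list(naturals()) directly would never finish, which is why an infinite generator is normally paired with something that stops consuming, such as islice or a break inside a for loop.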