Python Iterators and Generators

Iterators and generators are Python's tools for processing sequences of data lazily. They produce values on demand rather than loading all data into memory at once, which is particularly useful when processing large datasets.

Iterators

An iterator is an object that represents a data stream. It implements the iterator protocol, which consists of two methods: __iter__(), which returns the iterator itself, and __next__(), which returns the next value in the stream.

  • Iterable: Any object that a for loop can traverse is an iterable, such as a list, tuple, or string. Calling iter() on an iterable returns an iterator.
  • Iterator: An iterator returns one element at a time. Each call to next() on it returns the next value. When no more data is available, next() raises a StopIteration exception, which the for loop catches automatically to end the loop.
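
The protocol described above can be sketched with a small hand-written class. This is only an illustration: the Countdown class and its start parameter are hypothetical names chosen for this example, not part of Python itself.

```python
class Countdown:
    """Hypothetical iterator that counts down from `start` to 1."""

    def __init__(self, start):
        self.current = start

    def __iter__(self):
        # An iterator returns itself from __iter__().
        return self

    def __next__(self):
        if self.current <= 0:
            # Signal that no more values are available.
            raise StopIteration
        value = self.current
        self.current -= 1
        return value


countdown_values = list(Countdown(3))
print(countdown_values)  # [3, 2, 1]
```

Because the class defines both methods, it can be used anywhere Python expects an iterator, including a for loop.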

How a for loop works internally:

python
my_list = [1, 2, 3]

# 1. The for loop first calls iter() to get an iterator
my_iterator = iter(my_list)

# 2. Then the loop calls next() to get each element
print(next(my_iterator)) # Output: 1
print(next(my_iterator)) # Output: 2
print(next(my_iterator)) # Output: 3

# 3. When the elements are exhausted, next() raises StopIteration
# print(next(my_iterator))  # would raise StopIteration
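
The same steps can be written as an explicit loop. The sketch below is an approximation of what a for loop does, not the interpreter's actual implementation; manual_for is a hypothetical helper name used only here.

```python
def manual_for(iterable):
    """Hypothetical helper: collect items the way a for loop walks them."""
    collected = []
    iterator = iter(iterable)      # step 1: obtain an iterator
    while True:
        try:
            item = next(iterator)  # step 2: request the next element
        except StopIteration:
            break                  # step 3: exhaustion ends the loop
        collected.append(item)
    return collected


manual_result = manual_for([1, 2, 3])
print(manual_result)  # [1, 2, 3]
```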

Iterators are single-use. Once an iterator is exhausted, it stays empty and cannot be reset. To traverse the data again, call iter() on the original iterable to get a new iterator.
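
A short sketch of this behavior, assuming a plain list as the underlying iterable (the variable names are illustrative):

```python
numbers = [1, 2, 3]
iterator = iter(numbers)

first_pass = list(iterator)    # consumes every element
second_pass = list(iterator)   # the same iterator is now empty

# The list itself is untouched; iter() hands out a new iterator.
fresh_pass = list(iter(numbers))

print(first_pass)   # [1, 2, 3]
print(second_pass)  # []
print(fresh_pass)   # [1, 2, 3]
```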

Generators

Generators are a special kind of iterator that Python builds for you, so you do not need to write a class that implements __iter__() and __next__() by hand. There are two main ways to create generators:

1. Generator Functions

A generator function is a function that uses the yield keyword. When you call a generator function, its body doesn't run immediately; instead, the call returns a generator object (which is also an iterator).

Each time next() is called on a generator, the function runs until it reaches a yield statement. The yield hands a value back to the caller, and the function then pauses, preserving its current state (including local variables and the position in the code). When next() is called again, the function resumes from where it left off. When the function body finishes, the generator raises StopIteration.

Example:

python
def simple_generator():
    print("Generator started")
    yield 1
    print("Generator resumed")
    yield 2
    print("Generator resumed again")
    yield 3
    print("Generator finished")

# Create generator object
gen = simple_generator()

# Get values one by one
print(next(gen))
# Output:
# Generator started
# 1

print(next(gen))
# Output:
# Generator resumed
# 2

# A for loop also works; calling simple_generator() again creates a fresh generator
for value in simple_generator():
    print(f"For loop got: {value}")
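
To illustrate how local variables survive between calls, here is a minimal sketch of a generator that keeps running state. The fibonacci function and its limit parameter are hypothetical names chosen for this example.

```python
def fibonacci(limit):
    """Hypothetical generator: yield Fibonacci numbers below `limit`."""
    a, b = 0, 1
    while a < limit:
        yield a          # pause here; a and b are kept until the next call
        a, b = b, a + b


fib_gen = fibonacci(20)
first_three = [next(fib_gen), next(fib_gen), next(fib_gen)]
remaining = list(fib_gen)  # picks up where the next() calls stopped

print(first_three)  # [0, 1, 1]
print(remaining)    # [2, 3, 5, 8, 13]
```

The values of a and b are never stored outside the function, yet each resumption continues from the previous pair.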

2. Generator Expressions

Generator expressions provide a more concise way to create generators. Their syntax is similar to list comprehensions, but they use parentheses () instead of square brackets [].

python
# List comprehension: immediately creates a complete list, occupies memory
my_list = [x*x for x in range(5)]
print(my_list) # Output: [0, 1, 4, 9, 16]

# Generator expression: creates a generator object, generates values on demand, saves memory
my_generator = (x*x for x in range(5))
print(my_generator) # Output: <generator object <genexpr> at ...>

# Iterate over generator to get values
for value in my_generator:
    print(value)
# Output (one value per line): 0, 1, 4, 9, 16
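
The memory difference can be observed roughly with sys.getsizeof(). This is only a sketch: getsizeof() reports the size of the container object itself, not of the integers it refers to, and the exact numbers vary by Python version and platform, so none are shown here.

```python
import sys

n = 100_000
squares_list = [x * x for x in range(n)]
squares_gen = (x * x for x in range(n))

list_size = sys.getsizeof(squares_list)  # grows with n
gen_size = sys.getsizeof(squares_gen)    # small and independent of n

print(list_size > gen_size)  # True

# A generator expression can be passed directly to functions such as sum().
total = sum(x * x for x in range(5))
print(total)  # 30
```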

Why Use Generators?

  1. Memory Efficiency: Generators produce values on demand and do not store the entire sequence in memory at once. This makes it possible to process very large (or even infinite) data streams.
  2. Code Simplicity: Generator functions and expressions are shorter than hand-written iterator classes, because Python supplies __iter__() and __next__() automatically.
  3. Readability: They clearly express the logic of how data is generated in sequence.
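
As a rough illustration of the first point, the sketch below chains lazy steps over an infinite stream and takes only the first few results with itertools.islice(). The naturals generator is a hypothetical name used only for this example.

```python
from itertools import islice


def naturals():
    """Hypothetical infinite generator: 1, 2, 3, ..."""
    n = 1
    while True:
        yield n
        n += 1


# Chain lazy steps; nothing is computed until values are requested.
evens = (n for n in naturals() if n % 2 == 0)
squares = (n * n for n in evens)

first_four = list(islice(squares, 4))
print(first_four)  # [4, 16, 36, 64]
```

Building a list from naturals() directly would never finish, so islice() is what keeps the computation bounded here.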
