Iterables and Generators

Iterators

Iterators are objects that produce successive items or values from an associated iterable. They:

  • Hold the state (position) of the iteration
  • Can be looped over only once; to loop again, a new iterator must be created
  • Implement the __next__ method that…
    • returns the next item in the sequence
    • raises the StopIteration exception if there is nothing to return
    • can also be invoked using the next(iterator) function

An iterable is an object that can be iterated over. It:

  • Must be capable of returning an iterator
  • Must implement the __iter__ method, which is called by the iter() function
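
The difference between the two definitions can be seen with a built-in list. The snippet below is only a small sketch (the variable names are illustrative): the list is the iterable, iter() hands out an iterator, and that iterator can be consumed just once.

```python
numbers = [10, 20, 30]          # a list is an iterable, not an iterator
iterator = iter(numbers)        # ask the iterable for a fresh iterator

first_pass = list(iterator)     # consumes every item
second_pass = list(iterator)    # the iterator is now exhausted

print(first_pass)               # [10, 20, 30]
print(second_pass)              # []

# Asking the iterable again returns a new, independent iterator.
print(list(iter(numbers)))      # [10, 20, 30]
```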

Simple iterable and iterator examples

Let’s define an iterable and an iterator.

class Iterable:
    def __iter__(self):
        """
        Called by iter(Iterable())
        """ 
        return Iterator()

class Iterator:
    def __init__(self):
        self.x = -1
        
    def __next__(self):
        """
        Called by next(iterator)
        """ 
        self.x += 1
        return self.x

… and instantiate them.

iterable = Iterable()

print(iterable)
<__main__.Iterable object at 0x7f474c781160>
iterator = iter(iterable)
# iter(iterable) ==> iterable.__iter__()

print(iterator)
<__main__.Iterator object at 0x7f474c7814e0>

Let’s call the next() function on the iterator.

# iterator.__next__() 
print(next(iterator))
0
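
One caveat about the classes above: Python's iterator protocol also expects an iterator to provide an __iter__ method that returns the iterator itself, so that it can be used directly in a for loop. The Iterator class defines only __next__, so next(iterator) works but iter(iterator) raises a TypeError. A minimal sketch of a variant that follows the full protocol (the name CountingIterator is hypothetical, not part of the original example):

```python
class CountingIterator:
    """Hypothetical variant of Iterator that also implements __iter__."""

    def __init__(self):
        self.x = -1

    def __iter__(self):
        # An iterator returns itself, so iter(iterator) is iterator.
        return self

    def __next__(self):
        self.x += 1
        return self.x


iterator = CountingIterator()
print(iter(iterator) is iterator)  # True

# The sequence never ends, so the loop has to stop explicitly.
collected = []
for item in iterator:
    if item >= 3:
        break
    collected.append(item)

print(collected)  # [0, 1, 2]
```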

An object can also define both the __next__ and __iter__ methods.

class SimpleIterable:
    def __iter__(self):
        self.x = -1
        return self

#     def __next__(self):
#         self.x += 1
#         return self.x
    
    def __next__(self):
        if self.x <= 3:
            self.x += 1
        else:
            raise StopIteration
        return self.x
iterable = SimpleIterable()
print(type(iterable))
# iterable.x
<class '__main__.SimpleIterable'>
iterator = iter(iterable)
type(iterator)
# iterable.x
__main__.SimpleIterable
# Call next 5 times
next(iterator)
0
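
As a sketch of what happens once the items run out: the class is repeated here so the snippet runs on its own, and the sixth call to next() raises StopIteration. Passing a default value to next() is one way to avoid the exception.

```python
class SimpleIterable:
    # Same class as above, repeated so the snippet is self-contained.
    def __iter__(self):
        self.x = -1
        return self

    def __next__(self):
        if self.x <= 3:
            self.x += 1
        else:
            raise StopIteration
        return self.x


iterator = iter(SimpleIterable())
values = [next(iterator) for _ in range(5)]
print(values)  # [0, 1, 2, 3, 4]

try:
    next(iterator)  # sixth call: nothing left to return
except StopIteration:
    print("exhausted")

# next() accepts a default that is returned instead of raising.
print(next(iterator, "no more items"))  # no more items
```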

How to iterate on iterators

Iterators from containers (e.g. lists)

a = list(range(4))
print(a)
[0, 1, 2, 3]

Calling next() on a list won’t work.

next(a)
---------------------------------------------------------------------------

TypeError                                 Traceback (most recent call last)

<ipython-input-10-15841f3f11d4> in <module>
----> 1 next(a)


TypeError: 'list' object is not an iterator

But getting an iterator from the list and calling next() on it will.

a = iter(a)
# Note: calling iter() on an iterator returns the iterator itself

next(a)
0

The foreach construct

  • Built into the language as the for … in construct
  • Allows looping over all elements of an iterable
  • Automatically calls the iter(…) function before the loop starts
print("With iter()")
iterator = iter(iterable)
for item in iterator:
    print(item)
With iter()
0
1
2
3
4
print("Without iter()")
for item in SimpleIterable():
    print(item)
Without iter()
0
1
2
3
4
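
The loops above behave roughly like the hand-written version below. This is only a sketch of the protocol, not how the interpreter is actually implemented, and the helper name manual_for is made up for illustration.

```python
def manual_for(iterable):
    """Collect items the way a for loop would, using iter() and next()."""
    iterator = iter(iterable)      # done once, before the loop starts
    items = []
    while True:
        try:
            item = next(iterator)  # done at every step
        except StopIteration:
            break                  # the loop ends silently
        items.append(item)         # stands in for the loop body
    return items


print(manual_for(range(5)))  # [0, 1, 2, 3, 4]
```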

Generators

  • Functions containing the keyword yield

  • yield:

    • works similarly to return, handing a value back to the caller…
    • … but the state of the function is saved
  • When next() is called again on the generator, execution resumes where it left off

  • Note that calling a generator function does not run its body: it only returns a generator object.

Examples

Trivial generator

def f():
  print("-- start --")
  yield 3

  print("-- middle --")
  yield 4

  print("-- finished --")
generator = f()
generator
<generator object f at 0x7f47500215e8>
next(generator)
-- start --
3
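
To see the function resume where it stopped, next() can be called again. The sketch below repeats f() so it runs on its own; after the last yield, the function body finishes and the generator raises StopIteration.

```python
def f():
    print("-- start --")
    yield 3

    print("-- middle --")
    yield 4

    print("-- finished --")


generator = f()
print(next(generator))  # prints "-- start --", then 3
print(next(generator))  # prints "-- middle --", then 4

try:
    next(generator)     # prints "-- finished --", then the function returns
except StopIteration:
    print("generator exhausted")
```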

Counter

def counter():
    x : int = 0
        
    while True:
        yield x
        x += 1
        
generator = counter()
generator
<generator object counter at 0x7f4750021930>
next(generator)
0
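
Since counter() never stops, looping over it directly would run forever. One possible way to take a bounded number of values is itertools.islice from the standard library, sketched here:

```python
from itertools import islice


def counter():
    x = 0
    while True:
        yield x
        x += 1


generator = counter()

# islice stops after five items; the generator itself stays alive.
first_five = list(islice(generator, 5))
print(first_five)       # [0, 1, 2, 3, 4]
print(next(generator))  # 5
```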

Generators can also be defined inline, as generator expressions!

generator = (x for x in range(10))
print("generator type:", type(generator))

list(generator)
generator type: <class 'generator'>

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

Mind the difference with list comprehensions!

not_a_generator = [x for x in range(10)]
print("not_a_generator type:", type(not_a_generator))
not_a_generator type: <class 'list'>

Generators are lazy iterators

  • They are used to generate values dynamically
    • Very useful to cope, for instance, with Out of Memory issues
  • Like other iterators, they don’t implement the __len__ method
    • i.e., calling len() on a generator raises a TypeError
  • Generators support bidirectional communication.
    • You can pass values to the generator after its initialization
  • Concurrent and recursive invocations are allowed
    • … even though they are not thread safe out of the box.
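
A rough illustration of the memory point, assuming CPython: sys.getsizeof reports only the size of the container object itself, and the exact numbers vary between versions, so the sketch compares the two sizes instead of printing them.

```python
import sys

n = 100_000
as_list = [x * x for x in range(n)]       # all values stored at once
as_generator = (x * x for x in range(n))  # values produced on demand

# The list grows with n; the generator object stays small.
print(sys.getsizeof(as_list) > sys.getsizeof(as_generator))  # True

# Both produce the same values.
print(sum(as_generator) == sum(as_list))                     # True
```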

Dynamic value generation example

from datetime import datetime
print("MS output format:", datetime.now().microsecond)

def very_unsafe_prng(max_value):    
    while True:
        yield datetime.now().microsecond % max_value
        
generator = very_unsafe_prng(10)
generator
MS output format: 445995

<generator object very_unsafe_prng at 0x7f4750021750>
# No len!
len(generator)
---------------------------------------------------------------------------

TypeError                                 Traceback (most recent call last)

<ipython-input-22-95dd6a62a607> in <module>
      1 # No len!
----> 2 len(generator)


TypeError: object of type 'generator' has no len()
next(generator)
1

Bidirectional communication

Bidirectional communication lets the caller send values into a running generator. It relies on three methods:

  • .send(…): sends a value into the generator and, like next(), returns the next yielded value
  • .throw(…): raises the given exception inside the generator at the point where it is paused, so that the generator can handle it
  • .close(): stops the generator by raising GeneratorExit at the point where it is paused
  • yield can be used as an expression to assign the sent value to one of the generator’s variables
    • The value is assigned when the generator resumes from yield
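
Before the longer example, here is a minimal sketch of the three methods in isolation. The running_average generator is hypothetical and was chosen only because it is short; using ValueError as a "reset" signal is an arbitrary choice for the illustration.

```python
def running_average():
    """Hypothetical example: yields the average of all values sent so far."""
    total = 0.0
    count = 0
    average = None
    while True:
        try:
            value = yield average  # receives whatever send() passes in
        except ValueError:
            # throw(ValueError()) is used here as a reset signal.
            total, count, average = 0.0, 0, None
            continue
        total += value
        count += 1
        average = total / count


averager = running_average()
next(averager)                       # prime: run up to the first yield
print(averager.send(10))             # 10.0
print(averager.send(20))             # 15.0
print(averager.throw(ValueError()))  # None (the state was reset)
print(averager.send(4))              # 4.0
averager.close()                     # raises GeneratorExit inside the generator
```

Note the initial next() call: a value can only be sent once the generator is paused at a yield.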

Example

from random import choice

# Define allowed values ([1, 7])
values = list(range(1, 8))

# Define a generator
def seven_and_half():
    values_sum = 0
    results = []

    player = 1
    
    # Keep going until generator is closed
    while True:
        # Pick a card (pseudo) randomly
        value = choice(values)

        # Accumulate the values
        values_sum += value
        
        # Hand the card and the running total to the caller, then wait for a reply
        try:
            response = yield value, values_sum
        except GeneratorExit:
            results.append((player, values_sum),)
            
            for player, score in results:
                print("Player {} scored {}".format(player, score))
            break
        
        if response is False or response is None:            
            results.append((player, values_sum),)
            values_sum = 0
            player += 1
            
    print("Exiting")
# Init the generator
generator = seven_and_half()
keep_playing = None

# Keep in mind that
# generator.send(None) is equivalent to next(generator)
    
while True:  
    if keep_playing is False:
        break
        
    print("Picking a card.")
    value, values_sum = generator.send(keep_playing)
        
    print("Picked {}. Total: {}".format(value, values_sum))
    
    if values_sum > 7:
        print("You lost!")
        keep_playing = False    
    else:
        keep_playing = input("Keep picking? ") in ["y", "Y"]
Picking a card.
Picked 5. Total: 5
Keep picking? n
generator.close()
Player 1 scored 5
Exiting

Exercise

Intro

CSV (Comma-Separated Values) files are text files where each row is a data record and columns are separated by commas (or some other character).

The first row is (usually) the header (i.e., the name of the corresponding column).

A CSV file looks like:

id,name,surname
0,Mickey,Mouse

Request

You have to process a CSV file. Let’s assume it is too large to fit in RAM.

You should process it in small pieces, e.g., by reading each line sequentially using a generator.

Specifically, after reading the header, for each line of the file create a dictionary with the column-value associations.

A few tips

  • The pathlib module offers classes representing filesystem paths with semantics appropriate for different operating systems
  • Use the zip(*iterables) builtin function. From the docs:
    • Builds an iterator that aggregates elements from each of the iterables
    • That is, it returns an iterator of tuples, where the i-th tuple contains the i-th element from each of the argument sequences or iterables
    • The iterator stops when the shortest input iterable is exhausted
  • Use the with statement. From the docs: the with statement is used to wrap the execution of a block with methods defined by a context manager. This allows common try…except…finally usage patterns to be encapsulated for convenient reuse.
  • As an example CSV, download the raw file of Rolling Stone magazine’s list of “The 500 Greatest Albums of All Time” from GitHub.
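
The zip and with tips can be tried out in isolation first. The sketch below is not the exercise solution: the file name example.csv and the sample values are made up, and a temporary folder is used so that nothing is left on disk.

```python
import tempfile
from pathlib import Path

columns = ["id", "name", "surname"]
values = ["0", "Mickey", "Mouse", "unexpected extra field"]

# zip pairs items position by position and stops at the shortest input.
pairs = list(zip(columns, values))
print(pairs)  # [('id', '0'), ('name', 'Mickey'), ('surname', 'Mouse')]

# Path.open() returns a file object usable in a with statement,
# which closes the file even if the block raises an exception.
with tempfile.TemporaryDirectory() as folder:
    path = Path(folder) / "example.csv"  # hypothetical file name
    path.write_text("id,name,surname\n0,Mickey,Mouse\n")

    with path.open("r") as f:
        header = f.readline().strip()

    print(header)    # id,name,surname
    print(f.closed)  # True
```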

Solution

from pathlib import Path


def dataset_reader(file):
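    # Let's use pathlib instead of the built-in open() function,
    # i.e. instead of: with open(file, "r") as f:

    # Creating a Path instance.
    file = Path(file)

    # Not actually needed, just showing some functionality:
    # is_absolute() tells whether the path is absolute,
    # resolve() turns a relative path into an absolute one.
    if not file.is_absolute():
        file = file.resolve()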
        
    print("Opening", file.name, "in folder", file.parent)
    
    if not file.exists():
        raise FileNotFoundError("File doesn't exist!")
        
    
    with file.open("r", encoding="ISO-8859-15") as f:
        header = f.readline()
        columns = header.strip().split(',')
        
        print("Found columns:", columns)
        for line in f:
            values = line.strip().split(',')
#             print(values)
            
            try: 
                yield dict(zip(columns, values))
            except GeneratorExit:
                print("Closing the generator!")
                break
file = "albumlist.csv"

generator = dataset_reader(file)
next(generator)
Opening albumlist.csv in folder .
Found columns: ['Number', 'Year', 'Album', 'Artist', 'Genre', 'Subgenre']

{'Number': '1',
 'Year': '1967',
 'Album': "Sgt. Pepper's Lonely Hearts Club Band",
 'Artist': 'The Beatles',
 'Genre': 'Rock',
 'Subgenre': '"Rock & Roll'}
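
Note the stray quote in the 'Subgenre' value above. It suggests that the field is quoted in the file and contains commas of its own: splitting each line on commas ignores CSV quoting, and zip silently drops the surplus pieces. A possible alternative, sketched here with csv.DictReader from the standard library instead of the approach used in the solution, is shown below. The function name dataset_reader_csv and the sample data are made up for illustration.

```python
import csv
import io


def dataset_reader_csv(lines):
    """Hypothetical variant: yield one dict per row, honouring CSV quoting.

    `lines` is any iterable of text lines, e.g. an open file object
    (the csv docs recommend opening real files with newline="").
    """
    # DictReader uses the first row as the header and reads rows lazily.
    yield from csv.DictReader(lines)


# Made-up sample standing in for a real file.
sample = io.StringIO(
    'Number,Album,Subgenre\n'
    '1,Example Album,"Rock & Roll, Psychedelic Rock"\n'
)

row = next(dataset_reader_csv(sample))
print(row["Subgenre"])  # Rock & Roll, Psychedelic Rock
```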
