# Iterators and Generators

## Iterators
_**Iterators**_ are objects that produce successive items or values from an associated **_iterable_**.
They:
- Hold the state (position) of the iteration
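- Can be looped over only once: when exhausted, a new iterator must be created to loop again
- Implement the ***\_\_next\_\_*** method, which...
    - returns the next item in the sequence
    - raises the ***StopIteration*** exception when there is nothing left to return
    - can also be invoked using the ***next(iterator)*** function
- Implement the ***\_\_iter\_\_*** method, which returns the iterator itself (so every iterator is also an iterable)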

An **_iterable_** is an object that can be iterated over.
- Must be capable of returning an iterator
- Must implement the ***\_\_iter\_\_*** method, which is invoked by the *iter()* function
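
To see both protocols at work on a built-in type, here is a minimal sketch using a plain list (the names `numbers` and `it` are only illustrative). The `Iterable` and `Iterator` abstract classes from `collections.abc` are used to check which protocol each object implements.

```python
from collections.abc import Iterable, Iterator

numbers = [10, 20]   # a list is an iterable...
it = iter(numbers)   # ...and iter() returns an iterator over it

print(isinstance(numbers, Iterable), isinstance(numbers, Iterator))  # True False
print(isinstance(it, Iterator))  # True

print(next(it), next(it))  # 10 20

try:
    next(it)  # nothing left: the iterator raises StopIteration
except StopIteration:
    print("exhausted")

print(list(it))             # [] (an exhausted iterator stays exhausted)
print(list(iter(numbers)))  # [10, 20] (a new iterator starts over)
```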

#### Simple iterables and iterators examples 

Let's define an _iterable_ and an _iterator_...

In [None]:
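class Iterable:
    def __iter__(self):
        """
        Called by iter(Iterable())
        """
        # Return a fresh iterator each time iteration starts
        return Iterator()

class Iterator:
    def __init__(self):
        # Position of the iteration: the first call to next() returns 0
        self.x = -1

    def __iter__(self):
        """
        Called by iter(iterator): iterators return themselves
        """
        return self

    def __next__(self):
        """
        Called by next(iterator)
        """
        # Never raises StopIteration, so this iterator is infinite
        self.x += 1
        return self.x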

... and instantiate them.

In [None]:
iterable = Iterable()

print(iterable)

In [None]:
iterator = iter(iterable)
# iter(iterable) ==> iterable.__iter__()

print(iterator)

Let's call the _next()_ function on the iterator.

In [None]:
# Equivalent to iterator.__next__()
print(next(iterator))

An object can also define both the *\_\_iter\_\_* and *\_\_next\_\_* methods, acting as its own iterator.

In [None]:
class SimpleIterable:
    def __iter__(self):
        # Reset the position every time iteration starts, then act as the iterator
        self.x = -1
        return self

    def __next__(self):
        # Produce 0, 1, 2, 3, 4, then signal exhaustion
        if self.x <= 3:
            self.x += 1
        else:
            raise StopIteration
        return self.x

In [None]:
iterable = SimpleIterable()
print(type(iterable))
# iterable.x

In [None]:
iterator = iter(iterable)
type(iterator)
# iterable.x

In [None]:
# Run this cell repeatedly: it returns 0, 1, 2, 3, 4, then raises StopIteration
next(iterator)
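
Because *SimpleIterable* is its own iterator, every loop over the same object shares one position. A minimal sketch with a hypothetical `CountUpTo` class built the same way (state reset in `__iter__`, which returns `self`) shows both sides: sequential loops work because `__iter__` resets the state, but nested loops interfere with each other.

```python
class CountUpTo:
    """Hypothetical self-iterating class: __iter__ resets state and returns self."""

    def __init__(self, limit):
        self.limit = limit
        self.x = -1

    def __iter__(self):
        self.x = -1  # restart from the beginning
        return self

    def __next__(self):
        if self.x + 1 >= self.limit:
            raise StopIteration
        self.x += 1
        return self.x


c = CountUpTo(3)
print(list(c), list(c))  # [0, 1, 2] [0, 1, 2]: __iter__ resets the counter

# Nested loops over the same object share (and reset) a single position
pairs = [(a, b) for a in c for b in c]
print(pairs)  # [(0, 0), (0, 1), (0, 2)], not all 9 pairs
```

A class whose `__iter__` returns a new iterator object each time, like *Iterable* above, avoids this problem.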

## How to iterate over iterators

#### Iterators from containers (e.g. lists)

In [None]:
a = list(range(4))
print(a)

Calling _next()_ on a list won't work: a list is an _iterable_, not an _iterator_, so _next()_ raises a _TypeError_.

In [None]:
next(a)

But getting an _iterator_ from the list with _iter()_ and calling _next()_ on it will.

In [None]:
it = iter(a)
# Calling iter() on an iterator returns the same iterator: iter(it) is it

next(it)

#### The foreach construct
- Built into the language with the _**for** ... **in**_ construct
- Loops over all elements of an iterable
- Automatically calls the _iter(...)_ function before looping starts, then calls _next()_ at each step until _StopIteration_ is raised
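
The steps of a `for` loop can be sketched as the `while` loop below, which makes the `iter()` / `next()` / `StopIteration` sequence explicit. This is a simplified model of the behaviour, not the interpreter's actual implementation.

```python
items = ["a", "b", "c"]
seen = []

# Roughly what `for item in items: seen.append(item)` does
it = iter(items)              # 1. get an iterator from the iterable
while True:
    try:
        item = next(it)       # 2. ask for the next item
    except StopIteration:     # 3. stop when the iterator is exhausted
        break
    seen.append(item)         # the loop body

print(seen)  # ['a', 'b', 'c']
```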

In [None]:
print("With iter()")
iterator = iter(iterable)
for item in iterator:
    print(item)

In [None]:
print("Without iter()")
for item in SimpleIterable():
    print(item)

## Generators
- Generator functions are functions containing the keyword ***yield***; calling one returns a *generator* object, which is an iterator
- _yield_:
    - works similarly to _return_, handing a value back to the caller...
    - ... but the **state of the function is saved**

- When _next()_ is called again on the generator, execution resumes where it left off

- Note that calling a generator function **does not run** its body or return a value: the body starts executing only on the first _next()_ call.
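
As a sketch, the *SimpleIterable* class from earlier can be rewritten as a generator function (the name `simple_generator` is illustrative). It produces the same values, 0 to 4, with no explicit `__iter__`, `__next__` or `StopIteration`: the generator object provides all of them.

```python
def simple_generator():
    print("body started")  # runs only when the first value is requested
    x = 0
    while x <= 4:
        yield x            # pause here; resume on the following next() call
        x += 1
    # Returning (falling off the end) makes the generator raise StopIteration


gen = simple_generator()   # nothing is printed yet: the body has not started
print(iter(gen) is gen)    # True: a generator is its own iterator
print(list(gen))           # prints "body started", then [0, 1, 2, 3, 4]
print(list(gen))           # [] (an exhausted generator stays exhausted)
```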

### Examples

#### Trivial generator

In [None]:
def f():
    print("-- start --")
    yield 3

    print("-- middle --")
    yield 4

    print("-- finished --")

In [None]:
generator = f()
generator

In [None]:
# Run this cell repeatedly: the third call prints "-- finished --" and raises StopIteration
next(generator)

#### Counter

In [None]:
def counter():
    x: int = 0
        
    while True:
        yield x
        x += 1
        
generator = counter()
generator

In [None]:
next(generator)
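
Since *counter()* never stops, calling *list()* on it would loop forever. A common way to take only the first few values is `itertools.islice`; this sketch redefines the counter so the block runs on its own.

```python
from itertools import islice


def counter():
    x = 0
    while True:
        yield x
        x += 1


# islice lazily stops after 5 items, so the infinite generator is never drained
first_five = list(islice(counter(), 5))
print(first_five)  # [0, 1, 2, 3, 4]
```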

#### Generators can also be defined inline!

In [None]:
generator = (x for x in range(10))
print("generator type:", type(generator))

list(generator)

#### **Mind the difference with list comprehensions!**

In [None]:
not_a_generator = [x for x in range(10)]
print("not_a_generator type:", type(not_a_generator))
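
The practical difference is memory: a list comprehension builds every element up front, while a generator expression stores only its current state. A rough check with `sys.getsizeof` (which measures only the container object itself, not its elements) makes this visible; exact byte counts vary across Python versions and platforms.

```python
import sys

n = 100_000
squares_list = [x * x for x in range(n)]  # all n values are built immediately
squares_gen = (x * x for x in range(n))   # values are produced one at a time

print(sys.getsizeof(squares_list))  # grows with n
print(sys.getsizeof(squares_gen))   # small, independent of n

# Both can feed sum(), but the generator can be consumed only once
print(sum(squares_gen) == sum(squares_list))  # True
print(sum(squares_gen))                       # 0, because it is already exhausted
```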

### Generators are **lazy iterators**
- They **generate values on demand**, one at a time
    - Very useful, for instance, to avoid running out of memory when processing large data
- Generators don't implement the *\_\_len\_\_* method
    - i.e., calling *len()* on a generator raises a *TypeError*
- Generators support **bidirectional communication**
    - You can pass values to the generator **after** its initialization
- Multiple generators can be **active at the same time**, and generator functions can be **recursive**...
    - ... but a single generator is **not thread safe** out of the box: advancing it from two threads at once raises a *ValueError* ("generator already executing")
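
As an example of a recursive generator, this sketch (the `flatten` helper is illustrative, not from a library) walks arbitrarily nested lists. `yield from` delegates to the recursive call and re-yields every value it produces.

```python
def flatten(nested):
    """Recursively yield the non-list leaves of arbitrarily nested lists."""
    for item in nested:
        if isinstance(item, list):
            yield from flatten(item)  # delegate to a recursive generator call
        else:
            yield item


print(list(flatten([1, [2, [3, 4]], 5])))  # [1, 2, 3, 4, 5]
```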

### Dynamic value generation example

In [None]:
from datetime import datetime
print("Microsecond value:", datetime.now().microsecond)

def very_unsafe_prng(max_value):    
    while True:
        yield datetime.now().microsecond % max_value
        
generator = very_unsafe_prng(10)
generator

In [None]:
# No len! This raises a TypeError
len(generator)

In [None]:
next(generator)

### Bidirectional communication
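Bidirectional communication lets you send values into a running generator. It relies on three methods:
- ___.send(...)___: resumes the generator, makes the sent value the result of the paused _yield_ expression and, like __next()__, returns the next yielded value
    - On a just-started generator the first call must be ___.send(None)___, which is equivalent to __next()__
- ___.throw(...)___: resumes the generator by raising the passed exception at the paused _yield_; if the generator handles it, ___.throw()___ returns the next yielded value
- ___.close()___: stops the generator by raising ___GeneratorExit___ at the paused _yield_. Similar to ___.throw(GeneratorExit())___, except that ___.close()___ returns silently once the generator exits
- _yield_ can be used in expressions to assign values to the generator’s variables
    - Values are assigned when the generator resumes from _yield_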
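
A minimal sketch of all three methods on a hypothetical `running_total` generator: `.send()` adds to the total, `.throw(ValueError())` resets it, and `.close()` ends it, which runs the `finally` block. Sending a non-`None` value before the first `next()` would raise a `TypeError`, hence the priming call.

```python
def running_total():
    """Hypothetical example: accumulate the values passed in with .send()."""
    total = 0
    try:
        while True:
            try:
                value = yield total  # pause; resumes with the value given to send()
            except ValueError:
                total = 0            # .throw(ValueError()) resets the total
                continue
            if value is not None:
                total += value
    finally:
        # Reached when .close() raises GeneratorExit at the paused yield
        print("closed with total", total)


acc = running_total()
print(next(acc))                # 0: prime the generator (same as acc.send(None))
print(acc.send(5))              # 5
print(acc.send(10))             # 15
print(acc.throw(ValueError()))  # 0: handled inside, the next yielded value is returned
print(acc.send(3))              # 3
acc.close()                     # prints "closed with total 3"
```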

#### Example

In [None]:
from random import choice

# Define allowed card values ([1, 7])
values = list(range(1, 8))

# Define a generator
def seven_and_half():
    values_sum = 0
    results = []

    player = 1

    # Keep going until the generator is closed
    while True:
        # Pick a card (pseudo) randomly
        value = choice(values)

        # Accumulate the values
        values_sum += value

        # Hand the card and the running total to the caller, then wait:
        # the caller answers via .send() or ends the game via .close()
        try:
            response = yield value, values_sum
        except GeneratorExit:
            # close() was called: record the last score and print all of them
            results.append((player, values_sum))

            for player, score in results:
                print("Player {} scored {}".format(player, score))
            break

        # A False/None response ends the current player's turn
        if response is False or response is None:
            results.append((player, values_sum))
            values_sum = 0
            player += 1

    print("Exiting")

In [None]:
# Init the generator
generator = seven_and_half()

In [None]:
keep_playing = None

# Keep in mind that
# generator.send(None) is equivalent to next(generator)

while True:
    if keep_playing is False:
        break

    print("Picking a card.")
    value, values_sum = generator.send(keep_playing)

    print("Picked {}. Total: {}".format(value, values_sum))

    if values_sum > 7:
        print("You lost!")
        keep_playing = False
    else:
        # Any answer other than "y"/"Y" stops picking
        keep_playing = input("Keep picking? ").strip() in ["y", "Y"]

In [None]:
generator.close()

## Exercise

#### Intro
CSV (Comma-Separated Values) files are text files where **each row is a** data **record** and **columns** are **separated by commas** (or some other character).

The first row is (usually) the header (i.e., the name of the corresponding column).

A CSV file looks like:

    id,name,surname
    0,Mickey,Mouse

#### Request

You have to **process a _CSV_ file**. Let's assume it is **too large to fit in RAM**.

You should process it in small pieces, e.g., by reading each line sequentially using a generator.

Specifically, after reading the header, for each line of the file create a dictionary with the column-value associations.

#### A few tips
- The *pathlib* module offers classes representing filesystem paths with semantics appropriate for different operating systems
- Use the _**zip**(*iterables)_ builtin function. [From the docs](https://docs.python.org/3/library/functions.html#zip):
    - Builds an iterator that aggregates elements from each of the iterables
    - That is, it returns an iterator of tuples, where the i-th tuple contains the i-th element from each of the argument sequences or iterables
    - The iterator stops when the shortest input iterable is exhausted
- Use the ***with*** statement. [From the docs](https://docs.python.org/3/reference/compound_stmts.html#with): the with statement is used to wrap the execution of a block with methods defined by a context manager. This allows common *try…except…finally* usage patterns to be encapsulated for convenient reuse.
- As an example CSV, download [*as raw*](https://raw.githubusercontent.com/Currie32/500-Greatest-Albums/master/albumlist.csv) the [Rolling Stone Magazine's list of "The 500 Greatest Albums of All Time."](https://github.com/Currie32/500-Greatest-Albums/blob/master/albumlist.csv) from GitHub.
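
For instance, `zip` can pair the header with a row, and `dict()` turns the pairs into the column-value mapping the exercise asks for. The strings below reuse the sample CSV above.

```python
header = "id,name,surname"
row = "0,Mickey,Mouse"

record = dict(zip(header.split(","), row.split(",")))
print(record)  # {'id': '0', 'name': 'Mickey', 'surname': 'Mouse'}

# zip stops at the shortest input
print(list(zip([1, 2, 3], "ab")))  # [(1, 'a'), (2, 'b')]
```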

### Solution

In [None]:
from pathlib import Path


def dataset_reader(file):
    # Let's use pathlib's Path.open() instead of the built-in open() function,
    # i.e., instead of: with open(file, "r") as f:

    # Creating a Path instance.
    file = Path(file)

    # Not actually needed, just showing some functionality
    if not file.is_absolute():
        file = file.resolve()

    print("Opening", file.name, "in folder", file.parent)

    if not file.exists():
        raise FileNotFoundError("File doesn't exist!")

    # Read-only mode is enough; the file is read one line at a time
    with file.open("r", encoding="ISO-8859-15") as f:
        header = f.readline()
        columns = header.strip().split(',')

        print("Found columns:", columns)
        for line in f:
            # Naive split: quoted fields containing commas are not handled
            values = line.strip().split(',')

            try:
                yield dict(zip(columns, values))
            except GeneratorExit:
                print("Closing the generator!")
                break

In [None]:
file = "albumlist.csv"

generator = dataset_reader(file)

In [None]:
next(generator)
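
Splitting on commas breaks on quoted fields that contain commas (e.g. `"Minnie, Jr."`). Below is a sketch of an alternative that keeps the same lazy, row-by-row approach but delegates parsing to the standard library's `csv.DictReader`. The `csv_records` name and the temporary sample file are illustrative only.

```python
import csv
from pathlib import Path
from tempfile import TemporaryDirectory


def csv_records(path, encoding="utf-8"):
    """Lazily yield one dict per CSV row; csv handles quoting and escaping."""
    with Path(path).open("r", newline="", encoding=encoding) as f:
        # DictReader consumes the header row and maps every later row onto it
        yield from csv.DictReader(f)


with TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample.csv"
    sample.write_text('id,name,surname\n0,Mickey,Mouse\n1,"Minnie, Jr.",Mouse\n',
                      encoding="utf-8")
    # Consume the generator while the temporary file still exists
    rows = list(csv_records(sample))

print(rows[1]["name"])  # Minnie, Jr.
```

With the real dataset you would iterate over `csv_records("albumlist.csv", encoding=...)` directly instead of calling `list()`, so only one row is held in memory at a time.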

---
## References

* [Scipy Lectures](https://scipy-lectures.org/advanced/advanced_python/index.html)
* [StackAbuse](https://stackabuse.com/introduction-to-python-iterators/)
* [RealPython](https://realpython.com/introduction-to-python-generators/)