Final Review: Problem Sets 1

Problem 1: Function Logger Decorator with Generators

Given the following Python code:

from functools import wraps
 
def function_logger(func):
    counter = 0
 
    @wraps(func)
    def wrapper(*args, **kwargs):
        nonlocal counter
 
        counter += 1
        result = func(*args, **kwargs)
 
        print(f"The function {func.__name__} has run {counter} times")
 
        return result
    return wrapper
 
 
@function_logger
def count_generator():
    i = 0
    while True:
        yield i
        i += 1
 
 
@function_logger
def do_nothing() -> None:
    pass
 
 
gen = count_generator()
for _ in range(5):
    print(next(gen))
 
for _ in range(5):
    do_nothing()

(a) How many times will the output The function count_generator has run n times be printed?

Solution

The output The function count_generator has run n times will be printed 1 time.

Explanation: When you decorate count_generator() with @function_logger, the decorator wraps the function. When you call count_generator(), it calls the wrapper function once, which increments the counter to 1 and prints “The function count_generator has run 1 times”, then returns a generator object. The subsequent calls to next(gen) don’t invoke the decorated function again—they operate on the generator object that was already created.
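A minimal standalone sketch (separate from the problem code) of why calling a generator function does not execute its body:

```python
def gen_func():
    print("body started")
    yield 1

g = gen_func()   # no output yet: the body has not started executing
value = next(g)  # runs the body up to the first yield; prints "body started"
print(value)     # 1
```

The wrapper in the problem runs during the `g = gen_func()` step, which is why the log line appears exactly once.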

(b) How many times will the output The function do_nothing has run n times be printed?

Solution

The output The function do_nothing has run n times will be printed 5 times.

Explanation: Each call to do_nothing() in the loop invokes the wrapper function, incrementing the counter. So:

  • First call: counter = 1, prints “The function do_nothing has run 1 times”
  • Second call: counter = 2, prints “The function do_nothing has run 2 times”
  • Third call: counter = 3, prints “The function do_nothing has run 3 times”
  • Fourth call: counter = 4, prints “The function do_nothing has run 4 times”
  • Fifth call: counter = 5, prints “The function do_nothing has run 5 times”

The key difference is that count_generator is called once and returns a generator object, while do_nothing is called 5 times in the loop.

Problem 2: Fibonacci Coroutine with Send

Write a coroutine function that yields Fibonacci numbers indefinitely, but allows resetting its state via send(). The coroutine will receive any integer value as a sentinel for resetting its state.

Example usage:

fib_gen = fibonacci_generator()
 
# Generate some numbers
print(next(fib_gen))  # 0
print(next(fib_gen))  # 1
print(next(fib_gen))  # 1
print(next(fib_gen))  # 2
print(next(fib_gen))  # 3
 
# Reset the sequence
print(fib_gen.send(0))  # 0
 
# Start over
print(next(fib_gen))  # 1
print(next(fib_gen))  # 1
Solution
def fibonacci_generator():
    a, b = 0, 1
    while True:
        reset = yield a
        if reset is not None: # i.e. a value was sent, be it 0 or otherwise
            a, b = 0, 1
        else:
            a, b = b, a + b

Explanation: The coroutine uses yield a both to produce a value and to receive data via send(). When send(value) is called, the paused yield expression evaluates to that value, which is assigned to reset. If a value was sent (reset is not None), the Fibonacci state is reset to the beginning. Otherwise (a plain next() call, which sends None), the sequence advances to the next Fibonacci number.
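A bare-bones sketch of the send() mechanics, separate from the Fibonacci logic: a generator must first be advanced to its first yield (primed) before it can receive a value.

```python
def echo():
    while True:
        received = yield
        print("got", received)

e = echo()
next(e)      # prime: advance execution to the first yield
e.send(42)   # the paused yield expression evaluates to 42; prints "got 42"
```

Calling send() with a non-None value on a generator that has not been primed raises a TypeError.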

Problem 3: Decorator Replacing Generator

What will be the output of the following code?

def decorator(func):
    def inner():
        return (x for x in range(5))
 
    return inner
 
@decorator
def gen():
    for x in range(3):
        yield x
 
print(sum(gen()))
Solution

Output:

10

Explanation: A decorator normally wraps a function, running extra code before and/or after invoking the original function; the point is that the inner function of a decorator calls the original function (the func parameter in the above code) at some point. In this case, however, the decorator never calls func: it completely replaces the original gen function with a new function inner that returns the generator expression (x for x in range(5)).

When gen() is called, it actually calls inner(), which returns a generator that yields numbers from 0 to 4. The sum() function then adds these numbers together: 0 + 1 + 2 + 3 + 4 = 10.
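A small sketch contrasting the two decorator styles (the names here are illustrative):

```python
def wrapping(func):
    def inner():
        return func() * 2   # the original function is still called
    return inner

def replacing(func):
    def inner():
        return "replaced"   # func is captured but never called
    return inner

@wrapping
def five():
    return 5

@replacing
def greet():
    return "hello"

print(five())   # 10
print(greet())  # replaced
```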

Problem 4: Nested Generator Expression Sum

Determine the output of this nested generator comprehension:

print(sum(x for x in (y for y in range(3))))
Solution

Output:

3

Explanation: The inner generator (y for y in range(3)) produces 0, 1, 2. The outer generator (x for x in ...) iterates over these values. The sum() function adds them: 0 + 1 + 2 = 3.

Problem 5: Generator Comprehension with Tuples

Is there a syntax error in this code? If not, determine the output:

gen = ((x * y,) for x in range(2) for y in range(2))
print(list(gen))
Solution

No syntax error. Single-element tuples must have a trailing comma, otherwise it would not be recognized as a tuple. Output:

[(0,), (0,), (0,), (1,)]

Explanation: The generator expression creates tuples containing the product of x and y. It iterates through:

  • x=0, y=0: (0*0,) = (0,)
  • x=0, y=1: (0*1,) = (0,)
  • x=1, y=0: (1*0,) = (0,)
  • x=1, y=1: (1*1,) = (1,)

Note: The parentheses around x * y, create a tuple. Without them, it would be a syntax error because the comma would be ambiguous in the comprehension.
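The trailing-comma rule can be checked in isolation:

```python
t1 = (5)     # just a parenthesized int
t2 = (5,)    # a single-element tuple
print(type(t1).__name__)  # int
print(type(t2).__name__)  # tuple
```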

Problem 6: Multiple Decorators Order

Predict the output of the following code involving multiple decorators:

def dec1(func):
    def wrapper(x):
        return func(x) * 10
    return wrapper
 
 
def dec2(func):
    def wrapper(x):
        return -func(x)
    return wrapper
 
 
def dec3(func):
    def wrapper(x):
        return func(x) + 5
    return wrapper
 
 
@dec1
@dec2
@dec3
def number(n):
    return n
 
 
print(number(5))
Solution

Output:

-100

Explanation: Decorators are applied bottom-to-top, so the call number(5) passes through the wrappers from the inside out:

  1. The original number(5) returns 5
  2. The @dec3 wrapper adds 5: 5 + 5 = 10
  3. The @dec2 wrapper negates the result: -(10) = -10
  4. The @dec1 wrapper multiplies by 10: -10 * 10 = -100

The final result is -100.
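Stacked decorators are just nested function application: the stack above is equivalent to number = dec1(dec2(dec3(number))). A reduced sketch with two of the decorators shows the composition by hand:

```python
def add_five(func):
    def wrapper(x):
        return func(x) + 5
    return wrapper

def negate(func):
    def wrapper(x):
        return -func(x)
    return wrapper

def identity(n):
    return n

# @negate stacked above @add_five is the same as composing by hand:
composed = negate(add_five(identity))
print(composed(5))  # -(5 + 5) = -10
```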

Problem 7: Nested Generator Expression Trace

Trace the following nested generator expression. What is the output of the print statements?

gen = ((x, y, z) for x in 'abc' for y in '123' for z in [True, False])
 
print(next(gen))
print(next(gen))
print(next(gen))
Solution

Output:

('a', '1', True)
('a', '1', False)
('a', '2', True)

Explanation: The nested generator creates tuples by iterating through x, then y, then z. The innermost loop (z) completes first:

  • First: x=‘a’, y=‘1’, z=True → (‘a’, ‘1’, True)
  • Second: x=‘a’, y=‘1’, z=False → (‘a’, ‘1’, False)
  • Third: x=‘a’, y=‘2’, z=True → (‘a’, ‘2’, True)

The order is depth-first with the rightmost loop varying fastest.
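The generator expression is equivalent to nested for loops, with the rightmost clause as the innermost loop:

```python
def triples():
    for x in 'abc':
        for y in '123':
            for z in [True, False]:
                yield (x, y, z)

t = triples()
print(next(t))  # ('a', '1', True)
print(next(t))  # ('a', '1', False)
print(next(t))  # ('a', '2', True)
```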

Problem 8: Dictionary Copy Behavior

What is the output of the following code? If we want the output to be the opposite, what do we have to change about the code?

data = {
    "name": "General Kenobi",
    "lines": {
        "hello there": 1,
        "this is where the fun begins": 0,
    }
}
 
new_data = data.copy()
new_data["lines"]["hello there"] = 2
 
print(data == new_data)
Solution

Output:

True

Explanation: This demonstrates the shallow copy pitfall. data.copy() creates a shallow copy, meaning it copies the top-level dictionary but not the nested dictionaries. Both data and new_data share the same reference to the nested “lines” dictionary. When you modify new_data["lines"]["hello there"], you’re modifying the shared nested dictionary, so data is also affected. Therefore, data == new_data is True.

To get the opposite output (False), use a deep copy:

import copy
 
new_data = copy.deepcopy(data)
new_data["lines"]["hello there"] = 2
 
print(data == new_data)  # Output: False

A deep copy recursively copies all nested objects, so modifications to new_data don’t affect data.
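The sharing can be verified directly with the is operator:

```python
import copy

data = {"name": "General Kenobi", "lines": {"hello there": 1}}

shallow = data.copy()
deep = copy.deepcopy(data)

print(shallow["lines"] is data["lines"])  # True: the nested dict is shared
print(deep["lines"] is data["lines"])     # False: a fully independent copy
```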

Problem 9: Dictionary Unpacking

What is the output?

d1 = {'a': 1, 'b': 2}
d2 = {'b': 3, 'c': 4}
 
d2 = {**d1, **d2}
print(d2)
Solution

Output:

{'a': 1, 'b': 3, 'c': 4}

Explanation: Dictionary unpacking (**) is applied left to right. First, d1 is unpacked giving {'a': 1, 'b': 2}. Then, d2 (the original {'b': 3, 'c': 4}) is unpacked. When there are duplicate keys, the rightmost value wins. So 'b' has value 3 (from the second unpacking), not 2. The result is {'a': 1, 'b': 3, 'c': 4}.
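On Python 3.9 and later, the dict union operator | follows the same rightmost-wins rule:

```python
d1 = {'a': 1, 'b': 2}
d2 = {'b': 3, 'c': 4}

merged = d1 | d2   # rightmost operand wins on duplicate keys
print(merged)      # {'a': 1, 'b': 3, 'c': 4}
```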

Problem 10: Memoization Function

Write a function create_memo to memoize another function func, where func takes a single integer argument and returns an integer.

The create_memo function should return a new function that caches the results of func: if it is called again with an argument it has already seen, it returns the cached result instead of calling func again.

Example Usage:

def square(x):
    print(f"Computing {x} squared... ", end="")
    return x * x
 
def factorial(x):
    print(f"Computing {x} factorial... ", end="")
    fact = 1
    for i in range(2, x + 1):
        fact *= i
 
    return fact
 
memoized_square = create_memo(square)
print(memoized_square(5))  # Output: Computing 5 squared... 25
print(memoized_square(5))  # Output: 25 (cached)
print(memoized_square(7))  # Output: Computing 7 squared... 49
print(memoized_square(5))  # Output: 25 (cached)
print(memoized_square(7))  # Output: 49 (cached)
 
memoized_factorial = create_memo(factorial)
print(memoized_factorial(3)) # Output: Computing 3 factorial... 6
print(memoized_factorial(3)) # Output: 6 (cached)
print(memoized_factorial(3)) # Output: 6 (cached)
Solution
def create_memo(func):
    cache = {}
 
    def memoize(x):
        if x not in cache:
            cache[x] = func(x)
        return cache[x]
 
    return memoize

Explanation: create_memo creates a cache dictionary held in a closure. When the returned memoize function is called, it uses the argument x as the cache key: if x is already in the cache, the cached result is returned; otherwise, it calls the original function, stores the result in the cache, and returns it. This avoids redundant calculations for repeated inputs.

Furthermore, this implementation is limited to functions that take a single (hashable) argument. If you want to generalize it for functions with multiple arguments, you can use *args and **kwargs to build a more flexible cache key.
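One way to sketch such a generalization (the name create_memo_general and the key construction here are illustrative; all arguments must be hashable):

```python
def create_memo_general(func):
    cache = {}

    def memoize(*args, **kwargs):
        # Build a hashable key from both positional and keyword arguments
        key = (args, tuple(sorted(kwargs.items())))
        if key not in cache:
            cache[key] = func(*args, **kwargs)
        return cache[key]

    return memoize
```

The standard library provides this pattern ready-made as functools.lru_cache.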

Also, notice how we implemented a pattern involving a closure similar to decorators, where create_memo returns a new function memoize that adds memoization behavior to the original function func. So, create_memo can be used as a decorator as well:

@create_memo
def square(x):
    print(f"Computing {x} squared... ", end="")
    return x * x
 
result = square(5)  # This will compute and cache the result
result = square(5)  # This will skip the compute and return the cached result

In other words, what was shown earlier is a decorator that adds memoization to a function, applied manually instead of with the @ syntax. This illustrates the underlying mechanics of how decorators work in Python.

Problem 11: Duplicate Header Detection

Background

Consider a dataset represented as a table with headers and rows. A well-formed dataset should have a single header row followed by data rows.

Example Table

Term        | Subject | Course Number | Instructor
Spring 2025 | CSCI    | 381           | Waxman, Jerry
Spring 2025 | CSCI    | 320           | Boklan, Kent
Spring 2025 | MATH    | 245           | Gangaram, Elliot

However, sometimes datasets may be malformed and contain duplicate header rows within the data rows, which can lead to confusion during data processing.

Term        | Subject | Course Number | Instructor
Term        | Subject | Course Number | Instructor
Spring 2025 | CSCI    | 381           | Waxman, Jerry
Spring 2025 | CSCI    | 320           | Boklan, Kent
Spring 2025 | MATH    | 245           | Gangaram, Elliot

Consider the following manner in which we can represent the values in the headers and rows as arrays:

Representation of Headers (labels): ["Term", "Subject", "Course Number", "Instructor"]

Representation of a Row: ["Spring 2025", "CSCI", "381", "Waxman, Jerry"]

The i-th value in the row corresponds to the i-th label.

(a): Implement a function is_heuristic_match.

This function decides whether the first row is a duplicate header row based on a heuristic: it returns True if the ratio of matching positions to total positions is greater than or equal to threshold.

def is_heuristic_match(data, *, threshold) -> bool:
    pass

Where data is a dictionary with keys:

  • "headers": a list of the column labels
  • "rows": a list of rows, where each row is a list of values positionally aligned with the headers

Assume that data["rows"] is non-empty.

Example usage:

data = {
    "headers": ["Term", "Subject", "Course Number", "Instructor"],
    "rows": [
        ["Term", "Subject", "Course Number", "Professor"],
        ["Spring 2025", "CSCI", "381", "Waxman, Jerry"],
    ]
}
 
print(is_heuristic_match(data, threshold=0.75))
# Expected Output: True
# Explanation: Since 3 out of 4 positions match (a 0.75 ratio), this is considered a duplicate header row because it meets the threshold of 0.75.
 
print(is_heuristic_match(data, threshold=0.80))
# Expected Output: False
# Explanation: Since 3 out of 4 positions match (a 0.75 ratio), this is NOT considered a duplicate header row because it falls below the threshold of 0.80.
Solution
def is_heuristic_match(data, *, threshold) -> bool:
    headers = data["headers"]
    first_row = data["rows"][0]
 
    matches = sum(1 for h, r in zip(headers, first_row) if h == r)
    # or:
    # matches = sum(h == r for h, r in zip(headers, first_row))
 
    ratio = matches / len(headers)
    return ratio >= threshold

(b): Get a heuristic for multiple rows

Write a function get_heuristic_matches that processes the first n rows of data and returns a list of booleans indicating whether each row is considered a duplicate header row, depending on the given threshold.

Assume that n is less than or equal to the number of rows in data["rows"].

def get_heuristic_matches(data, n: int, *, threshold) -> list[bool]:
    pass

Example usage:

data = {
    "headers": ["Term", "Subject", "Course Number", "Instructor"],
    "rows": [
        ["Term", "Subject", "Course Number", "Instructor"],
        ["Term", "Subject", "Course Nbr", "Instructor"],
        ["Spring 2025", "CSCI", "381", "Waxman, Jerry"],
        ["Spring 2025", "MATH", "245", "Gangaram, Elliot"],
    ]
}
 
print(get_heuristic_matches(data, 3, threshold=0.75))
# Expected Output: [True, True, False]
Solution
def get_heuristic_matches(data, n: int, *, threshold) -> list[bool]:
    headers = data["headers"]
    results = []
 
    for i in range(n):
        row = data["rows"][i]
        matches = sum(h == r for h, r in zip(headers, row))
        ratio = matches / len(headers)
        results.append(ratio >= threshold)
 
    return results
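The same logic also fits in a list comprehension; this is an equivalent sketch of the solution above:

```python
def get_heuristic_matches(data, n: int, *, threshold) -> list[bool]:
    headers = data["headers"]
    return [
        # A row is a duplicate header if enough positions match the labels
        sum(h == r for h, r in zip(headers, row)) / len(headers) >= threshold
        for row in data["rows"][:n]
    ]
```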

Problem 12: File Processing with Line Filtering

In this problem, you will write functions to process text files.

Assume you have access to a global variable PUNCTUATION that is a string containing all possible punctuation characters. If you are asked to analyze individual words, make sure to normalize the words by stripping the punctuation from the ends of words, i.e. a word like "hello!" should be treated as "hello".

Hint

defaultdict from the collections module can be useful for counting occurrences of words without needing to check if the key exists first. It can help to reduce boilerplate code when counting.

The code below creates a defaultdict where the values for missing keys default to 0, but it is unnecessarily verbose compared to using defaultdict(int):

from collections import defaultdict
 
word_count = defaultdict(lambda: 0)

defaultdict(int) is equivalent to defaultdict(lambda: 0) because invoking int() returns 0, so int can serve as the default factory that supplies 0 for missing keys.

from collections import defaultdict
 
word_count = defaultdict(int)

(a) Write a function that opens the file, reads all lines, and prints only the lines that contain more than 30 characters.

Solution

Print lines with more than 30 characters

def print_long_lines(filename):
    with open(filename, 'r') as fd:
        for line in fd:
            # Strip the trailing newline character
            line = line.strip("\n")
            if len(line) > 30:
                print(line)

Alternatively, using splitlines() to avoid newline characters by default:

def print_long_lines(filename):
    with open(filename, 'r') as fd:
        for line in fd.read().splitlines():
            if len(line) > 30:
                print(line)

(b) Modify your function to return a list of all words (separated by whitespace) that appear more than once in the file.

Solution

Return words that appear more than once

def get_repeated_words(filename):
    word_count = {}
 
    with open(filename, 'r') as fd:
        for line in fd:
            words = line.split()
            for word in words:
                # Strip punctuation from ends
                clean_word = word.strip(PUNCTUATION)
 
                # Skip empty strings after stripping
                if not clean_word:
                    continue
 
                if clean_word in word_count:
                    word_count[clean_word] += 1
                else:
                    word_count[clean_word] = 1
 
    return [word for word, count in word_count.items() if count > 1]

Alternative using defaultdict:

from collections import defaultdict
 
def get_repeated_words(filename):
    word_count = defaultdict(int)
 
    with open(filename, 'r') as fd:
        for line in fd:
            words = line.split()
            for word in words:
                # Strip punctuation from ends
                clean_word = word.strip(PUNCTUATION)
 
                # Skip empty strings after stripping
                if not clean_word:
                    continue
 
                word_count[clean_word] += 1
 
    return [word for word, count in word_count.items() if count > 1]
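collections.Counter condenses the counting even further. In this sketch, PUNCTUATION is defined locally as a stand-in for the assumed global, and the walrus operator (Python 3.8+) strips and filters in one pass:

```python
import string
from collections import Counter

PUNCTUATION = string.punctuation  # stand-in for the assumed global

def get_repeated_words_counter(filename):
    with open(filename, 'r') as fd:
        counts = Counter(
            clean
            for line in fd
            for word in line.split()
            if (clean := word.strip(PUNCTUATION))  # skip empty strings
        )
    return [word for word, count in counts.items() if count > 1]
```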

(c) Write a function that counts the number of lines that start with a vowel (a, e, i, o, u), case-insensitive.

Solution

Count lines starting with a vowel

def count_vowel_start_lines(filename):
    vowels = 'aeiou'
    count = 0
 
    with open(filename, 'r') as fd:
        for line in fd:
            if line and line[0].lower() in vowels:
                count += 1
    return count
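str.startswith accepts a tuple of prefixes, which allows an equivalent condensed version of the same function:

```python
def count_vowel_start_lines(filename):
    with open(filename, 'r') as fd:
        # sum() counts True values; startswith checks all prefixes at once
        return sum(
            line.lower().startswith(('a', 'e', 'i', 'o', 'u'))
            for line in fd
        )
```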

Problem 13: Unique Words and Text Processing

In this problem, you will write functions to process text files.

Assume you have access to a global variable PUNCTUATION that is a string containing all possible punctuation characters. If you are asked to analyze individual words, make sure to normalize the words by stripping the punctuation from the ends of words, i.e. a word like "hello!" should be treated as "hello".

(a) Write a function that takes a filename and returns the number of unique words in the file (case-insensitive).

Solution

Count unique words (case-insensitive, strip punctuation from ends)

def count_unique_words(filename):
    unique_words = set()
 
    with open(filename, 'r') as fd:
        for line in fd:
            words = line.lower().split()
            for word in words:
                # Strip punctuation from ends
                clean_word = word.strip(PUNCTUATION)
                if clean_word:
                    unique_words.add(clean_word)
 
    return len(unique_words)

(b) Modify your function to also return the 5 most common words and their counts.

Solution

Return unique count and 5 most common words (using a dictionary)

def count_unique_words_with_top(filename):
    word_count = {}
 
    with open(filename, 'r') as fd:
        for line in fd:
            words = line.lower().split()
            for word in words:
                clean_word = word.strip(PUNCTUATION)
 
                # Skip empty strings after stripping
                if not clean_word:
                    continue
 
                if clean_word in word_count:
                    word_count[clean_word] += 1
                else:
                    word_count[clean_word] = 1
 
    unique_count = len(word_count)
    sorted_by_decreasing_word_count = sorted(word_count.items(), key=lambda x: x[1], reverse=True)
 
    return unique_count, sorted_by_decreasing_word_count[:5]

Alternative using defaultdict:

from collections import defaultdict
 
def count_unique_words_with_top(filename):
    word_count = defaultdict(int)
 
    with open(filename, 'r') as fd:
        for line in fd:
            words = line.lower().split()
            for word in words:
                clean_word = word.strip(PUNCTUATION)
                if clean_word:
                    word_count[clean_word] += 1
 
    unique_count = len(word_count)
    sorted_by_decreasing_word_count = sorted(word_count.items(), key=lambda x: x[1], reverse=True)
 
    return unique_count, sorted_by_decreasing_word_count[:5]
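collections.Counter also provides most_common(), which replaces the manual sort. A sketch of the same function (PUNCTUATION is defined locally as a stand-in for the assumed global; the walrus operator requires Python 3.8+):

```python
import string
from collections import Counter

PUNCTUATION = string.punctuation  # stand-in for the assumed global

def count_unique_words_with_top(filename):
    with open(filename, 'r') as fd:
        counts = Counter(
            clean
            for line in fd
            for word in line.lower().split()
            if (clean := word.strip(PUNCTUATION))  # skip empty strings
        )
    # most_common(5) returns the 5 highest-count (word, count) pairs
    return len(counts), counts.most_common(5)
```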

(c) Write a generator function that yields each sentence (line) ending with a period, exclamation mark, or question mark from the file, one at a time.

Solution

Generator function for sentences

def sentence_generator(filename):
    with open(filename, 'r') as fd:
        for line in fd.read().splitlines():
            if not line:
                continue
            if line.endswith(('.', '!', '?')):
                yield line