Final Review: Problem Sets 1
Problem 1: Function Logger Decorator with Generators
Given the following Python code:
```python
from functools import wraps

def function_logger(func):
    counter = 0
    @wraps(func)
    def wrapper(*args, **kwargs):
        nonlocal counter
        counter += 1
        result = func(*args, **kwargs)
        print(f"The function {func.__name__} has run {counter} times")
        return result
    return wrapper

@function_logger
def count_generator():
    i = 0
    while True:
        yield i
        i += 1

@function_logger
def do_nothing() -> None:
    pass

gen = count_generator()
for _ in range(5):
    print(next(gen))

for _ in range(5):
    do_nothing()
```

(a) How many times will the output `The function count_generator has run n times` be printed?
Solution
The output The function count_generator has run n times will be printed 1 time.
Explanation: When you decorate count_generator() with @function_logger, the decorator wraps the function. When you call count_generator(), it calls the wrapper function once, which increments the counter to 1 and prints “The function count_generator has run 1 times”, then returns a generator object. The subsequent calls to next(gen) don’t invoke the decorated function again—they operate on the generator object that was already created.
(b) How many times will the output The function do_nothing has run n times be printed?
Solution
The output The function do_nothing has run n times will be printed 5 times.
Explanation: Each call to do_nothing() in the loop invokes the wrapper function, incrementing the counter. So:
- First call: counter = 1, prints “The function do_nothing has run 1 times”
- Second call: counter = 2, prints “The function do_nothing has run 2 times”
- Third call: counter = 3, prints “The function do_nothing has run 3 times”
- Fourth call: counter = 4, prints “The function do_nothing has run 4 times”
- Fifth call: counter = 5, prints “The function do_nothing has run 5 times”
The key difference is that count_generator is called once and returns a generator object, while do_nothing is called 5 times in the loop.
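The decorator above logs once per call of the decorated function. If the goal were instead to log every value produced, one approach (a sketch, not part of the original problem; `generator_logger` is a hypothetical name) is to make the wrapper itself a generator that re-yields from the original:

```python
from functools import wraps

def generator_logger(func):
    # Hypothetical variant: logs once per yielded value, not once per call
    @wraps(func)
    def wrapper(*args, **kwargs):
        count = 0
        for value in func(*args, **kwargs):
            count += 1
            print(f"The function {func.__name__} has yielded {count} times")
            yield value
    return wrapper

@generator_logger
def count_generator():
    i = 0
    while True:
        yield i
        i += 1

gen = count_generator()
for _ in range(3):
    print(next(gen))  # the log line now fires on every next() call
```

Because `wrapper` contains a `yield`, calling `count_generator()` now returns a generator, and the log line fires on every `next()` rather than once at creation.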
Problem 2: Fibonacci Coroutine with Send
Write a coroutine function that yields Fibonacci numbers indefinitely, but allows resetting its state via send(). The coroutine will receive any integer value as a sentinel for resetting its state.
Example usage:
```python
fib_gen = fibonacci_generator()

# Generate some numbers
print(next(fib_gen)) # 0
print(next(fib_gen)) # 1
print(next(fib_gen)) # 1
print(next(fib_gen)) # 2
print(next(fib_gen)) # 3

# Reset the sequence
print(fib_gen.send(0)) # 0

# Start over
print(next(fib_gen)) # 1
print(next(fib_gen)) # 1
```

Solution
```python
def fibonacci_generator():
    a, b = 0, 1
    while True:
        reset = yield a
        if reset is not None:  # i.e. a value was sent, be it 0 or otherwise
            a, b = 0, 1
        else:
            a, b = b, a + b
```

Explanation: The coroutine uses yield as an expression to receive data via send(). When send() is called with a value, the yield expression evaluates to that value, which is assigned to reset. If a value was sent (i.e. reset is not None), the Fibonacci sequence is reset to the beginning. Otherwise, the sequence continues to the next Fibonacci number.
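One subtlety worth noting (not covered by the problem statement): a generator must be started, or "primed," with next() before send() can deliver a non-None value. A quick sketch:

```python
def fibonacci_generator():
    a, b = 0, 1
    while True:
        reset = yield a
        if reset is not None:
            a, b = 0, 1
        else:
            a, b = b, a + b

fib = fibonacci_generator()
try:
    fib.send(0)  # generator has not started yet
except TypeError:
    print("can't send a non-None value to a just-started generator")

print(next(fib))       # primes the generator: 0
print(fib.send(None))  # send(None) behaves exactly like next(): 1
```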
Problem 3: Decorator Replacing Generator
What will be the output of the following code?
```python
def decorator(func):
    def inner():
        return (x for x in range(5))
    return inner

@decorator
def gen():
    for x in range(3):
        yield x

print(sum(gen()))
```

Solution
Output:
10
Explanation: The typical idea with a decorator is to wrap a function, enhancing its behavior by executing code before and/or after the original function call; crucially, the inner function of a decorator calls the original function (the func parameter in the code above) at some point. In this case, however, the decorator completely replaces the original gen function with a new function inner that returns a generator expression (x for x in range(5)) and never calls func at all.
When gen() is called, it actually calls inner(), which returns a generator that yields numbers from 0 to 4. The sum() function then adds these numbers together: 0 + 1 + 2 + 3 + 4 = 10.
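For contrast, a conventional decorator whose inner function does call func leaves the original generator's behavior intact. A minimal sketch (the name `delegating_decorator` is made up for illustration):

```python
def delegating_decorator(func):
    # A conventional decorator: inner delegates to the original func
    def inner():
        return func()
    return inner

@delegating_decorator
def gen():
    for x in range(3):
        yield x

print(sum(gen()))  # 0 + 1 + 2 = 3, the original behavior is preserved
```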
Problem 4: Nested Generator Expression Sum
Determine the output of this nested generator comprehension:
```python
print(sum(x for x in (y for y in range(3))))
```

Solution
Output:
3
Explanation: The inner generator (y for y in range(3)) produces 0, 1, 2. The outer generator (x for x in ...) iterates over these values. The sum() function adds them: 0 + 1 + 2 = 3.
Problem 5: Generator Comprehension with Tuples
Is there a syntax error in this code? If not, determine the output:
```python
gen = ((x * y,) for x in range(2) for y in range(2))
print(list(gen))
```

Solution
No syntax error. Single-element tuples must have a trailing comma, otherwise it would not be recognized as a tuple. Output:
[(0,), (0,), (0,), (1,)]
Explanation: The generator expression creates tuples containing the product of x and y. It iterates through:
- x=0, y=0: (0*0,) = (0,)
- x=0, y=1: (0*1,) = (0,)
- x=1, y=0: (1*0,) = (0,)
- x=1, y=1: (1*1,) = (1,)
Note: The parentheses around x * y, create a tuple. Without them, it would be a syntax error because the comma would be ambiguous in the comprehension.
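To see the comma's effect in isolation, compare the three variants (a quick sketch):

```python
# With the trailing comma inside parentheses: one-element tuples
tuples = list((x * y,) for x in range(2) for y in range(2))
print(tuples)  # [(0,), (0,), (0,), (1,)]

# Without the comma, the inner parentheses are just grouping: plain ints
ints = list((x * y) for x in range(2) for y in range(2))
print(ints)  # [0, 0, 0, 1]

# Without the inner parentheses, the comma is a syntax error;
# demonstrated via compile() so this script itself still runs
try:
    compile("(x * y, for x in range(2) for y in range(2))", "<demo>", "eval")
except SyntaxError:
    print("SyntaxError, as expected")
```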
Problem 6: Multiple Decorators Order
Predict the output of the following code involving multiple decorators:
```python
def dec1(func):
    def wrapper(x):
        return func(x) * 10
    return wrapper

def dec2(func):
    def wrapper(x):
        return -func(x)
    return wrapper

def dec3(func):
    def wrapper(x):
        return func(x) + 5
    return wrapper

@dec1
@dec2
@dec3
def number(n):
    return n

print(number(5))
```

Solution
Output:
-100
Explanation: Decorators are applied bottom-to-top. The decoration order is:

- `number(n)` returns n.
- `@dec3` wraps it with a function that returns func(x) + 5, so calling with 5 gives 5 + 5 = 10.
- `@dec2` wraps that with a function that returns -func(x), giving -(10) = -10.
- `@dec1` wraps that with a function that returns func(x) * 10, giving -10 * 10 = -100.
The final result is -100.
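The stacked decorators desugar to nested calls, with the decorator closest to the function applied first. The same result can be obtained without the @ syntax:

```python
def dec1(func):
    def wrapper(x):
        return func(x) * 10
    return wrapper

def dec2(func):
    def wrapper(x):
        return -func(x)
    return wrapper

def dec3(func):
    def wrapper(x):
        return func(x) + 5
    return wrapper

def number(n):
    return n

# Stacking @dec1 / @dec2 / @dec3 is equivalent to:
number = dec1(dec2(dec3(number)))
print(number(5))  # -100
```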
Problem 7: Nested Generator Expression Trace
Trace the following nested generator expression. What is the output of the print statements?
```python
gen = ((x, y, z) for x in 'abc' for y in '123' for z in [True, False])
print(next(gen))
print(next(gen))
print(next(gen))
```

Solution
Output:
('a', '1', True)
('a', '1', False)
('a', '2', True)
Explanation: The nested generator creates tuples by iterating through x, then y, then z. The innermost loop (z) completes first:
- First: x=‘a’, y=‘1’, z=True → (‘a’, ‘1’, True)
- Second: x=‘a’, y=‘1’, z=False → (‘a’, ‘1’, False)
- Third: x=‘a’, y=‘2’, z=True → (‘a’, ‘2’, True)
The order is depth-first with the rightmost loop varying fastest.
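An equivalent generator function makes the iteration order explicit; the generator expression desugars to nested for loops with the rightmost clause innermost:

```python
def triples():
    # Equivalent to ((x, y, z) for x in 'abc' for y in '123' for z in [True, False]);
    # the rightmost clause becomes the innermost (fastest-varying) loop
    for x in 'abc':
        for y in '123':
            for z in [True, False]:
                yield (x, y, z)

gen = triples()
print(next(gen))  # ('a', '1', True)
print(next(gen))  # ('a', '1', False)
print(next(gen))  # ('a', '2', True)
```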
Problem 8: Dictionary Copy Behavior
What is the output of the following code? If we want the output to be the opposite, what do we have to change about the code?
```python
data = {
    "name": "General Kenobi",
    "lines": {
        "hello there": 1,
        "this is where the fun begins": 0,
    }
}

new_data = data.copy()
new_data["lines"]["hello there"] = 2
print(data == new_data)
```

Solution
Output:
True
Explanation: This demonstrates the shallow copy pitfall. data.copy() creates a shallow copy, meaning it copies the top-level dictionary but not the nested dictionaries. Both data and new_data share the same reference to the nested “lines” dictionary. When you modify new_data["lines"]["hello there"], you’re modifying the shared nested dictionary, so data is also affected. Therefore, data == new_data is True.
To get the opposite output (False), use a deep copy:
```python
import copy

new_data = copy.deepcopy(data)
new_data["lines"]["hello there"] = 2
print(data == new_data)  # Output: False
```

A deep copy recursively copies all nested objects, so modifications to new_data don't affect data.
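The sharing can be verified directly with the `is` operator; a quick sketch:

```python
import copy

data = {
    "name": "General Kenobi",
    "lines": {"hello there": 1, "this is where the fun begins": 0},
}

shallow = data.copy()
print(shallow["lines"] is data["lines"])  # True: the nested dict is shared

deep = copy.deepcopy(data)
print(deep["lines"] is data["lines"])  # False: the nested dict was recursively copied
```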
Problem 9: Dictionary Unpacking
What is the output?
```python
d1 = {'a': 1, 'b': 2}
d2 = {'b': 3, 'c': 4}
d2 = {**d1, **d2}
print(d2)
```

Solution
Output:
{'a': 1, 'b': 3, 'c': 4}
Explanation: Dictionary unpacking (**) is applied left to right. First, d1 is unpacked giving {'a': 1, 'b': 2}. Then, d2 (the original {'b': 3, 'c': 4}) is unpacked. When there are duplicate keys, the rightmost value wins. So 'b' has value 3 (from the second unpacking), not 2. The result is {'a': 1, 'b': 3, 'c': 4}.
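As an aside, since Python 3.9 the `|` operator performs the same merge with the same right-wins rule for duplicate keys:

```python
d1 = {'a': 1, 'b': 2}
d2 = {'b': 3, 'c': 4}

merged = d1 | d2  # same result as {**d1, **d2}
print(merged)     # {'a': 1, 'b': 3, 'c': 4}

d1 |= d2          # in-place merge: updates d1 with d2, right side wins
print(d1)         # {'a': 1, 'b': 3, 'c': 4}
```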
Problem 10: Memoization Function
Write a function create_memo to memoize another function func. func is expected to take a single integer argument and return an integer.

The create_memo function should:

- Accept a function reference `func` as an argument.
- Return a new function that caches the results of `func` for each unique input. It will also have the same signature as `func`.
Example Usage:
```python
def square(x):
    print(f"Computing {x} squared... ", end="")
    return x * x

def factorial(x):
    print(f"Computing {x} factorial... ", end="")
    fact = 1
    for i in range(2, x + 1):
        fact *= i
    return fact

memoized_square = create_memo(square)
print(memoized_square(5)) # Output: Computing 5 squared... 25
print(memoized_square(5)) # Output: 25 (cached)
print(memoized_square(7)) # Output: Computing 7 squared... 49
print(memoized_square(5)) # Output: 25 (cached)
print(memoized_square(7)) # Output: 49 (cached)

memoized_factorial = create_memo(factorial)
print(memoized_factorial(3)) # Output: Computing 3 factorial... 6
print(memoized_factorial(3)) # Output: 6 (cached)
print(memoized_factorial(3)) # Output: 6 (cached)
```

Solution
```python
def create_memo(func):
    cache = {}
    def memoize(x):
        if x not in cache:
            cache[x] = func(x)
        return cache[x]
    return memoize
```

Explanation: create_memo creates a cache dictionary. When the returned memoize function is called, it uses the argument itself as the cache key. If the key is already in the cache, the cached result is returned. Otherwise, it calls the original function, stores the result in the cache, and returns it. This avoids redundant calculations for the same inputs.
Furthermore, this implementation of a memo is limited to a func that takes a single integer argument. If you want to generalize it for functions with multiple arguments, you can use *args and **kwargs to create a more flexible caching mechanism.
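Such a generalization might look like the following sketch (`create_memo_general` and `power` are illustrative names, not part of the problem; every argument must be hashable):

```python
def create_memo_general(func):
    # Cache keyed on all positional and keyword arguments;
    # each argument must be hashable for the key to be hashable
    cache = {}
    def memoize(*args, **kwargs):
        key = (args, frozenset(kwargs.items()))
        if key not in cache:
            cache[key] = func(*args, **kwargs)
        return cache[key]
    return memoize

@create_memo_general
def power(base, exp=2):
    print(f"Computing {base} ** {exp}... ", end="")
    return base ** exp

print(power(3))         # computes, then prints 9
print(power(3))         # cached: 9
print(power(2, exp=5))  # computes, then prints 32
```

One limitation of this keying scheme: power(2, 5) and power(2, exp=5) cache under different keys even though they compute the same value.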
Also, notice how we implemented a pattern involving a closure similar to decorators, where create_memo returns a new function memoize that adds memoization behavior to the original function func. So, create_memo can be used as a decorator as well:
```python
@create_memo
def square(x):
    print(f"Computing {x} squared... ", end="")
    return x * x

result = square(5) # This will compute and cache the result
result = square(5) # This will skip the compute and return the cached result
```

So, really, what was shown earlier was an implementation of a decorator that adds memoization to a function, used manually instead of with the @ syntax. This shows the underlying mechanics of how decorators work in Python.
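The standard library ships this pattern ready-made: functools.lru_cache (with functools.cache as an unbounded shorthand in Python 3.9+) memoizes any function whose arguments are hashable:

```python
from functools import lru_cache

@lru_cache(maxsize=None)  # maxsize=None means an unbounded cache
def square(x):
    print(f"Computing {x} squared... ", end="")
    return x * x

print(square(5))  # computes, then prints 25
print(square(5))  # cached: 25
print(square.cache_info())  # cache statistics, e.g. hits=1, misses=1
```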
Problem 11: Duplicate Header Detection
Background
Consider a dataset represented as a table with headers and rows. A well-formed dataset should have a single header row followed by data rows.
Example Table
| Term | Subject | Course Number | Instructor |
|---|---|---|---|
| Spring 2025 | CSCI | 381 | Waxman, Jerry |
| Spring 2025 | CSCI | 320 | Boklan, Kent |
| Spring 2025 | MATH | 245 | Gangaram, Elliot |
However, sometimes datasets may be malformed and contain duplicate header rows within the data rows, which can lead to confusion during data processing.
| Term | Subject | Course Number | Instructor |
|---|---|---|---|
| Term | Subject | Course Number | Instructor |
| Spring 2025 | CSCI | 381 | Waxman, Jerry |
| Spring 2025 | CSCI | 320 | Boklan, Kent |
| Spring 2025 | MATH | 245 | Gangaram, Elliot |
Consider the following manner in which we can represent the values in the headers and rows as arrays:
Representation of Headers (labels): ["Term", "Subject", "Course Number", "Instructor"]
Representation of a Row: ["Spring 2025", "CSCI", "381", "Waxman, Jerry"]
The i-th value in the row corresponds to the i-th label.
(a): Implement a function is_heuristic_match.
This function decides whether the first row is a duplicate header row based on a heuristic: if enough positions match, the first row is considered a duplicate header row. Concretely, the function returns True if the ratio of matching positions to total positions is greater than or equal to threshold.

```python
def is_heuristic_match(data, *, threshold) -> bool:
    pass
```

Where data is a dictionary with keys:

- `headers`: a list of strings representing the expected column labels.
- `rows`: a list of lists, where each inner list represents a row of data.

threshold is a keyword-only float in the range [0, 1] representing the minimum ratio of matching positions required to consider the first row a duplicate header row.
Example usage:
```python
data = {
    "headers": ["Term", "Subject", "Course Number", "Instructor"],
    "rows": [
        ["Term", "Subject", "Course Number", "Professor"],
        ["Spring 2025", "CSCI", "381", "Waxman, Jerry"],
    ]
}

print(is_heuristic_match(data, threshold=0.75))
# Expected Output: True
# Explanation: 3 out of 4 positions match (a 0.75 ratio), which meets the
# threshold of 0.75, so this is considered a duplicate header row.

print(is_heuristic_match(data, threshold=0.80))
# Expected Output: False
# Explanation: 3 out of 4 positions match (a 0.75 ratio), which is below
# the threshold of 0.80, so this is NOT considered a duplicate header row.
```

Solution
```python
def is_heuristic_match(data, *, threshold) -> bool:
    headers = data["headers"]
    first_row = data["rows"][0]
    matches = sum(1 for h, r in zip(headers, first_row) if h == r)
    # or equivalently:
    # matches = sum(h == r for h, r in zip(headers, first_row))
    ratio = matches / len(headers)
    return ratio >= threshold
```

(b): Get a heuristic for multiple rows
Write a function get_heuristic_matches that processes the first n rows of data and returns a list of booleans indicating whether each row is considered a duplicate header row, depending on the given threshold.
Assume that n is less than or equal to the number of rows in data["rows"].
```python
def get_heuristic_matches(data, n: int, *, threshold) -> list[bool]:
    pass
```

Example usage:
```python
data = {
    "headers": ["Term", "Subject", "Course Number", "Instructor"],
    "rows": [
        ["Term", "Subject", "Course Number", "Instructor"],
        ["Term", "Subject", "Course Nbr", "Instructor"],
        ["Spring 2025", "CSCI", "381", "Waxman, Jerry"],
        ["Spring 2025", "MATH", "245", "Gangaram, Elliot"],
    ]
}

print(get_heuristic_matches(data, 3, threshold=0.75))
# Expected Output: [True, True, False]
```

Solution
```python
def get_heuristic_matches(data, n: int, *, threshold) -> list[bool]:
    headers = data["headers"]
    results = []
    for i in range(n):
        row = data["rows"][i]
        matches = sum(h == r for h, r in zip(headers, row))
        ratio = matches / len(headers)
        results.append(ratio >= threshold)
    return results
```

Problem 12: File Processing with Line Filtering
In this problem, you will write functions to process text files.
Assume you have access to a global variable PUNCTUATION that is a string containing all possible punctuation characters. If you are asked to analyze individual words, make sure to normalize the words by stripping the punctuation from the ends of words. i.e. a word like "hello!" should be treated as "hello".
Hint
defaultdict from the collections module can be useful for counting occurrences of words without needing to check if the key exists first. It can help to reduce boilerplate code when counting.
The code below creates a defaultdict where the values for missing keys default to 0, but it is unnecessarily verbose compared to using defaultdict(int):

```python
from collections import defaultdict

word_count = defaultdict(lambda: 0)
```

defaultdict(int) works as an equivalent to defaultdict(lambda: 0) because invoking int() returns 0, therefore int can act as a callback that returns 0 for initializing values for missing keys.

```python
from collections import defaultdict

word_count = defaultdict(int)
```

(a) Write a function that opens the file, reads all lines, and prints only the lines that contain more than 30 characters.
Solution
Print lines with more than 30 characters
```python
def print_long_lines(filename):
    with open(filename, 'r') as fd:
        for line in fd:
            # Strip the trailing newline so it doesn't count toward the length
            line = line.strip("\n")
            if len(line) > 30:
                print(line)
```

Alternatively, using splitlines() to avoid newline characters by default:
```python
def print_long_lines(filename):
    with open(filename, 'r') as fd:
        for line in fd.read().splitlines():
            if len(line) > 30:
                print(line)
```

(b) Modify your function to return a list of all words (separated by whitespace) that appear more than once in the file.
Solution
Return words that appear more than once
```python
def get_repeated_words(filename):
    word_count = {}
    with open(filename, 'r') as fd:
        for line in fd:
            words = line.split()
            for word in words:
                # Strip punctuation from ends
                clean_word = word.strip(PUNCTUATION)
                # Skip empty strings after stripping
                if not clean_word:
                    continue
                if clean_word in word_count:
                    word_count[clean_word] += 1
                else:
                    word_count[clean_word] = 1
    return [word for word, count in word_count.items() if count > 1]
```

Alternative using defaultdict:
```python
from collections import defaultdict

def get_repeated_words(filename):
    word_count = defaultdict(int)
    with open(filename, 'r') as fd:
        for line in fd:
            words = line.split()
            for word in words:
                # Strip punctuation from ends
                clean_word = word.strip(PUNCTUATION)
                # Skip empty strings after stripping
                if not clean_word:
                    continue
                word_count[clean_word] += 1
    return [word for word, count in word_count.items() if count > 1]
```

(c) Write a function that counts the number of lines that start with a vowel (a, e, i, o, u), case-insensitive.
Solution
Count lines starting with a vowel
```python
def count_vowel_start_lines(filename):
    vowels = 'aeiou'
    count = 0
    with open(filename, 'r') as fd:
        for line in fd:
            if line and line[0].lower() in vowels:
                count += 1
    return count
```

Problem 13: Unique Words and Text Processing
In this problem, you will write functions to process text files.
Assume you have access to a global variable PUNCTUATION that is a string containing all possible punctuation characters. If you are asked to analyze individual words, make sure to normalize the words by stripping the punctuation from the ends of words. i.e. a word like "hello!" should be treated as "hello".
(a) Write a function that takes a filename and returns the number of unique words in the file (case-insensitive).
Solution
Count unique words (case-insensitive, strip punctuation from ends)
```python
def count_unique_words(filename):
    unique_words = set()
    with open(filename, 'r') as fd:
        for line in fd:
            words = line.lower().split()
            for word in words:
                # Strip punctuation from ends
                clean_word = word.strip(PUNCTUATION)
                if clean_word:
                    unique_words.add(clean_word)
    return len(unique_words)
```

(b) Modify your function to also return the 5 most common words and their counts.
Solution
Return unique count and 5 most common words (using a dictionary)
```python
def count_unique_words_with_top(filename):
    word_count = {}
    with open(filename, 'r') as fd:
        for line in fd:
            words = line.lower().split()
            for word in words:
                clean_word = word.strip(PUNCTUATION)
                # Skip empty strings after stripping
                if not clean_word:
                    continue
                if clean_word in word_count:
                    word_count[clean_word] += 1
                else:
                    word_count[clean_word] = 1
    unique_count = len(word_count)
    sorted_by_decreasing_word_count = sorted(word_count.items(), key=lambda x: x[1], reverse=True)
    return unique_count, sorted_by_decreasing_word_count[:5]
```

Alternative using defaultdict:
```python
from collections import defaultdict

def count_unique_words_with_top(filename):
    word_count = defaultdict(int)
    with open(filename, 'r') as fd:
        for line in fd:
            words = line.lower().split()
            for word in words:
                clean_word = word.strip(PUNCTUATION)
                if clean_word:
                    word_count[clean_word] += 1
    unique_count = len(word_count)
    sorted_by_decreasing_word_count = sorted(word_count.items(), key=lambda x: x[1], reverse=True)
    return unique_count, sorted_by_decreasing_word_count[:5]
```

(c) Write a generator function that yields each sentence (line) ending with a period, exclamation mark, or question mark from the file, one at a time.
Solution
Generator function for sentences
```python
def sentence_generator(filename):
    with open(filename, 'r') as fd:
        for line in fd.read().splitlines():
            if not line:
                continue
            if line.endswith(('.', '!', '?')):
                yield line
```
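To exercise the generator without an existing file on disk, one can write a temporary file first; a quick sketch (the sample text is made up, and the function is repeated with its two if-checks condensed into one so the sketch is self-contained):

```python
import os
import tempfile

def sentence_generator(filename):
    # Same logic as above: skip blank lines, yield lines ending in . ! ?
    with open(filename, 'r') as fd:
        for line in fd.read().splitlines():
            if line and line.endswith(('.', '!', '?')):
                yield line

sample = "Hello there!\nno terminating punctuation here\nIs this a question?\n"

# Write the sample to a temporary file, then lazily pull sentences from it
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write(sample)
    path = tmp.name

sentences = list(sentence_generator(path))
print(sentences)  # ['Hello there!', 'Is this a question?']
os.remove(path)
```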