Learning Python with Advent of Code Walkthroughs

Dazbo's Advent of Code solutions, written in Python

Inventory Management

Advent of Code 2018 - Day 2

Day 2: Inventory Management System

Useful Links

Concepts and Packages Demonstrated

CounterCombinationsZip

Problem Intro

We’ve travelled back in time to 1518! We’re outside a utility closet in the North Pole, and we need to find some specific boxes in a warehouse. The boxes have IDs, and we need to scan them to find the ones we’re looking for.

The input is a list of box IDs, like this:

abcdef
bababc
abbcde
abcccd
aabcdd
abcdee
ababab

Part 1

What is the checksum for your list of box IDs?

The checksum is calculated by multiplying two numbers:

  1. The count of IDs that contain exactly two of any letter.
  2. The count of IDs that contain exactly three of any letter.

Note that a single ID can contribute to both counts if it has exactly two of one letter AND exactly three of another.

To solve this, we can use Python’s collections.Counter. This is a fantastic tool that takes an iterable (like a string) and creates a dictionary-like object where keys are the elements and values are their counts.

For each ID, we:

  1. Create a Counter of the characters.
  2. Check if 2 is in the values of the counter (i.e., some letter appears exactly twice).
  3. Check if 3 is in the values of the counter (i.e., some letter appears exactly three times).

Here is the code:

from collections import Counter

def part1(data):
    """ Count the number of IDs that contain exactly two of any letter 
    and the number of IDs that contain exactly three of any letter. 
    The "checksum" is the product of these two counts. """
    
    ids_with_2 = 0
    ids_with_3 = 0
    for id in data:
        counter = Counter(id)
        # Check if any letter appears exactly twice
        if 2 in counter.values():
            ids_with_2 += 1
        # Check if any letter appears exactly three times
        if 3 in counter.values():
            ids_with_3 += 1

    return ids_with_2 * ids_with_3

Part 2

What letters are common between the two correct box IDs?

The “correct” boxes are the only two boxes whose IDs differ by exactly one character at the same position. For example, abcde and axcye differ by two characters (2nd and 4th). fghij and fguij differ by exactly one character (the 3rd).

We need to find this pair and return the characters that are the same.

My strategy is to compare every possible pair of IDs. We can use itertools.combinations to generate all unique pairs of IDs from our list. This is much cleaner than writing nested loops.

For each pair, we need to count how many characters differ at the same position. We can use zip() to iterate over both strings simultaneously.

Approach 1: Concise and Pythonic

A very readable way to solve this is to calculate the total number of differences (often called the Hamming distance) using sum() with a generator expression.

from itertools import combinations

def part2(data):
    """ Find the two IDs that differ by exactly one character at the same position in both strings. """
    
    # Compare every pair of IDs. combinations(data, 2) gives us all unique pairs.
    for id1, id2 in combinations(data, 2):
        
        # zip(id1, id2) pairs up characters: ('a', 'a'), ('b', 'x'), etc.
        # We count how many pairs are different.
        # Boolean True is treated as 1, False as 0.
        diff_count = sum(posn1 != posn2 for posn1, posn2 in zip(id1, id2))
        
        if diff_count == 1:
            # We've found the correct pair!
            # Return the common characters.
            return ''.join(posn1 for posn1, posn2 in zip(id1, id2) if posn1 == posn2)

    return None

This is elegant, but let’s unpack how it works, as it combines a few powerful Python features.

Understanding zip()

The zip() function takes two (or more) iterables and “zips” them together, creating an iterator of tuples. Each tuple contains the $i$-th element from each of the input iterables.

For example:

id1 = "abc"
id2 = "axc"
list(zip(id1, id2))
# Result: [('a', 'a'), ('b', 'x'), ('c', 'c')]

Understanding sum() with a Generator

The line sum(posn1 != posn2 for posn1, posn2 in zip(id1, id2)) does a lot of heavy lifting:

  1. Generator Expression: The part inside the parentheses (posn1 != posn2 for ...) is a generator expression. It iterates through the tuples created by zip.
  2. Boolean Evaluation: For each pair, it evaluates posn1 != posn2. This returns True if the characters are different, and False if they are the same.
    • (‘a’, ‘a’) -> False
    • (‘b’, ‘x’) -> True
    • (‘c’, ‘c’) -> False
  3. Summing Booleans: Python treats True as 1 and False as 0 in numeric contexts. So sum() effectively adds up the 1s (the differences).
    • sum([False, True, False]) is equivalent to 0 + 1 + 0 = 1.

This gives us the total count of differing characters in a single, readable line.

However, this approach has a minor inefficiency: it checks every character in the pair, even if we’ve already found a difference.

Approach 2: Optimization with Early Break

Since we are only interested in pairs with exactly one difference, we can stop checking a specific pair as soon as we find a second difference. This is what I implemented in my final solution:

def part2(data):
    """ Find the two IDs that differ by exactly one character at the same position in both strings. """
    
    # Compare every pair of IDs
    for id1, id2 in combinations(data, 2):
        diffs = 0
        for posn1, posn2 in zip(id1, id2):
            if posn1 != posn2:
                diffs += 1
            if diffs > 1: # No point continuing if we've already exceeded the limit
                break

        if diffs == 1:
            # We've found two IDs that differ by exactly one character at the same position in both strings.
            # Return the common characters of the two IDs.
            return ''.join(posn1 for posn1, posn2 in zip(id1, id2) if posn1 == posn2)

    return False 

In this version, break exits the inner loop immediately when diffs > 1. For long strings or many differences, this saves unnecessary comparisons. While the strings in this specific puzzle are short, this “short-circuiting” pattern is a good habit for performance-critical code.

Results

Part 1 soln=5750
Part 2 soln=tzyvqujloriafkmgswcxbhnpd