Learning Python with Advent of Code Walkthroughs

Dazbo's Advent of Code solutions, written in Python

Data Compression Machine

Advent of Code 2016 - Day 9

Day 9: Explosives in Cyberspace

Useful Links

Concepts and Packages Demonstrated

RecursionRegular Expressions

Problem Intro

This puzzle is all about data compression. We’re given a compressed file format that uses markers to indicate repeated sequences of characters. A marker is of the form (AxB), where A is the number of characters to repeat, and B is the number of times to repeat them.

For example:

Part 1

What is the decompressed length of the file (your puzzle input)?

For Part 1, the rules are simple: markers within a repeated section are not expanded. This makes the problem relatively straightforward. We can iterate through the string, and when we encounter a marker, we expand the subsequent section and append it to our result.

My initial approach was to build the entire decompressed string and then find its length. This works perfectly for Part 1.

Solution Code - Part 1 (Brute-Force)

import logging
import os
import time
import re

SCRIPT_DIR = os.path.dirname(__file__)
INPUT_FILE = "input/input.txt"

expand_pattern = re.compile(r"\((\d+)x(\d+)\)")

def main():
    logging.basicConfig(level=logging.DEBUG, format="%(asctime)s:%(levelname)s:\t%(message)s")

    input_file = os.path.join(SCRIPT_DIR, INPUT_FILE)
    with open(input_file, mode="rt") as f:
        src_str = f.read()

    # Part 1
    char_posn = 0
    expanded_str = ""
    while char_posn < len(src_str):
        match = expand_pattern.search(src_str, char_posn)
        if match:
            match_start = match.start()
            match_end = match.end()
            extent, repeat = [int(x) for x in match.groups()]

            expanded_str += src_str[char_posn:match_start]
            for _ in range(repeat):
                expanded_str += src_str[match_end:match_end+extent]

            char_posn = match_end+extent
        else:
            expanded_str += src_str[char_posn:]
            char_posn = len(src_str)

    logging.info(f"Part 1: Expanded str length = {len(expanded_str)}")

if __name__ == "__main__":
    t1 = time.perf_counter()
    main()
    t2 = time.perf_counter()
    print(f"Execution time: {t2 - t1:0.4f} seconds")

This approach is simple to understand but has a major drawback: it can consume a lot of memory and time if the decompressed string is very large. For Part 1, it’s fine, but for Part 2, this will be a problem.

Part 2

What is the decompressed length of the file using this improved format?

In Part 2, the rules change: markers within repeated sections are also expanded. This means we need a recursive approach. Building the full string is no longer feasible, as the length can become astronomically large.

Instead of building the string, we can write a recursive function that calculates the length of the decompressed string.

Solution Code - Part 2 (Recursive)

This solution is much more efficient and elegant.

import logging
import os
import time
import re

SCRIPT_DIR = os.path.dirname(__file__)
INPUT_FILE = "input/input.txt"

expand_pattern = re.compile(r"\((\d+)x(\d+)\)")

def main():
    logging.basicConfig(level=logging.DEBUG, format="%(asctime)s:%(levelname)s:\t%(message)s")

    input_file = os.path.join(SCRIPT_DIR, INPUT_FILE)
    with open(input_file, mode="rt") as f:
        src_str = f.read()

    # Part 1 - Using the recursive function for consistency
    result = decompressed_len(src_str=src_str)
    logging.info(f"Part 1: Expanded str length = {result}")

    # Part 2 - Recurses into each segment
    result = recursive_len(src_str=src_str)
    logging.info(f"Part 2: Expanded str length = {result}")

def recursive_len(src_str: str) -> int:
    return decompressed_len(src_str=src_str, len_func=recursive_len)

def decompressed_len(src_str: str, len_func=len) -> int:
    if not src_str:
        return 0

    match = expand_pattern.match(src_str)
    if match:
        extent, repeat = map(int, match.groups())

        start = match.end()
        end = match.end() + extent

        return (repeat * len_func(src_str[start:end])
                + decompressed_len(src_str[end:], len_func))

    return 1 + decompressed_len(src_str[1:], len_func)

if __name__ == "__main__":
    t1 = time.perf_counter()
    main()
    t2 = time.perf_counter()
    print(f"Execution time: {t2 - t1:0.4f} seconds")

Explanation

The beauty of this solution lies in the decompressed_len function. It takes an optional len_func argument, which defaults to the standard len() function.

This recursive approach is incredibly powerful and efficient, allowing us to calculate the massive length in Part 2 almost instantly.

Results

Part 1: Expanded str length = 102239
Part 2: Expanded str length = 10774309173
Execution time: 0.0025 seconds