Learning Python with Advent of Code Walkthroughs

Dazbo's Advent of Code solutions, written in Python

Advent of Code 2015 - Day 8

Day 8: Matchsticks

Useful Links

Concepts and Packages Demonstrated

literal_evalregex

Problem Intro

Quite a nice easy challenge today. We’re told that Santa has a digital copy of his list. The string values in the list contain quotes, escape characters, and hex representations of characters.

The sample data looks like this:

""
"abc"
"aaa\"aaa"
"\x27"

So, we can count the number of characters in the raw input strings, and we can count the number of characters in the string being represented:

Value Length of Raw (Input) String Represented String Length of Represented String
”” 2 [empty] 0
“abc” 5 abc 3
“aaa\"aaa” 10 aaa”aaa 7
“\x27” 6 1

Part 1

Disregarding the whitespace in the file, what is the number of characters of the raw string literals minus the number of characters of the represented strings, in total for the entire file?

We could use some regex to rip out the outside quotes, and to replace escape characters with their representations. But it’s actually much easier to use the ast.literal_eval() function. This is a Python standard module which provides the ability to parse and evaluate any input that follows the rules of Python’s standard data types. E.g. we can use literal_eval() to evalute strings, lists, dicts, etc.

The code is pretty simple:

from pathlib import Path
import time
import re
from ast import literal_eval

SCRIPT_DIR = Path(__file__).parent 
INPUT_FILE = Path(SCRIPT_DIR, "input/input.txt")
# INPUT_FILE = Path(SCRIPT_DIR, "input/sample_input.txt")

def main():
    with open(INPUT_FILE, mode="rt") as f:
        data = f.read().splitlines()

    raw_lengths = []
    evaluated_lengths = []
    for line in data:
        line = line.strip()
        raw_lengths.append(len(line))
        
        # Use literal_eval to take the raw string, and evaluate it as a Python expression
        str_repr = literal_eval(line)

        # Part 1
        evaluated_lengths.append(len(str_repr))

    sum_raw_lengths = sum(raw_lengths)
    sum_eval_lengths = sum(evaluated_lengths)
    print(f"Sum of raw strings: {sum_raw_lengths}") 
    print(f"Sum of evaluated strings: {sum_eval_lengths}") 
    print(f"Diff: {sum_raw_lengths-sum_eval_lengths}") 

if __name__ == "__main__":
    t1 = time.perf_counter()
    main()
    t2 = time.perf_counter()
    print(f"Execution time: {t2 - t1:0.4f} seconds")

Part 2

We’re told we now need to go in reverse. I.e. we treat each line in the input data as a string representation , and we need to encode the representation to its required raw form. Thus, we need to encode any non-alphanumeric characters. For example, imagine we have a string representation that needs to be encoded:

""

I.e. two double-quotes. We would need to encode each double-quote by replacing " with \". So, our final raw string would be:

"\"\""

I.e. one \" for each " to be represented, plus a pair of " to wrap the whole string.

If we do this for our original four sample strings, we find they all grow as follows:

Value Length of Represented (Input) String Encoded Raw String Length of Encoded String
"" 2 """" 6
"abc" 5 "\"abc\"" 9
"aaa\"aaa" 10 "\"aaa\\\"aaa\"" 16
"\x27” 6 "\"\\x27\"" 11

We’re asked to find the total number of characters to represent the newly encoded strings minus the number of characters of code in each original string literal.

This requires very little additional code to accomplish. All the hard work can be done with this line:

encoded_str = r'"' + re.sub(r'(["\\])', r'\\\1', line) + r'"'

This is regex which performs a replace. We’re using the sub() method to find any occurrence of “"” or “\”, and then prefix them with an extra “\”. Note how \1 is used to mean whatever we matched.

Doing this replacement returns our new encoded version of the string. Now all we need to do is store the lengths of the encoded strings, just as we did with the raw and represented strings.

So the final solution looks like this:

from pathlib import Path
import time
import re
from ast import literal_eval

SCRIPT_DIR = Path(__file__).parent 
INPUT_FILE = Path(SCRIPT_DIR, "input/input.txt")
# INPUT_FILE = Path(SCRIPT_DIR, "input/sample_input.txt")

def main():
    with open(INPUT_FILE, mode="rt") as f:
        data = f.read().splitlines()

    raw_lengths = []
    evaluated_lengths = []
    encoded_lengths = []
    for line in data:
        line = line.strip()
        raw_lengths.append(len(line))
        
        # Use literal_eval to take the raw string, and evaluate it as a Python expression
        str_repr = literal_eval(line)

        # Part 1
        evaluated_lengths.append(len(str_repr))

        # Part 2
        # replace any " with \" and any \ with \\
        # we do this by just inserting a \ before each matching \1.
        encoded_str = r'"' + re.sub(r'(["\\])', r'\\\1', line) + r'"'
        encoded_lengths.append(len(encoded_str))

    sum_raw_lengths = sum(raw_lengths)
    sum_eval_lengths = sum(evaluated_lengths)
    sum_encoded_lengths = sum(encoded_lengths)
    print(f"Sum of raw strings: {sum_raw_lengths}") 
    print(f"Sum of evaluated strings: {sum_eval_lengths}") 
    print(f"Diff: {sum_raw_lengths-sum_eval_lengths}") 

    print(f"Sum of encoded strings: {sum_encoded_lengths}") 
    print(f"Diff: {sum_encoded_lengths-sum_raw_lengths}") 

if __name__ == "__main__":
    t1 = time.perf_counter()
    main()
    t2 = time.perf_counter()
    print(f"Execution time: {t2 - t1:0.4f} seconds")

This runs pretty darn fast. Here’s the output with my actual data:

Sum of raw strings: 6195
Sum of evaluated strings: 4845
Diff: 1350
Sum of encoded strings: 8280
Diff: 2085
Execution time: 0.0059 seconds