Advent of Code 2015 - Day 8

Useful Links

Concepts and Packages Demonstrated

Problem Intro

Quite a nice easy challenge today. We’re told that Santa has a digital copy of his list. The string values in the list contain quotes, escape characters, and hex representations of characters.

The sample data looks like this:

""
"abc"
"aaa\"aaa"
"\x27"

So, we can count the number of characters in the raw input strings, and we can count the number of characters in the string being represented:

Value	Length of Raw (Input) String	Represented String	Length of Represented String
””	2	[empty]	0
“abc”	5	abc	3
“aaa\"aaa”	10	aaa”aaa	7
“\x27”	6	’	1

Part 1

Disregarding the whitespace in the file, what is the number of characters of the raw string literals minus the number of characters of the represented strings, in total for the entire file?

We could use some regex to rip out the outside quotes, and to replace escape characters with their representations. But it’s actually much easier to use the ast.literal_eval() function. This is a Python standard module which provides the ability to parse and evaluate any input that follows the rules of Python’s standard data types. E.g. we can use literal_eval() to evalute strings, lists, dicts, etc.

The code is pretty simple:

from pathlib import Path
import time
import re
from ast import literal_eval

SCRIPT_DIR = Path(__file__).parent 
INPUT_FILE = Path(SCRIPT_DIR, "input/input.txt")
# INPUT_FILE = Path(SCRIPT_DIR, "input/sample_input.txt")

def main():
    with open(INPUT_FILE, mode="rt") as f:
        data = f.read().splitlines()

    raw_lengths = []
    evaluated_lengths = []
    for line in data:
        line = line.strip()
        raw_lengths.append(len(line))
        
        # Use literal_eval to take the raw string, and evaluate it as a Python expression
        str_repr = literal_eval(line)

        # Part 1
        evaluated_lengths.append(len(str_repr))

    sum_raw_lengths = sum(raw_lengths)
    sum_eval_lengths = sum(evaluated_lengths)
    print(f"Sum of raw strings: {sum_raw_lengths}") 
    print(f"Sum of evaluated strings: {sum_eval_lengths}") 
    print(f"Diff: {sum_raw_lengths-sum_eval_lengths}") 

if __name__ == "__main__":
    t1 = time.perf_counter()
    main()
    t2 = time.perf_counter()
    print(f"Execution time: {t2 - t1:0.4f} seconds")

First, we read in the data, and store as a list of str lines, as usual.
We then create two new lists to store raw_lengths, and evaluated_lengths. I.e. the lengths of the original raw strings, and then lengths of the strings the raw string is intended to represent.
For each line, we pass the raw str through ast.literal_eval(). This is a handy function that takes any expression, and interprets the expression as if it were a native Python object, such as a str, list, dict; it then returns that Python native object. So, if we give literal_eval() raw data that resembles a Python str, then the function will parse it and return the actual str it represents. In short: it does everything we need for this challenge!
Once we’ve evaluated our raw string to get our representation, we determine the length of the representation, and add this length to our evaluated_lengths list.
Finally, we sum up all the lengths in raw_lengths and sum up all the lengths in evaluated_lengths, and return the difference, as required.

Part 2

We’re told we now need to go in reverse. I.e. we treat each line in the input data as a string representation , and we need to encode the representation to its required raw form. Thus, we need to encode any non-alphanumeric characters. For example, imagine we have a string representation that needs to be encoded:

""

I.e. two double-quotes. We would need to encode each double-quote by replacing " with \". So, our final raw string would be:

"\"\""

I.e. one \" for each " to be represented, plus a pair of " to wrap the whole string.

If we do this for our original four sample strings, we find they all grow as follows:

Value	Length of Represented (Input) String	Encoded Raw String	Length of Encoded String
""	2	""""	6
"abc"	5	"\"abc\""	9
"aaa\"aaa"	10	"\"aaa\\\"aaa\""	16
"\x27”	6	"\"\\x27\""	11

We’re asked to find the total number of characters to represent the newly encoded strings minus the number of characters of code in each original string literal.

This requires very little additional code to accomplish. All the hard work can be done with this line:

encoded_str = r'"' + re.sub(r'(["\\])', r'\\\1', line) + r'"'

This is regex which performs a replace. We’re using the sub() method to find any occurrence of “"” or “\”, and then prefix them with an extra “\”. Note how \1 is used to mean whatever we matched.

Doing this replacement returns our new encoded version of the string. Now all we need to do is store the lengths of the encoded strings, just as we did with the raw and represented strings.

So the final solution looks like this:

from pathlib import Path
import time
import re
from ast import literal_eval

SCRIPT_DIR = Path(__file__).parent 
INPUT_FILE = Path(SCRIPT_DIR, "input/input.txt")
# INPUT_FILE = Path(SCRIPT_DIR, "input/sample_input.txt")

def main():
    with open(INPUT_FILE, mode="rt") as f:
        data = f.read().splitlines()

    raw_lengths = []
    evaluated_lengths = []
    encoded_lengths = []
    for line in data:
        line = line.strip()
        raw_lengths.append(len(line))
        
        # Use literal_eval to take the raw string, and evaluate it as a Python expression
        str_repr = literal_eval(line)

        # Part 1
        evaluated_lengths.append(len(str_repr))

        # Part 2
        # replace any " with \" and any \ with \\
        # we do this by just inserting a \ before each matching \1.
        encoded_str = r'"' + re.sub(r'(["\\])', r'\\\1', line) + r'"'
        encoded_lengths.append(len(encoded_str))

    sum_raw_lengths = sum(raw_lengths)
    sum_eval_lengths = sum(evaluated_lengths)
    sum_encoded_lengths = sum(encoded_lengths)
    print(f"Sum of raw strings: {sum_raw_lengths}") 
    print(f"Sum of evaluated strings: {sum_eval_lengths}") 
    print(f"Diff: {sum_raw_lengths-sum_eval_lengths}") 

    print(f"Sum of encoded strings: {sum_encoded_lengths}") 
    print(f"Diff: {sum_encoded_lengths-sum_raw_lengths}") 

if __name__ == "__main__":
    t1 = time.perf_counter()
    main()
    t2 = time.perf_counter()
    print(f"Execution time: {t2 - t1:0.4f} seconds")

This runs pretty darn fast. Here’s the output with my actual data:

Sum of raw strings: 6195
Sum of evaluated strings: 4845
Diff: 1350
Sum of encoded strings: 8280
Diff: 2085
Execution time: 0.0059 seconds

‹›