Dazbo's Advent of Code solutions, written in Python
ParsimoniousList ComprehensionUser-defined exceptions
I enjoyed this one. There are a couple of ways of going about the problem, and I ended up writing three different solutions.
We’re told that our sub’s navigation data is corrupted. The navigation is our input data, and it looks like this:
[({(<(())[]>[[{[]{<()<>>
[(()[<>])]({[<{<<[]>>(
{([(<{}[<>[]}>{[]{[(<()>
(((({<>}<{<{<>}{[]{[]{}
[[<[([]))<([[{}[[()]]]
...
We’re told some lines are incomplete, whilst some are corrupted.
"{()()()>"
must be corrupt, because we have a >
without a corresponding <
."{<<[]>>("
must be incomplete, because the last (
has not been closed, nor has the {
.My first solution uses the very cool Parsimonious. This is a library that allows the parsing of grammars, i.e. a set of terms that follow a defined set of rules. Think of it like regex on steroids, since it has the ability to recognise specific language terms, but also recurse into those terms.
We’re ignoring incomplete lines. We’re told to find the first invalid chracter in each corrupted line, and determine its score. Each type of invalid character maps to one of four unique scores. Then we’re asked to add up these scores, to give the total syntax error score.
Let’s import what we need:
from collections import defaultdict
import logging
import os
import time
from parsimonious import Grammar, ParseError
from parsimonious.exceptions import IncompleteParseError
SCRIPT_DIR = os.path.dirname(__file__)
INPUT_FILE = "input/input.txt"
# INPUT_FILE = "input/sample_input.txt"
logging.basicConfig(format="%(asctime)s.%(msecs)03d:%(levelname)s:%(name)s:\t%(message)s",
datefmt='%Y-%m-%d %H:%M:%S')
logger = logging.getLogger(__name__)
logger.setLevel(level=logging.INFO)
Let’s define our Parsimonious Grammar
, i.e. the set of rules that determine whether our input is valid.
# Each row (expr) must contain one or more chunks (any)
# Each chunk (any) can be of type 'normal', 'angle', 'square', or 'curly'
# Each type is composed of a bracket pair, and 0 or more inner chunks (any)
grammar = Grammar(r"""
expr = any+
any = (normal / angle / square / curly)
normal = "(" any* ")"
angle = "<" any* ">"
square = "[" any* "]"
curly = "{" any* "}"
""")
Explaining the rules:
expr
, which is what we’ll validate each input line against. This requires 1 or more instances of any
. The +
operator means “one or more”, just like in regex.any
as one item of normal
, angle
, square
, or curly
. The /
means “or”.normal
, angle
, square
, and curly
to be a specific opening bracket, followed by zero or more any
, followed by the matching closing bracket. As with regex, *
means 0, 1 or more. Note that this rule allows our grammar to be recursive, i.e. because our ‘top’ any can contain terms which can themselves contain ‘any’.Now we create a list
of opening brackets, and a list
of corresponding closing brackets. Then we zip these two lists together, to create two dicts:
And finally, one more dictionary, which maps each closing bracket to the arbitrary invalid char scores we were told to use.
OPENERS = ["(", "[", "{", "<"]
CLOSERS = [")", "]", "}", ">"]
OPEN_TO_CLOSE = dict(zip(OPENERS, CLOSERS)) # {'(': ')', ...}
CLOSE_TO_OPEN = dict(zip(CLOSERS, OPENERS)) # {')': '(', ...}
INVALID_CHAR_SCORES = dict(zip(CLOSERS, (3, 57, 1197, 25137))) # {')': 3, ...}
Now we run it:
input_file = os.path.join(SCRIPT_DIR, INPUT_FILE)
with open(input_file, mode="rt") as f:
data = f.read().splitlines()
# Part 1 - Looking for corrupted lines only
incomplete_lines = []
invalid_chars = []
for line_num, line in enumerate(data):
try:
grammar.parse(line)
except IncompleteParseError: # valid, but incomplete
incomplete_lines.append(line)
except ParseError as err:
if err.pos == len(line): # valid, but incomplete
incomplete_lines.append(line)
else: # corrupted
logger.debug("%d: %s", line_num, line)
logger.debug("Found %s", line[err.pos])
invalid_chars.append(line[err.pos])
logger.info("Part 1: There are %d corrupted lines", len(invalid_chars))
score = sum([INVALID_CHAR_SCORES[char] for char in invalid_chars])
logger.info("Syntax error score=%d\n", score)
First, we read all the data and split into separate lines. We parse each line, using our grammar. We’re only interested in lines which fail to validate, and these are identified as follows:
IncompleteParseError
. We’ll keep these for later!ParseError
. We can identify these because by the fact that we’ve got to the end of the line before the ParseError
was thrown. We’ll also keep these for later.ParseError
, but won’t have reached the end of the line. This is all we need for Part 1. We store the character that was read when the ParseError was thrown, since this is the character we need. Finally, we map each of these characters to its corresponding score, and then add up the scores.Easy!
Unsurprisingly, we’re now asked to work with the incomplete lines. Specifically, we’re told we need to complete the incomplete lines, by adding the necessary sequence of closing brackets. Work out the completion score.
We’ll do most of the hard work here:
def get_completion_for_line(line: str) -> str:
""" Determine which closing brackets need to be added to complete this incomplete line. """
to_complete = ""
close_counters = defaultdict(int)
for char in line[::-1]: # process chars from the end
if char in CLOSERS:
close_counters[char] += 1
else: # opener
matching_closer = OPEN_TO_CLOSE[char]
if close_counters[matching_closer] > 0: # opener for existing closer
close_counters[matching_closer] -= 1
else: # opener, but missing closer
to_complete += matching_closer
return to_complete
The objective is to take each incomplete line, and then work out which brackets haven’t been closed.
This is what the function does:
defaultdict(int)
. We’ve seen the defaultdict
before in this AoC. This is a dict where each value is initialised to 0, meaning we can add to the value of a key, even if the key hasn’t been used before.[::-1]
syntax. (This means step from the end to the beginning, but in steps of -1.)to_complete
variable.Now we need to determine the score that results from the set of closing brackets we’ve identified.
def score_for_completion(completion_str: str) -> int:
""" Arbitrary rules for calculating a score for a str of completion chars """
score = 0
for char in completion_str:
score *= 5
score += COMPLETION_CHAR_SCORES[char]
return score
Finally, let’s run all this, to solve Part 2.
# Part 2
logger.info("Part 2: There are %d remaining incomplete lines", len(incomplete_lines))
completion_scores = []
for line in incomplete_lines:
to_complete = get_completion_for_line(line)
completion_scores.append(score := score_for_completion(to_complete))
logger.debug("To complete: %s with score %d", to_complete, score)
completion_scores.sort()
logger.info("Completion score=%d", completion_scores[len(completion_scores)//2])
We sort the scores, and then identify the median, as required.
The output looks like this:
2022-01-15 20:12:37.680:INFO:__main__: Part 1: There are 49 corrupted lines
2022-01-15 20:12:37.681:INFO:__main__: Syntax error score=339477
2022-01-15 20:12:37.682:INFO:__main__: Part 2: There are 49 remaining incomplete lines
2022-01-15 20:12:37.686:INFO:__main__: Completion score=3049320156
2022-01-15 20:12:37.687:INFO:__main__: Execution time: 0.0664 seconds
Parsimonious is awesome, and we can give it any rules we want it to follow. But it does come with a bit of processing overhead. If we want to be super-quick, we can write our own parser. So that’s what I’ve done here. It also gives us a bit of practice at writing our own user-defined exceptions.
First, Part 1:
These are the imports we need:
import logging
import os
import time
from collections import deque
from dataclasses import dataclass
And we’ll still use these constants:
OPENERS = ["(", "[", "{", "<"]
CLOSERS = [")", "]", "}", ">"]
OPEN_TO_CLOSE = dict(zip(OPENERS, CLOSERS)) # {'(': ')', ...}
COMPLETION_CHAR_SCORES = dict(zip(CLOSERS, (1, 2, 3, 4)))
INVALID_CHAR_SCORES = dict(zip(CLOSERS, (3, 57, 1197, 25137)))
And we’ll create a user-defined ParseException
, by subclassing Exception
. Note that we also make it a dataclass
, so we don’t need to create an __init__()
method to initialise any variables.
@dataclass
class ParseException(Exception):
""" Data parsed and found to be invalid """
expected: str
actual: str
Now, for each line, we parse using this function:
def parse(line: str):
""" Parse the navigation instructions, line-by-line.
If we read any right bracket that does not match an existing left bracket, raise ParseException.
"""
stack = deque()
for char in line:
if char in OPENERS:
stack.append(OPEN_TO_CLOSE[char])
continue # Move on to the next char
assert char in CLOSERS, "Must be right bracket"
popped = stack.pop() # Pop the required right bracket
if char == popped:
continue # This was the right bracket we needed
raise ParseException(expected=popped, actual=char) # Wrong right bracket - invalid line
It works by:
ParseException
, containing the invalid chaacter we just read.Solving Part 1 is now just this:
# Part 1 - Looking for corrupted lines only
invalid_chars = []
for line in data:
try:
parse(line)
except ParseException as pe:
invalid_chars.append(pe.actual)
logger.info("Part 1: There are %d corrupted lines", len(invalid_chars))
score = sum([INVALID_CHAR_SCORES[char] for char in invalid_chars])
logger.info("Syntax error score=%d\n", score)
For Part 2, we just make some small tweaks.
First, we add a new user-defined exception:
@dataclass
class ParseIncompleteException(Exception):
""" Data parsed and found to be incomplete """
remaining: str
Then we need to modify our parse()
method:
def parse(line: str):
""" Parse the navigation instructions, line-by-line.
If we read any right bracket that does not match an existing left bracket, raise ParseException.
If we read all the data but still have closing brackets left on the stack, raise ParseIncompleteException.
"""
stack = deque()
for char in line:
if char in OPENERS:
stack.append(OPEN_TO_CLOSE[char])
continue # Move on to the next char
assert char in CLOSERS, "Must be right bracket"
popped = stack.pop() # Pop the required right bracket
if char == popped:
continue # This was the right bracket we needed
raise ParseException(expected=popped, actual=char) # Wrong right bracket - invalid line
if stack: # There were more left brackets than right, so this line is incomplete
raise ParseIncompleteException(remaining="".join(reversed(stack)))
The only change here is the check at the end. If there are any closing brackets still on the stack, then we haven’t yet found these required brackets. So, throw a ParseIncompleteException
, and pass in the closing brackets remaining on the stack, in reverse order. These are the brackets that are needed to complete our line!
And now let’s update our main method accordingly:
invalid_chars = []
completion_scores = [] # We've added this
for line in data:
try:
parse(line)
except ParseException as pe:
invalid_chars.append(pe.actual)
except ParseIncompleteException as pie: # We've added this new exception type
completion_scores.append(score_for_completion(pie.remaining))
logger.info("Part 1: There are %d corrupted lines", len(invalid_chars))
score = sum([INVALID_CHAR_SCORES[char] for char in invalid_chars])
logger.info("Syntax error score=%d\n", score)
# Part 2
completion_scores.sort()
logger.info("Part 2: Completion score=%d", completion_scores[len(completion_scores)//2])
And this is about 300x faster:
2022-01-15 20:35:23.126:INFO:__main__: Part 1: There are 49 corrupted lines
2022-01-15 20:35:23.127:INFO:__main__: Syntax error score=339477
2022-01-15 20:35:23.127:INFO:__main__: Part 2: Completion score=3049320156
2022-01-15 20:35:23.128:INFO:__main__: Execution time: 0.0021 seconds
This approach relies on stripping out complete bracket pairs from the input lines, until all that remains is either invalid characters, or brackets that need to be completed.
Slight tweaks to our constants:
OPENERS = ["(", "[", "{", "<"]
CLOSERS = [")", "]", "}", ">"]
PAIRS = ["".join(item) for item in zip(OPENERS, CLOSERS)] # ['()', '[]', ...]
OPEN_TO_CLOSE = dict(zip(OPENERS, CLOSERS)) # {'(': ')', ...}
INVALID_CHAR_SCORES = dict(zip(CLOSERS, (3, 57, 1197, 25137)))
COMPLETION_CHAR_SCORES = dict(zip(CLOSERS, (1, 2, 3, 4)))
To trim out valid brackets:
def trim_brackets(line: str) -> str:
""" Iteratively strip out contiguous pairs of brackets from this line, until none remain """
stripped = True
while stripped: # Continue until no more stripping
stripped = False
for bracket_pair in PAIRS: # E.g. "()"
if bracket_pair in line:
line = line.replace(bracket_pair, "") # Strip out all occurences of this pair
stripped = True
return line
This is what it does with an input line:
'[({(<(())[]>[[{[]{<()<>>' # remove ()
'[({(<()[]>[[{[]{<<>>' # remove []
'[({(<()>[[{{<<>>' # remove <>
'[({(<()>[[{{<>' # remove ()
'[({(<>[[{{<>' # remove <>
'[({([[{{' # no pairs left to remove - valid but incomplete
Our get_completion_for_line()
function is now really simple. We take the incomplete (but valid) lines, and simply add the corresponding closer for each opener, starting at the end and working back to the front.
def get_completion_for_line(line: str) -> str:
""" Determine which closing brackets need to be added to complete this incomplete line. """
to_complete = ""
for opener in line[::-1]:
to_complete += OPEN_TO_CLOSE[opener]
return to_complete
So we solve both parts, thus:
# Part 1 - Looking for corrupted lines only
incomplete_lines = []
invalid_chars = []
trimmed_lines = [trim_brackets(line) for line in data]
for line in trimmed_lines:
# corrupt lines will have closing brackets, whilst incomplete lines will not
if any(char in CLOSERS for char in line): # corrupt if any closer in the line
for char in line:
if char in CLOSERS:
invalid_chars.append(char) # find first closer
break
else:
incomplete_lines.append(line)
logger.info("Part 1: There are %d corrupted lines", len(invalid_chars))
score = sum([INVALID_CHAR_SCORES[char] for char in invalid_chars])
logger.info("Syntax error score=%d\n", score)
# Part 2
logger.info("Part 2: There are %d remaining incomplete lines", len(incomplete_lines))
completion_scores = []
for line in incomplete_lines:
to_complete = get_completion_for_line(line)
completion_scores.append(score := score_for_completion(to_complete))
logger.debug("To complete: %s with score %d", to_complete, score)
completion_scores.sort()
logger.info("Completion score=%d", completion_scores[len(completion_scores)//2])
Output:
21:04:20.963:INFO:__main__: Part 1: There are 49 corrupted lines
21:04:20.965:INFO:__main__: Syntax error score=339477
21:04:20.965:INFO:__main__: Part 2: There are 49 remaining incomplete lines
21:04:20.968:INFO:__main__: Completion score=3049320156
21:04:20.968:INFO:__main__: Execution time: 0.0023 seconds
Also pretty quick.