Advent of Code 2021 - Day 10

Useful Links

Concepts and Packages Demonstrated

Parsimonious List Comprehension User-defined exceptions

defaultdictreversing str

Problem Intro

I enjoyed this one. There are a couple of ways of going about the problem, and I ended up writing three different solutions.

Solution #1 - Using the Parsimonious PEG Library
Solution #2 - Writing my own parser
Solution #3 - Stripping out brackets

We’re told that our sub’s navigation data is corrupted. The navigation is our input data, and it looks like this:

[({(<(())[]>[[{[]{<()<>>
[(()[<>])]({[<{<<[]>>(
{([(<{}[<>[]}>{[]{[(<()>
(((({<>}<{<{<>}{[]{[]{}
[[<[([]))<([[{}[[()]]]
...

Each line contains one or more navigation chunks.
Chunks open and close with matching brackets.
Valid brackets are any of (), <>, {}, and [].
Each chunk contains zero or more inner chunks.

We’re told some lines are incomplete, whilst some are corrupted.

Corrupt lines will have a closing bracket in the wrong place. I.e. because a different closing bracket is expected. For example, "{()()()>" must be corrupt, because we have a > without a corresponding <.
Incomplete lines do not have any closing brackets in the wrong place. But there are chunks still open, and thus missing appropriate closing brackets. For example, "{<<[]>>(" must be incomplete, because the last ( has not been closed, nor has the {.

Solution #1

My first solution uses the very cool Parsimonious. This is a library that allows the parsing of grammars, i.e. a set of terms that follow a defined set of rules. Think of it like regex on steroids, since it has the ability to recognise specific language terms, but also recurse into those terms.

Part 1

We’re ignoring incomplete lines. We’re told to find the first invalid chracter in each corrupted line, and determine its score. Each type of invalid character maps to one of four unique scores. Then we’re asked to add up these scores, to give the total syntax error score.

Setup

Let’s import what we need:

from collections import defaultdict
import logging
import os
import time
from parsimonious import Grammar, ParseError
from parsimonious.exceptions import IncompleteParseError

SCRIPT_DIR = os.path.dirname(__file__) 
INPUT_FILE = "input/input.txt"
# INPUT_FILE = "input/sample_input.txt"

logging.basicConfig(format="%(asctime)s.%(msecs)03d:%(levelname)s:%(name)s:\t%(message)s", 
                    datefmt='%Y-%m-%d %H:%M:%S')
logger = logging.getLogger(__name__)
logger.setLevel(level=logging.INFO)

The solution

Let’s define our Parsimonious Grammar, i.e. the set of rules that determine whether our input is valid.

# Each row (expr) must contain one or more chunks (any)
# Each chunk (any) can be of type 'normal', 'angle', 'square', or 'curly'
# Each type is composed of a bracket pair, and 0 or more inner chunks (any)
grammar = Grammar(r"""
    expr = any+
    any = (normal / angle / square / curly)
    normal = "(" any* ")"
    angle = "<" any* ">"
    square = "[" any* "]"
    curly = "{" any* "}"
""")

Explaining the rules:

Start by defining expr, which is what we’ll validate each input line against. This requires 1 or more instances of any. The + operator means “one or more”, just like in regex.
We define any as one item of normal, angle, square, or curly. The / means “or”.
We define each of normal, angle, square, and curly to be a specific opening bracket, followed by zero or more any, followed by the matching closing bracket. As with regex, * means 0, 1 or more. Note that this rule allows our grammar to be recursive, i.e. because our ‘top’ any can contain terms which can themselves contain ‘any’.

Now we create a list of opening brackets, and a list of corresponding closing brackets. Then we zip these two lists together, to create two dicts:

One which maps each opener to a matching closer.
Another which maps each closer to a matching opener.

And finally, one more dictionary, which maps each closing bracket to the arbitrary invalid char scores we were told to use.

OPENERS = ["(", "[", "{", "<"]
CLOSERS = [")", "]", "}", ">"]
OPEN_TO_CLOSE = dict(zip(OPENERS, CLOSERS))  # {'(': ')', ...}
CLOSE_TO_OPEN = dict(zip(CLOSERS, OPENERS))  # {')': '(', ...}

INVALID_CHAR_SCORES = dict(zip(CLOSERS, (3, 57, 1197, 25137))) # {')': 3, ...}

Now we run it:

input_file = os.path.join(SCRIPT_DIR, INPUT_FILE)
with open(input_file, mode="rt") as f:
    data = f.read().splitlines()

# Part 1 - Looking for corrupted lines only
incomplete_lines = []
invalid_chars = []
for line_num, line in enumerate(data):
    try:
        grammar.parse(line)
    except IncompleteParseError:    # valid, but incomplete
        incomplete_lines.append(line)
    except ParseError as err:
        if err.pos == len(line):    # valid, but incomplete
            incomplete_lines.append(line)
        else:                       # corrupted
            logger.debug("%d: %s", line_num, line)
            logger.debug("Found %s", line[err.pos])
            invalid_chars.append(line[err.pos])

logger.info("Part 1: There are %d corrupted lines", len(invalid_chars))                
score = sum([INVALID_CHAR_SCORES[char] for char in invalid_chars])
logger.info("Syntax error score=%d\n", score)

First, we read all the data and split into separate lines. We parse each line, using our grammar. We’re only interested in lines which fail to validate, and these are identified as follows:

Incomplete lines - with some rules passed - throw IncompleteParseError. We’ll keep these for later!
Incomplete lines - with no rules passed - throw ParseError. We can identify these because by the fact that we’ve got to the end of the line before the ParseError was thrown. We’ll also keep these for later.
Corrupt lines also throw a ParseError, but won’t have reached the end of the line. This is all we need for Part 1. We store the character that was read when the ParseError was thrown, since this is the character we need. Finally, we map each of these characters to its corresponding score, and then add up the scores.

Easy!

Part 2

Unsurprisingly, we’re now asked to work with the incomplete lines. Specifically, we’re told we need to complete the incomplete lines, by adding the necessary sequence of closing brackets. Work out the completion score.

We’ll do most of the hard work here:

def get_completion_for_line(line: str) -> str:
    """ Determine which closing brackets need to be added to complete this incomplete line. """
    
    to_complete = ""
    close_counters = defaultdict(int)
    for char in line[::-1]:     # process chars from the end
        if char in CLOSERS:
            close_counters[char] += 1
        else:  # opener
            matching_closer = OPEN_TO_CLOSE[char]
            if close_counters[matching_closer] > 0:    # opener for existing closer
                close_counters[matching_closer] -= 1
            else:    # opener, but missing closer
                to_complete += matching_closer
                
    return to_complete

The objective is to take each incomplete line, and then work out which brackets haven’t been closed.

This is what the function does:

Create a defaultdict(int). We’ve seen the defaultdict before in this AoC. This is a dict where each value is initialised to 0, meaning we can add to the value of a key, even if the key hasn’t been used before.
Then we take our incomplete line, and reverse it, using the [::-1] syntax. (This means step from the end to the beginning, but in steps of -1.)
For each char in the reversed line:
- Determine if the char is a closing bracket.
  - If so, store a count of how many of this type of closing brackets we’ve come across. (The key is the bracket type.)
  - If not, then we have an opening bracket. If the opening bracket matches a closing bracket in the dict:
    - We’ve matched it, and we decrement the count for that closing bracket.
    - If not, then we’ve found an opening bracket that needs a closing bracket. So, add the appropriate closing bracket to our to_complete variable.

Now we need to determine the score that results from the set of closing brackets we’ve identified.

def score_for_completion(completion_str: str) -> int:
    """ Arbitrary rules for calculating a score for a str of completion chars """
    score = 0
    
    for char in completion_str:
        score *= 5
        score += COMPLETION_CHAR_SCORES[char]
        
    return score    

Finally, let’s run all this, to solve Part 2.

# Part 2  
logger.info("Part 2: There are %d remaining incomplete lines", len(incomplete_lines))
completion_scores = []
for line in incomplete_lines:
    to_complete = get_completion_for_line(line)
    completion_scores.append(score := score_for_completion(to_complete))
    logger.debug("To complete: %s with score %d", to_complete, score)

completion_scores.sort()
logger.info("Completion score=%d", completion_scores[len(completion_scores)//2])

We sort the scores, and then identify the median, as required.

The output looks like this:

2022-01-15 20:12:37.680:INFO:__main__:  Part 1: There are 49 corrupted lines
2022-01-15 20:12:37.681:INFO:__main__:  Syntax error score=339477

2022-01-15 20:12:37.682:INFO:__main__:  Part 2: There are 49 remaining incomplete lines
2022-01-15 20:12:37.686:INFO:__main__:  Completion score=3049320156
2022-01-15 20:12:37.687:INFO:__main__:  Execution time: 0.0664 seconds

Solution #2

Parsimonious is awesome, and we can give it any rules we want it to follow. But it does come with a bit of processing overhead. If we want to be super-quick, we can write our own parser. So that’s what I’ve done here. It also gives us a bit of practice at writing our own user-defined exceptions.

First, Part 1:

Setup

These are the imports we need:

import logging
import os
import time
from collections import deque
from dataclasses import dataclass

And we’ll still use these constants:

OPENERS = ["(", "[", "{", "<"]
CLOSERS = [")", "]", "}", ">"]
OPEN_TO_CLOSE = dict(zip(OPENERS, CLOSERS))  # {'(': ')', ...}

COMPLETION_CHAR_SCORES = dict(zip(CLOSERS, (1, 2, 3, 4)))
INVALID_CHAR_SCORES = dict(zip(CLOSERS, (3, 57, 1197, 25137)))

Solution

And we’ll create a user-defined ParseException, by subclassing Exception. Note that we also make it a dataclass, so we don’t need to create an __init__() method to initialise any variables.

@dataclass
class ParseException(Exception):
    """ Data parsed and found to be invalid """
    expected: str
    actual: str

Now, for each line, we parse using this function:

def parse(line: str):
    """ Parse the navigation instructions, line-by-line.
    If we read any right bracket that does not match an existing left bracket, raise ParseException.
    """
    stack = deque()
    
    for char in line:        
        if char in OPENERS:
            stack.append(OPEN_TO_CLOSE[char])
            continue    # Move on to the next char
        
        assert char in CLOSERS, "Must be right bracket"
        popped = stack.pop()    # Pop the required right bracket
        if char == popped:
            continue    # This was the right bracket we needed
    
        raise ParseException(expected=popped, actual=char)  # Wrong right bracket - invalid line

It works by:

Maintaining a last-in, first-out stack (deque), where the last item is the corresponding right bracket needed for any left bracket just read.
If we read a right bracket that isn’t at the top of the stack, then it means this closing bracket is not required and is invalid. So we raise a ParseException, containing the invalid chaacter we just read.

Solving Part 1 is now just this:

# Part 1 - Looking for corrupted lines only
invalid_chars = []
for line in data:
    try:
        parse(line)
    except ParseException as pe:
        invalid_chars.append(pe.actual)

logger.info("Part 1: There are %d corrupted lines", len(invalid_chars))                
score = sum([INVALID_CHAR_SCORES[char] for char in invalid_chars])
logger.info("Syntax error score=%d\n", score)

For Part 2, we just make some small tweaks.

First, we add a new user-defined exception:

@dataclass          
class ParseIncompleteException(Exception):
    """ Data parsed and found to be incomplete """
    remaining: str

Then we need to modify our parse() method:

def parse(line: str):
    """ Parse the navigation instructions, line-by-line.
    If we read any right bracket that does not match an existing left bracket, raise ParseException.
    If we read all the data but still have closing brackets left on the stack, raise ParseIncompleteException.
    """
    stack = deque()
    
    for char in line:        
        if char in OPENERS:
            stack.append(OPEN_TO_CLOSE[char])
            continue    # Move on to the next char
        
        assert char in CLOSERS, "Must be right bracket"
        popped = stack.pop()    # Pop the required right bracket
        if char == popped:
            continue    # This was the right bracket we needed
    
        raise ParseException(expected=popped, actual=char)  # Wrong right bracket - invalid line
    
    if stack:   # There were more left brackets than right, so this line is incomplete
        raise ParseIncompleteException(remaining="".join(reversed(stack)))

The only change here is the check at the end. If there are any closing brackets still on the stack, then we haven’t yet found these required brackets. So, throw a ParseIncompleteException, and pass in the closing brackets remaining on the stack, in reverse order. These are the brackets that are needed to complete our line!

And now let’s update our main method accordingly:

invalid_chars = []
completion_scores = []   # We've added this
for line in data:
    try:
        parse(line)
    except ParseException as pe:
        invalid_chars.append(pe.actual)
    except ParseIncompleteException as pie: # We've added this new exception type
        completion_scores.append(score_for_completion(pie.remaining))

logger.info("Part 1: There are %d corrupted lines", len(invalid_chars))                
score = sum([INVALID_CHAR_SCORES[char] for char in invalid_chars])
logger.info("Syntax error score=%d\n", score)
    
# Part 2
completion_scores.sort()
logger.info("Part 2: Completion score=%d", completion_scores[len(completion_scores)//2])

And this is about 300x faster:

2022-01-15 20:35:23.126:INFO:__main__:  Part 1: There are 49 corrupted lines
2022-01-15 20:35:23.127:INFO:__main__:  Syntax error score=339477

2022-01-15 20:35:23.127:INFO:__main__:  Part 2: Completion score=3049320156
2022-01-15 20:35:23.128:INFO:__main__:  Execution time: 0.0021 seconds

Solution #3

This approach relies on stripping out complete bracket pairs from the input lines, until all that remains is either invalid characters, or brackets that need to be completed.

Slight tweaks to our constants:

OPENERS = ["(", "[", "{", "<"]
CLOSERS = [")", "]", "}", ">"]
PAIRS = ["".join(item) for item in zip(OPENERS, CLOSERS)] # ['()', '[]', ...]
OPEN_TO_CLOSE = dict(zip(OPENERS, CLOSERS))  # {'(': ')', ...}

INVALID_CHAR_SCORES = dict(zip(CLOSERS, (3, 57, 1197, 25137)))
COMPLETION_CHAR_SCORES = dict(zip(CLOSERS, (1, 2, 3, 4)))

To trim out valid brackets:

def trim_brackets(line: str) -> str:
    """ Iteratively strip out contiguous pairs of brackets from this line, until none remain """
    stripped = True
    while stripped:         # Continue until no more stripping
        stripped = False
        for bracket_pair in PAIRS: # E.g. "()"
            if bracket_pair in line:
                line = line.replace(bracket_pair, "")   # Strip out all occurences of this pair
                stripped = True

    return line

This is what it does with an input line:

'[({(<(())[]>[[{[]{<()<>>'  # remove ()
'[({(<()[]>[[{[]{<<>>'  # remove []
'[({(<()>[[{{<<>>'  # remove <>
'[({(<()>[[{{<>'  # remove ()
'[({(<>[[{{<>'  # remove <>
'[({([[{{'  # no pairs left to remove - valid but incomplete

Our get_completion_for_line() function is now really simple. We take the incomplete (but valid) lines, and simply add the corresponding closer for each opener, starting at the end and working back to the front.

def get_completion_for_line(line: str) -> str:
    """ Determine which closing brackets need to be added to complete this incomplete line. """
    
    to_complete = ""
    for opener in line[::-1]:
        to_complete += OPEN_TO_CLOSE[opener]
                
    return to_complete

So we solve both parts, thus:

# Part 1 - Looking for corrupted lines only
incomplete_lines = []
invalid_chars = []
trimmed_lines = [trim_brackets(line) for line in data]
for line in trimmed_lines:
    # corrupt lines will have closing brackets, whilst incomplete lines will not
    if any(char in CLOSERS for char in line):   # corrupt if any closer in the line
        for char in line:
            if char in CLOSERS:
                invalid_chars.append(char)  # find first closer
                break
    else:
        incomplete_lines.append(line)

logger.info("Part 1: There are %d corrupted lines", len(invalid_chars))                
score = sum([INVALID_CHAR_SCORES[char] for char in invalid_chars])
logger.info("Syntax error score=%d\n", score)

# Part 2
logger.info("Part 2: There are %d remaining incomplete lines", len(incomplete_lines))
completion_scores = []
for line in incomplete_lines:
    to_complete = get_completion_for_line(line)
    completion_scores.append(score := score_for_completion(to_complete))
    logger.debug("To complete: %s with score %d", to_complete, score)

completion_scores.sort()
logger.info("Completion score=%d", completion_scores[len(completion_scores)//2])

Output:

04:20.963:INFO:__main__:     Part 1: There are 49 corrupted lines
04:20.965:INFO:__main__:     Syntax error score=339477

04:20.965:INFO:__main__:     Part 2: There are 49 remaining incomplete lines
04:20.968:INFO:__main__:     Completion score=3049320156
04:20.968:INFO:__main__:     Execution time: 0.0023 seconds

Also pretty quick.

‹›