Learning Python with Advent of Code Walkthroughs

Dazbo's Advent of Code solutions, written in Python

Syntax error

Advent of Code 2021 - Day 10

Day 10: Syntax Scoring

Useful Links

Concepts and Packages Demonstrated

ParsimoniousList ComprehensionUser-defined exceptions

defaultdictreversing str

Problem Intro

I enjoyed this one. There are a couple of ways of going about the problem, and I ended up writing three different solutions.

We’re told that our sub’s navigation data is corrupted. The navigation is our input data, and it looks like this:

[({(<(())[]>[[{[]{<()<>>
[(()[<>])]({[<{<<[]>>(
{([(<{}[<>[]}>{[]{[(<()>
(((({<>}<{<{<>}{[]{[]{}
[[<[([]))<([[{}[[()]]]
...

We’re told some lines are incomplete, whilst some are corrupted.

Solution #1

My first solution uses the very cool Parsimonious. This is a library that allows the parsing of grammars, i.e. a set of terms that follow a defined set of rules. Think of it like regex on steroids, since it has the ability to recognise specific language terms, but also recurse into those terms.

Part 1

We’re ignoring incomplete lines. We’re told to find the first invalid chracter in each corrupted line, and determine its score. Each type of invalid character maps to one of four unique scores. Then we’re asked to add up these scores, to give the total syntax error score.

Setup

Let’s import what we need:

from collections import defaultdict
import logging
import os
import time
from parsimonious import Grammar, ParseError
from parsimonious.exceptions import IncompleteParseError

SCRIPT_DIR = os.path.dirname(__file__) 
INPUT_FILE = "input/input.txt"
# INPUT_FILE = "input/sample_input.txt"

logging.basicConfig(format="%(asctime)s.%(msecs)03d:%(levelname)s:%(name)s:\t%(message)s", 
                    datefmt='%Y-%m-%d %H:%M:%S')
logger = logging.getLogger(__name__)
logger.setLevel(level=logging.INFO)

The solution

Let’s define our Parsimonious Grammar, i.e. the set of rules that determine whether our input is valid.

# Each row (expr) must contain one or more chunks (any)
# Each chunk (any) can be of type 'normal', 'angle', 'square', or 'curly'
# Each type is composed of a bracket pair, and 0 or more inner chunks (any)
grammar = Grammar(r"""
    expr = any+
    any = (normal / angle / square / curly)
    normal = "(" any* ")"
    angle = "<" any* ">"
    square = "[" any* "]"
    curly = "{" any* "}"
""")

Explaining the rules:

Now we create a list of opening brackets, and a list of corresponding closing brackets. Then we zip these two lists together, to create two dicts:

And finally, one more dictionary, which maps each closing bracket to the arbitrary invalid char scores we were told to use.

OPENERS = ["(", "[", "{", "<"]
CLOSERS = [")", "]", "}", ">"]
OPEN_TO_CLOSE = dict(zip(OPENERS, CLOSERS))  # {'(': ')', ...}
CLOSE_TO_OPEN = dict(zip(CLOSERS, OPENERS))  # {')': '(', ...}

INVALID_CHAR_SCORES = dict(zip(CLOSERS, (3, 57, 1197, 25137))) # {')': 3, ...}

Now we run it:

input_file = os.path.join(SCRIPT_DIR, INPUT_FILE)
with open(input_file, mode="rt") as f:
    data = f.read().splitlines()

# Part 1 - Looking for corrupted lines only
incomplete_lines = []
invalid_chars = []
for line_num, line in enumerate(data):
    try:
        grammar.parse(line)
    except IncompleteParseError:    # valid, but incomplete
        incomplete_lines.append(line)
    except ParseError as err:
        if err.pos == len(line):    # valid, but incomplete
            incomplete_lines.append(line)
        else:                       # corrupted
            logger.debug("%d: %s", line_num, line)
            logger.debug("Found %s", line[err.pos])
            invalid_chars.append(line[err.pos])

logger.info("Part 1: There are %d corrupted lines", len(invalid_chars))                
score = sum([INVALID_CHAR_SCORES[char] for char in invalid_chars])
logger.info("Syntax error score=%d\n", score)

First, we read all the data and split into separate lines. We parse each line, using our grammar. We’re only interested in lines which fail to validate, and these are identified as follows:

Easy!

Part 2

Unsurprisingly, we’re now asked to work with the incomplete lines. Specifically, we’re told we need to complete the incomplete lines, by adding the necessary sequence of closing brackets. Work out the completion score.

We’ll do most of the hard work here:

def get_completion_for_line(line: str) -> str:
    """ Determine which closing brackets need to be added to complete this incomplete line. """
    
    to_complete = ""
    close_counters = defaultdict(int)
    for char in line[::-1]:     # process chars from the end
        if char in CLOSERS:
            close_counters[char] += 1
        else:  # opener
            matching_closer = OPEN_TO_CLOSE[char]
            if close_counters[matching_closer] > 0:    # opener for existing closer
                close_counters[matching_closer] -= 1
            else:    # opener, but missing closer
                to_complete += matching_closer
                
    return to_complete

The objective is to take each incomplete line, and then work out which brackets haven’t been closed.

This is what the function does:

Now we need to determine the score that results from the set of closing brackets we’ve identified.

def score_for_completion(completion_str: str) -> int:
    """ Arbitrary rules for calculating a score for a str of completion chars """
    score = 0
    
    for char in completion_str:
        score *= 5
        score += COMPLETION_CHAR_SCORES[char]
        
    return score    

Finally, let’s run all this, to solve Part 2.

# Part 2  
logger.info("Part 2: There are %d remaining incomplete lines", len(incomplete_lines))
completion_scores = []
for line in incomplete_lines:
    to_complete = get_completion_for_line(line)
    completion_scores.append(score := score_for_completion(to_complete))
    logger.debug("To complete: %s with score %d", to_complete, score)

completion_scores.sort()
logger.info("Completion score=%d", completion_scores[len(completion_scores)//2])

We sort the scores, and then identify the median, as required.

The output looks like this:

2022-01-15 20:12:37.680:INFO:__main__:  Part 1: There are 49 corrupted lines
2022-01-15 20:12:37.681:INFO:__main__:  Syntax error score=339477

2022-01-15 20:12:37.682:INFO:__main__:  Part 2: There are 49 remaining incomplete lines
2022-01-15 20:12:37.686:INFO:__main__:  Completion score=3049320156
2022-01-15 20:12:37.687:INFO:__main__:  Execution time: 0.0664 seconds

Solution #2

Parsimonious is awesome, and we can give it any rules we want it to follow. But it does come with a bit of processing overhead. If we want to be super-quick, we can write our own parser. So that’s what I’ve done here. It also gives us a bit of practice at writing our own user-defined exceptions.

First, Part 1:

Setup

These are the imports we need:

import logging
import os
import time
from collections import deque
from dataclasses import dataclass

And we’ll still use these constants:

OPENERS = ["(", "[", "{", "<"]
CLOSERS = [")", "]", "}", ">"]
OPEN_TO_CLOSE = dict(zip(OPENERS, CLOSERS))  # {'(': ')', ...}

COMPLETION_CHAR_SCORES = dict(zip(CLOSERS, (1, 2, 3, 4)))
INVALID_CHAR_SCORES = dict(zip(CLOSERS, (3, 57, 1197, 25137)))

Solution

And we’ll create a user-defined ParseException, by subclassing Exception. Note that we also make it a dataclass, so we don’t need to create an __init__() method to initialise any variables.

@dataclass
class ParseException(Exception):
    """ Data parsed and found to be invalid """
    expected: str
    actual: str

Now, for each line, we parse using this function:

def parse(line: str):
    """ Parse the navigation instructions, line-by-line.
    If we read any right bracket that does not match an existing left bracket, raise ParseException.
    """
    stack = deque()
    
    for char in line:        
        if char in OPENERS:
            stack.append(OPEN_TO_CLOSE[char])
            continue    # Move on to the next char
        
        assert char in CLOSERS, "Must be right bracket"
        popped = stack.pop()    # Pop the required right bracket
        if char == popped:
            continue    # This was the right bracket we needed
    
        raise ParseException(expected=popped, actual=char)  # Wrong right bracket - invalid line

It works by:

Solving Part 1 is now just this:

# Part 1 - Looking for corrupted lines only
invalid_chars = []
for line in data:
    try:
        parse(line)
    except ParseException as pe:
        invalid_chars.append(pe.actual)

logger.info("Part 1: There are %d corrupted lines", len(invalid_chars))                
score = sum([INVALID_CHAR_SCORES[char] for char in invalid_chars])
logger.info("Syntax error score=%d\n", score)

For Part 2, we just make some small tweaks.

First, we add a new user-defined exception:

@dataclass          
class ParseIncompleteException(Exception):
    """ Data parsed and found to be incomplete """
    remaining: str

Then we need to modify our parse() method:

def parse(line: str):
    """ Parse the navigation instructions, line-by-line.
    If we read any right bracket that does not match an existing left bracket, raise ParseException.
    If we read all the data but still have closing brackets left on the stack, raise ParseIncompleteException.
    """
    stack = deque()
    
    for char in line:        
        if char in OPENERS:
            stack.append(OPEN_TO_CLOSE[char])
            continue    # Move on to the next char
        
        assert char in CLOSERS, "Must be right bracket"
        popped = stack.pop()    # Pop the required right bracket
        if char == popped:
            continue    # This was the right bracket we needed
    
        raise ParseException(expected=popped, actual=char)  # Wrong right bracket - invalid line
    
    if stack:   # There were more left brackets than right, so this line is incomplete
        raise ParseIncompleteException(remaining="".join(reversed(stack)))

The only change here is the check at the end. If there are any closing brackets still on the stack, then we haven’t yet found these required brackets. So, throw a ParseIncompleteException, and pass in the closing brackets remaining on the stack, in reverse order. These are the brackets that are needed to complete our line!

And now let’s update our main method accordingly:

invalid_chars = []
completion_scores = []   # We've added this
for line in data:
    try:
        parse(line)
    except ParseException as pe:
        invalid_chars.append(pe.actual)
    except ParseIncompleteException as pie: # We've added this new exception type
        completion_scores.append(score_for_completion(pie.remaining))

logger.info("Part 1: There are %d corrupted lines", len(invalid_chars))                
score = sum([INVALID_CHAR_SCORES[char] for char in invalid_chars])
logger.info("Syntax error score=%d\n", score)
    
# Part 2
completion_scores.sort()
logger.info("Part 2: Completion score=%d", completion_scores[len(completion_scores)//2])

And this is about 300x faster:

2022-01-15 20:35:23.126:INFO:__main__:  Part 1: There are 49 corrupted lines
2022-01-15 20:35:23.127:INFO:__main__:  Syntax error score=339477

2022-01-15 20:35:23.127:INFO:__main__:  Part 2: Completion score=3049320156
2022-01-15 20:35:23.128:INFO:__main__:  Execution time: 0.0021 seconds

Solution #3

This approach relies on stripping out complete bracket pairs from the input lines, until all that remains is either invalid characters, or brackets that need to be completed.

Slight tweaks to our constants:

OPENERS = ["(", "[", "{", "<"]
CLOSERS = [")", "]", "}", ">"]
PAIRS = ["".join(item) for item in zip(OPENERS, CLOSERS)] # ['()', '[]', ...]
OPEN_TO_CLOSE = dict(zip(OPENERS, CLOSERS))  # {'(': ')', ...}

INVALID_CHAR_SCORES = dict(zip(CLOSERS, (3, 57, 1197, 25137)))
COMPLETION_CHAR_SCORES = dict(zip(CLOSERS, (1, 2, 3, 4)))

To trim out valid brackets:

def trim_brackets(line: str) -> str:
    """ Iteratively strip out contiguous pairs of brackets from this line, until none remain """
    stripped = True
    while stripped:         # Continue until no more stripping
        stripped = False
        for bracket_pair in PAIRS: # E.g. "()"
            if bracket_pair in line:
                line = line.replace(bracket_pair, "")   # Strip out all occurences of this pair
                stripped = True

    return line

This is what it does with an input line:

'[({(<(())[]>[[{[]{<()<>>'  # remove ()
'[({(<()[]>[[{[]{<<>>'  # remove []
'[({(<()>[[{{<<>>'  # remove <>
'[({(<()>[[{{<>'  # remove ()
'[({(<>[[{{<>'  # remove <>
'[({([[{{'  # no pairs left to remove - valid but incomplete

Our get_completion_for_line() function is now really simple. We take the incomplete (but valid) lines, and simply add the corresponding closer for each opener, starting at the end and working back to the front.

def get_completion_for_line(line: str) -> str:
    """ Determine which closing brackets need to be added to complete this incomplete line. """
    
    to_complete = ""
    for opener in line[::-1]:
        to_complete += OPEN_TO_CLOSE[opener]
                
    return to_complete

So we solve both parts, thus:

# Part 1 - Looking for corrupted lines only
incomplete_lines = []
invalid_chars = []
trimmed_lines = [trim_brackets(line) for line in data]
for line in trimmed_lines:
    # corrupt lines will have closing brackets, whilst incomplete lines will not
    if any(char in CLOSERS for char in line):   # corrupt if any closer in the line
        for char in line:
            if char in CLOSERS:
                invalid_chars.append(char)  # find first closer
                break
    else:
        incomplete_lines.append(line)

logger.info("Part 1: There are %d corrupted lines", len(invalid_chars))                
score = sum([INVALID_CHAR_SCORES[char] for char in invalid_chars])
logger.info("Syntax error score=%d\n", score)

# Part 2
logger.info("Part 2: There are %d remaining incomplete lines", len(incomplete_lines))
completion_scores = []
for line in incomplete_lines:
    to_complete = get_completion_for_line(line)
    completion_scores.append(score := score_for_completion(to_complete))
    logger.debug("To complete: %s with score %d", to_complete, score)

completion_scores.sort()
logger.info("Completion score=%d", completion_scores[len(completion_scores)//2])

Output:

21:04:20.963:INFO:__main__:     Part 1: There are 49 corrupted lines
21:04:20.965:INFO:__main__:     Syntax error score=339477

21:04:20.965:INFO:__main__:     Part 2: There are 49 remaining incomplete lines
21:04:20.968:INFO:__main__:     Completion score=3049320156
21:04:20.968:INFO:__main__:     Execution time: 0.0023 seconds

Also pretty quick.