Advent of Code 2015 - Day 5

Day 5: Doesn't He Have Intern-Elves For This?

Useful Links

‹›

Concepts and Packages Demonstrated

Regular Expressions (Regex)Regexr - Testing Regex list comprehension list count sum

Problem Intro

We’re asked to parse a list of strings, and determine which ones are naughty, and which ones are nice.

Nice strings must match the following rules:

Contains at least three vowels. (They don’t have to be different.)
Contains at least one letter that appears twice in a row, e.g. aa, bb, cc, etc.
Does not contain any of ab, cd, pq, xy.

The input data looks something like this:

ugknbfddgicrmopn
jchzalrnumimnmhp
haegwjzuvuyypxyu
dvszwmarrgswjxmb

Part 1

How many strings are nice?

We could do this the naive way. For example, for the first rule, we could iterate through all chars in the string aeiou, count how many times each appears in the line, and then add them to an overall vowel count.

But it’s actually much more efficient (and fun!) to do this with regex. And for me, this was a great opportunity to improve my regex skills.

Let’s look at the regex for each rule:

At least three vowels

([aeiou].*){3,}

Breaking it down:

[aeiou] means: match any one of aeiou
.* means: followed by 0 or more characters
wrapped in ( and ), followed by {3,} means: match the whole thing three or more times.

At least one character repeat

(.)\1

(.) defines a group, where the group matches any character
\1 means a back-reference to the group before it. I.e. determine what the group matched, and then repeat it.

Does not contain ab, cd, pq, xy

ab|cd|pq|xy

This is obvious. It means any of ab, or cd, or pq, or xy.

Solution

The code looks like this:

from pathlib import Path
import time
import re

SCRIPT_DIR = Path(__file__).parent 
INPUT_FILE = Path(SCRIPT_DIR, "input/input.txt")

# part 1 rules
three_vowels_match = re.compile(r"([aeiou].*){3,}") # Match any vowel, three or more times
double_chars_match = re.compile(r"(.)\1")  # \1 means: repeat what we matched before
bad_chars_match = re.compile(r"ab|cd|pq|xy")

def main():
    with open(INPUT_FILE, mode="rt") as f:
        data = f.read().splitlines()

    nice_lines_count = [part_1_rules_match(line) for line in data].count(True)
    print(f"Part 1: there are {nice_lines_count} nice strings.")

def part_1_rules_match(input_line):
    if (three_vowels_match.search(input_line) 
            and double_chars_match.search(input_line) 
            and not bad_chars_match.search(input_line)):
        return True
    
    return False

Note that I’ve created a function called part_1_rules_match(). This function:

Takes the current input line, and performs a regex search() using each of our three rules.
The first two rules need to be True, whilst the last rule needs to be False.

Next, we use this function in a list comprehension. I.e.

[part_1_rules_match(line) for line in data]

This code calls our function for each line in data, which is a list of str. It returns a list of boolean values.
Finally, we count how many members of this list are True, using the count() method. Alternatively, we could have performed sum() over the list, since all True values have a integer value of 1 in Python.

Simples!

Part 2

Santa has abandoned all the rules from Part 1! A nice string must now follow these rules:

At least one adjacent pair of letters in the string must appear more than once, without overlapping. E.g. aabcdeaa.
At least one pair of identical letters, with any single letter between them. E.g. aba, xyx, efe, and even zzz.

How many strings are nice with the new rules?

We need two new regex rules:

A repeating pair

(..).*\1

The (..) identifies a group with any two adjacent characters.
The .* means followed any character, 0 or more times.
The \1 is a back-reference, meaning, repeat group 1. I.e. we must match the same two characters again.

Any two chars that repeat, with a single char between them

(.).\1

The (.) identifies any one character.
The . after the group means that the group must be followed by exactly one character.
The final \1 means that the character in the first group (i.e. the first character matched) must now be repeated.

Solution

First, let’s add our regex rules:

# part 2 rules
char_pair_repeats_match = re.compile(r"(..).*\1")
xwx_match = re.compile(r"(.).\1")

The remaining code additions to solve part 2 are trivial. First, a function that tests both of the new rules, and only returns True if both rules match:

def part_2_rules_match(input_line):
    if (char_pair_repeats_match.search(input_line)
            and xwx_match.search(input_line)):
        return True
    
    return False

And finally, perform the count:

    nice_lines_count = [part_2_rules_match(line) for line in data].count(True)
    print(f"Part 2: there are {nice_lines_count} nice strings.")

The final output looks something like this:

Part 1: there are 258 nice strings.
Part 2: there are 53 nice strings.
Execution time: 0.0037 seconds

‹›