Dazbo's Advent of Code solutions, written in Python
Day 5: Doesn't He Have Intern-Elves For This?
Regular Expressions (Regex)Regexr - Testing Regexlist comprehensionlist countsum
We’re asked to parse a list of strings, and determine which ones are naughty, and which ones are nice.
Nice strings must match the following rules:
The input data looks something like this:
ugknbfddgicrmopn
jchzalrnumimnmhp
haegwjzuvuyypxyu
dvszwmarrgswjxmb
How many strings are nice?
We could do this the naive way. For example, for the first rule, we could iterate through all chars in the string aeiou
, count how many times each appears in the line, and then add them to an overall vowel count.
But it’s actually much more efficient (and fun!) to do this with regex. And for me, this was a great opportunity to improve my regex skills.
Let’s look at the regex for each rule:
([aeiou].*){3,}
Breaking it down:
[aeiou]
means: match any one of aeiou
.*
means: followed by 0 or more characters(
and )
, followed by {3,}
means: match the whole thing three or more times.(.)\1
(.)
defines a group, where the group matches any character\1
means a back-reference to the group before it. I.e. determine what the group matched, and then repeat it.ab|cd|pq|xy
This is obvious. It means any of ab
, or cd
, or pq
, or xy
.
The code looks like this:
from pathlib import Path
import time
import re
SCRIPT_DIR = Path(__file__).parent
INPUT_FILE = Path(SCRIPT_DIR, "input/input.txt")
# part 1 rules
three_vowels_match = re.compile(r"([aeiou].*){3,}") # Match any vowel, three or more times
double_chars_match = re.compile(r"(.)\1") # \1 means: repeat what we matched before
bad_chars_match = re.compile(r"ab|cd|pq|xy")
def main():
with open(INPUT_FILE, mode="rt") as f:
data = f.read().splitlines()
nice_lines_count = [part_1_rules_match(line) for line in data].count(True)
print(f"Part 1: there are {nice_lines_count} nice strings.")
def part_1_rules_match(input_line):
if (three_vowels_match.search(input_line)
and double_chars_match.search(input_line)
and not bad_chars_match.search(input_line)):
return True
return False
Note that I’ve created a function called part_1_rules_match()
. This function:
search()
using each of our three rules.True
, whilst the last rule needs to be False
.Next, we use this function in a list comprehension. I.e.
[part_1_rules_match(line) for line in data]
list
of str
. It returns a list
of boolean values.True
, using the count()
method. Alternatively, we could have performed sum()
over the list, since all True
values have a integer value of 1
in Python.Simples!
Santa has abandoned all the rules from Part 1! A nice string must now follow these rules:
How many strings are nice with the new rules?
We need two new regex rules:
(..).*\1
(..)
identifies a group with any two adjacent characters..*
means followed any character, 0 or more times.\1
is a back-reference, meaning, repeat group 1. I.e. we must match the same two characters again.(.).\1
(.)
identifies any one character..
after the group means that the group must be followed by exactly one character.\1
means that the character in the first group (i.e. the first character matched) must now be repeated.First, let’s add our regex rules:
# part 2 rules
char_pair_repeats_match = re.compile(r"(..).*\1")
xwx_match = re.compile(r"(.).\1")
The remaining code additions to solve part 2 are trivial. First, a function that tests both of the new rules, and only returns True
if both rules match:
def part_2_rules_match(input_line):
if (char_pair_repeats_match.search(input_line)
and xwx_match.search(input_line)):
return True
return False
And finally, perform the count:
nice_lines_count = [part_2_rules_match(line) for line in data].count(True)
print(f"Part 2: there are {nice_lines_count} nice strings.")
The final output looks something like this:
Part 1: there are 258 nice strings.
Part 2: there are 53 nice strings.
Execution time: 0.0037 seconds