Learning Python with Advent of Code Walkthroughs

Dazbo's Advent of Code solutions, written in Python

IPv7 Overlapping Regex

Advent of Code 2016 - Day 7

Day 7: Internet Protocol Version 7

Useful Links

Concepts and Packages Demonstrated

regexOverlapping MatchesString Manipulation

Problem Intro

Today’s puzzle involves validating IPv7 addresses based on specific patterns. An IPv7 address consists of a sequence of characters, some of which are enclosed in square brackets. The parts outside the brackets are called “supernet sequences”, and the parts inside the brackets are called “hypernet sequences”.

The core of the problem revolves around identifying a specific pattern called an “ABBA” sequence. An ABBA is a four-character sequence where the first two characters are different, and the sequence is followed by its reverse. For example, abba, bddb, or xyyx are ABBAs, but aaaa or abca are not.

Here’s an example of what the input data looks like:

abba[mnop]qrst
abcd[bddb]xyyx
aaaa[qwer]tyui
ioxxoj[asdfgh]zxcvbn

Part 1

How many IPs support TLS?

An IP address supports Transport-Layer Snooping (TLS) if it meets two conditions:

  1. It contains at least one ABBA sequence within any of its supernet sequences.
  2. It does not contain any ABBA sequence within any of its hypernet sequences.

My approach to solving Part 1 involves:

  1. Parsing the input: For each line (IP address), I need to separate the supernet sequences from the hypernet sequences. Regular expressions are ideal for this.
  2. Checking hypernets first: If any hypernet sequence contains an ABBA, the entire IP address is invalid for TLS, and we can move to the next line.
  3. Checking supernets: If no hypernet contains an ABBA, then we check if any supernet sequence contains an ABBA. If it does, the IP address is valid for TLS.

Here’s how I extract the supernet and hypernet sequences using the regex module:

# Only match text outside the square bracekts - supernet sequences
non_brackets_pattern = regex.compile(r"(\w+)(?:\[\w*\])*")

# Only match the text within the square brackets - hypernet sequences
brackets_pattern = regex.compile(r"\[(\w+)\]+")

# ... inside main() loop
        supernets = non_brackets_pattern.findall(line)
        hypernets = brackets_pattern.findall(line)

The non_brackets_pattern captures sequences of word characters that are not enclosed in brackets, while brackets_pattern captures the word characters within brackets.

Next, I implemented a helper function contains_abba to check for the ABBA pattern:

def contains_abba(potential: str, abba_size: int) -> bool:
    """ Check whether this string contains an ABBA sequence.
    E.g. abba is valid.  deed is valid.  xyyx is valid.
    aaaa is not.  aabb is not.  abcd is not.

    Args:
        potential (str): Any str
        abba_size (int): The number of chars in a valid ABBA sequence

    Returns:
        bool: Whether valid ABBA.
    """
    for i in range(len(potential) - abba_size + 1):
        block = potential[i:i+4]

        if block[0] != block[1] and block[0:2] == (block[3:1:-1]):
            return True

    return False

This function iterates through all possible 4-character blocks in the given string. It checks if the first two characters are different and if the first pair is the reverse of the second pair.

The main logic for Part 1 then becomes:

    valid_ips = []
    for line in data:
        supernets = non_brackets_pattern.findall(line)
        hypernets = brackets_pattern.findall(line)

        # first check we don't have an ABBA in a hypernet seq, i.e. within [...]
        try:
            for hypernet in hypernets:
                # if we find a match, this line is NOT a valid IP. Move on to next line.
                if contains_abba(hypernet, 4):
                    raise StopIteration() # Custom exception to break out of nested loops
        except StopIteration:
            continue # Move to the next IP line

        try:
            for supernet in supernets:
                # if we find a match, this line is a valid IP. Add and move on.
                if contains_abba(supernet, 4):
                    valid_ips.append(line)
                    raise StopIteration() # Custom exception to break out of nested loops
        except StopIteration:
            continue # Move to the next IP line

    logging.info(f"Part 1: Found {len(valid_ips)} valid TLS IPs.")

I use a StopIteration exception to efficiently break out of nested loops once a condition (either invalid hypernet or valid supernet) is met for an IP address.

Part 2

How many IPs support SSL?

An IP address supports Super-Secret Listening (SSL) if it has an “Area-Broadcast Accessor” (ABA) anywhere in its supernet sequences and a corresponding “Byte Allocation Block” (BAB) anywhere in its hypernet sequences.

An ABA is a three-character sequence like xyx where x and y are different. A corresponding BAB would be yxy. For example, if a supernet has aba, a hypernet must have bab.

A crucial aspect of Part 2 is finding overlapping matches for ABA sequences. The standard re module in Python does not support this directly, so I used the more powerful regex module (installed via pip install regex).

Here’s the pattern for finding ABA sequences:

aba_pattern = regex.compile(r"((.).\2)")

This regex captures a character, then any character, then the first captured character again (e.g., (a).(\1)). The overlapped=True flag in findall is essential for finding all possible ABAs, even if they overlap (e.g., ababa contains two ABAs: aba and bab).

The logic for Part 2 is as follows:

    valid_ips = []
    for line in data:
        supernets = non_brackets_pattern.findall(line)
        hypernets = brackets_pattern.findall(line)

        try:
            babs = []
            for supernet in supernets:
                # look for ABAs in supernets
                aba_matches = aba_pattern.findall(supernet, overlapped=True)
                for aba_match in aba_matches:
                    match_str = str(aba_match[0])
                    # ensure middle char is different to avoid 'aaa'
                    if match_str[0] != match_str[1]:
                        bab = match_str[1] + match_str[0] + match_str[1] # Construct BAB
                        babs.append(bab)

            # Then check if any of those BABs are present in any of the hypernets
            for bab in babs:
                for hypernet in hypernets:
                    if bab in hypernet:
                        valid_ips.append(line)
                        raise StopIteration() # Custom exception to break out of nested loops
        except StopIteration:
            continue # Move to the next IP line

    logging.info(f"Part 2: Found {len(valid_ips)} valid SSL IPs.")

For each supernet, I find all ABA matches. For each valid ABA (where the first and second characters are different), I construct its corresponding BAB. Then, I check if any of these generated BABs exist within any of the hypernet sequences for that IP address. If a match is found, the IP supports SSL.

This solution effectively leverages the regex library’s advanced features to handle the overlapping match requirement, making the parsing and validation straightforward.