Learning Python with Advent of Code Walkthroughs

Dazbo's Advent of Code solutions, written in Python

Moleculer retrosynthesis

Advent of Code 2015 - Day 19

Day 19: Medicine for Rudolph

Useful Links

Concepts and Packages Demonstrated

RegexSetsdefaultdictcomprehension aggregate functions

Problem Intro

We’re told Rudolph needs medicine, and we’re going to need to synthesize the medicine! We are able to construct molecules using reindeer chemistry. We’re told reindeer chemistry works by starting with some input molecule and then doing a series of replacements, one per step, until we have the right target molecule.

We’re given sample input that looks something like this:

e => H
e => O
H => HO
H => OH
O => HH


The last row is the the medicine molecule and the other rows are the possible replacements. Note how some molecules can be replaced with more than one target molecule.

Part 1

First, we need to calibrate the reindeer chemistry machine.

How many distinct molecules can be created after all the different ways you can do one replacement on the medicine molecule?

I start by reading in the data and constructing a mapping dictionary from each source molecule to the target molecules, and for each target molecule to its source:

def process_input(data: list) -> tuple[dict, dict, str]:
    subst_match = re.compile(r"^(\w+) => (\w+)")
    # each src group can make many target groups
    src_groups = defaultdict(list)

    # each target group can be made from only one src group
    target_groups = {}

    for line in data:
        if "=>" in line:
            group, target_group = subst_match.findall(line)[0]
            src_groups[group] += [target_group]
            target_groups[target_group] = group

    logger.debug("Src groups:\n%s", src_groups)
    logger.debug("Target groups:\n%s", target_groups)
    return src_groups, target_groups, data[-1]

I’m using regex to parse the input lines for each group (nothing new here), and then - for the source groups - I’m using a defaultdict. This is useful, because I want the dict key to be the source molecule, and the dict value to be a list of target molecules. But I don’t want to have to create the list for the first target molecule.

With the sample input above, this gives the following output:

Reindeer chemistry groups

Now, to solve part 2, my strategy is:

All of this is done in the substitute_groups function:

def substitute_groups(groups: dict, molecule: str) -> list:
    new_molecules = []

    # go through all the groups we have substitutions for
    for group, targets in groups.items():
        # get all matching positions for this group
        group_matches = re.finditer(group, molecule)

        # move left to right, matching group one at a time
        for group_match in group_matches:
            start, end = group_match.span()
            prefix = molecule[:start]
            suffix = molecule[end:]

            # replace the current group occurrence with each target
            for target in targets:
                new_molecules.append(prefix + target + suffix)
    return new_molecules

It works by using finditer() to return matches for all the source groups. This is useful, because each match has a start and end position in the overall string. We can use these start and end positions to define the portion of the string that will be replaced, whilst leaving the rest of the string untouched. So, for each match, we construct a new target molecule, by replacing the source molecule with the next available target molecule.

So, we’re ready to solve Part 1:

    with open(locations.sample_input_file, mode="rt") as f:
        data = f.read().splitlines()

    src_groups, target_groups, medicine_molecule = process_input(data)

    new_molecules = substitute_groups(src_groups, medicine_molecule)
    unique_new_molecules = set(new_molecules)
    logger.debug("Unique molecules: %s", unique_new_molecules)
    logger.info("Part 1: Identified %d unique molecules.", len(unique_new_molecules))

Note that I’m using a set to remove duplicate molecules.

Using the sample input, the resulting output looks like this:

Reindeer chemistry groups

Good, that works!

Part 2

Now we’re ready to fabricate our target molecule. We’re told we start with a single electron e, and apply replacements one a time. Each replacement counts as one step.

Given the available replacements and the medicine molecule in your puzzle input, what is the fewest number of steps to go from e to the medicine molecule?

Here’s my strategy:

Second time around (with the sample data):

Third time around:

Finally, we see that H can only be made from e. So, performing the final reverse substitution, we end up with the stack: [[3, "HHH"], [1, "OH"], [1, "H"], [1, "e"]]

If we reverse the stack, we can see that we assemble the target medicine molecule like this:

1: e -> H
2: H -> OH
3: (O)H -> (HH)H
4: (H)HH -> (HO)HH
5: HO(H)H -> HO(HO)H
6: HOHO(H) -> HOHO(HO)

Here’s my code to achieve this:

def retrosynthesis(target_groups: dict, target_molecule: str) -> list:
    synthesis_stack = []
    current_molecule = target_molecule

    # start by doing all the substitions, without e
    # repeat until the molecule is not modified
    molecule_modified = True
    while molecule_modified:
        molecule_modified = False

        for tgt_grp, src_grp in target_groups.items():
            if src_grp == 'e':

            # count how many matches of target first
            substitutions = current_molecule.count(tgt_grp)

            # then replace them all
            if substitutions > 0:
                current_molecule = current_molecule.replace(tgt_grp, src_grp)
                molecule_modified = True
                synthesis_stack.append([substitutions, current_molecule])

    # now replace target with e
    for tgt_grp, src_grp in target_groups.items():
        if src_grp != 'e':

        # count how many matches of target first
        substitutions = current_molecule.count(tgt_grp)

        # then replace them all
        if substitutions > 0:
            current_molecule = current_molecule.replace(tgt_grp, src_grp)
            synthesis_stack.append([substitutions, current_molecule])

    return synthesis_stack

And we can now solve Part 2 like this:

    synthesis_stack = retrosynthesis(target_groups, medicine_molecule)
    synthesis_steps = sum(subs for subs, molecule in synthesis_stack)
    logger.info("Part 2: Synthesis stack requires %d steps.", synthesis_steps)

Note the use of the sum() aggregate function on the dictionary comprehension, which is a nice Pythonic way of summing up all the substitutions.


If I set my logging level to INFO rather than DEBUG, and run with my real data, the output looks like this:

Reindeer chemistry output

So, it’s pretty quick!