Dazbo's Advent of Code solutions, written in Python

Breadth First Search (BFS)pathlibgraphsnetworkxmatplotlib

A little bit tricky, this one.

I’ve created two solutions for this:

- Solution #1 - Manually building a graph using an adjacency dictionary
- Solution #2 - Using NetworkX to facilitate building the graph, and to visualise
- Visualisation - The visualisation code

We’re told we’re navigating a cave system which contains large caves (denoted by uppercase labels) and small caves (denoted by lowercase labels).

The input data looks something like this:

```
start-A
start-b
A-c
A-b
b-d
A-end
b-end
```

And if we map out these connections, they can be visualised like this:

```
start
/ \
c--A-----b--d
\ /
end
```

We’re asked to find the number of distinct paths that start at `start`

and end at `end`

, and don’t visit any small caves more than once.

This is the no frills solution.

Whenever we need to find all the paths from one point to another, a **BFS** should immediately come to mind as a possible way to solve the problem. The BFS is an algorithm we can always use to find all paths between two points. The tricky addition to this problem is that the small caves are one-way gates.

Nothing new here:

```
import logging
import os
import time
from collections import defaultdict, deque
logging.basicConfig(format="%(asctime)s.%(msecs)03d:%(levelname)s:%(name)s:\t%(message)s",
datefmt='%H:%M:%S')
logger = logging.getLogger(__name__)
logger.setLevel(level=logging.INFO)
SCRIPT_DIR = os.path.dirname(__file__)
INPUT_FILE = "input/input.txt"
# INPUT_FILE = "input/sample_input.txt"
```

Now we need to build a **graph**. In computing (and maths), a graph is a finite set of *vertices* (aka *nodes* or *points*), which are connected by *edges* (aka *links*).

If the *edges* have a direction, they are represented as arrows, and the graph is a *directed graph*. If the direction doesn’t matter, then *edges* are represented as lines, and the graph is an *undirected graph*. An undirected graph looks something like this:

Here, it makes sense to represent our network of caves as an undirected graph, and then to find all the unique paths through it, given the “small caves” constraint.

We start by creating our `CaveGraph`

class:

```
class CaveGraph():
""" Stores pairs of connected caves, i.e. edges.
Determines all caves from supplied edges.
Determines which caves are small, and which are large.
Creates a lookup to obtain all caves linked to a given cave.
Finally, knows how to determine all unique paths from start to end,
according to the rules given. """
START = "start"
END = "end"
def __init__(self, edges:set[tuple[str, str]]) -> None:
""" Takes a set of edges, where each edge is in the form (a, b) """
self.start = CaveGraph.START
self.end = CaveGraph.END
self._edges: set[tuple[str, str]] = set(edges)
self._nodes: set[str] = set()
self._small_caves: set[str] = set()
self._large_caves: set[str] = set()
self._determine_caves() # populate the empty fields
# Create lookup (adjacency) dict to find all linked nodes for a node
self._node_map: dict[str, set[str]] = defaultdict(set)
for x,y in edges: # E.g. x, y
self._node_map[x].add(y)
self._node_map[y].add(x)
assert self.start in self._node_map, "Start needs to be mapped"
assert self.end in self._node_map, "Finish needs to be mapped"
@property
def edges(self):
""" All the edges. An edge is one cave linked to another. """
return self._edges
@property
def small_caves(self):
""" Caves labelled lowercase. Subset of self.caves. """
return self._small_caves
@property
def large_caves(self):
""" Caves labelled uppercase. Subset of self.caves. """
return self._large_caves
def _determine_caves(self):
""" Build a set of all caves from the edges.
This will also initialise small_caves and large_caves """
for edge in self._edges:
for cave in edge:
self._nodes.add(cave)
if cave not in (self.start, self.end):
if cave.islower():
self._small_caves.add(cave)
else:
self._large_caves.add(cave)
def _get_adjacent_caves(self, node: str) -> set[str]:
""" Returns the adjacent caves, given a cave input. """
return self._node_map[node]
```

Things to say about this:

- First, the
`__init__()`

method:- We can see it expects to be initialised using a
`set`

of pairs of locations. I.e. a set of*edges*, where each edge is in the form of two*nodes*(which are caves). - It stores all the edges as
`_edges`

. - It then runs
`_determine_caves()`

, which:- Iterates through all the edges.
- For each edge, looks at each cave in the pair.
- Adds the edge (the cave) to
`_caves`

. We’re using sets, which only store unique values. So if we add the same cave more than once, it doesn’t matter. - Checks if the cave is small or large, and adds to
`_small_caves`

and`_large_caves`

, respectively. - Note how we’re once again prefixing all class variables and methods with
`_`

, where they are only intended to be used by the object, and are not intended to be accessed outside the object.

- Now we create an
`adjacency dictionary`

, which is basically a dictionary that maps each node (cave) to every other node (cave) that it is linked to. We do this using a`defaultdict(set)`

, i.e. a dict that it is initialised with an empty`set`

for each key. That way, whenever we come across a cave that is linked to some cave`x`

, we can simply add the new cave to the existing`_node_map[x]`

set. - And that’s the graph built!

- We can see it expects to be initialised using a
- We add some properties for accessing the graph’s data. E.g.
`edges`

,`small_caves`

,`large_caves`

. - We add a method that allows us to get all the adjacent caves for a given cave. This simply does a lookup using our adjacency dictionary.

Now let’s add the method that will do the hard work. I.e. actually performs the BFS:

```
def get_paths_through(self) -> set[tuple]:
""" Get all unique paths through from start to end, using a BFS """
start = (self.start, [self.start]) # (starting cave, [path with only start])
queue = deque()
queue.append(start)
unique_paths: set[tuple] = set() # To store each path that goes from start to end
while queue:
# If we popleft(), we do a BFS. If we simply pop(), we're doing a DFS.
# Since we need to discover all paths, it makes no difference to performance.
cave, path = queue.popleft() # current cave, paths visited
if cave == self.end: # we've reached the end of a desired path
unique_paths.add(tuple(path))
continue
for neighbour in self._get_adjacent_caves(cave):
new_path = path + [neighbour] # Need a new path object
if neighbour in path:
# big caves fall through and can be re-added to the path
if neighbour in (self.start, self.end):
continue # we can't revisit start and finish
if neighbour in self.small_caves:
continue # we can't revisit small cave
assert neighbour in self.large_caves
# If we're here, it's either big caves or neighbours not in the path
queue.append((neighbour, new_path))
logger.debug(new_path)
return unique_paths
```

This method is a fairly standard BFS, as we’ve used previously:

- We create a
`start`

`tuple`

, which has the first value set to our`start`

cave, and the second value set to a`list`

that represents the path taken so far. Initially, this`list`

only contains the`start`

cave. - We create a variable called
`queue`

, which will be our*frontier*, implemented as a`deque`

. - We add our
`start`

tuple to the queue. -
We create a

`set`

to store all unique paths found. - Now, as per the BFS pattern, whilst there are items remaining on the queue:
- Pop the next item.
- Evaluate if this next item is the
`end`

. If it is:- Then we’ve found a new path, so we need to store it.
- Note that we convert the
`list`

path to a`tuple`

path when we add it. We need to do this, because`sets`

only store`hashable`

objects;`tuples`

are hashable, but`lists`

are not. - Then
`continue`

. (We don’t`break`

, because we want to find*all*the paths.)

- Now find all adjacent caves, and for each:
- Add it to the current path.
- Determine if the adjacent cave is already in the path.
- If it is and it’s the start, end or small, then we can’t backtrack. So we don’t add it to the queue, and we move on to the next adjacent cave.

- If it adjacent cave was not already in the path, or was in the cave but was a big cave, then we can add it to the queue.

- When the queue is empty, we’ve now found all the paths.

We run it like this:

```
input_file = os.path.join(SCRIPT_DIR, INPUT_FILE)
with open(input_file, mode="rt") as f:
edges = set(tuple(line.split("-")) for line in f.read().splitlines())
graph = CaveGraph(edges) # Create graph from edges supplied in input
# Part 1
unique_paths = graph.get_paths_through()
logger.info("Part 1: unique paths count=%d", len(unique_paths))
```

Now we’re told we’re allowed to visit any single small cave twice for any given path. Implementing this requires some thinking!!

My approach was to modify the `get_paths_through()`

method to take a parameter `small_cave_twice`

which defaults to `False`

. If this is set to `True`

, then the method allows the single revisit. Thus, we can use the same method for path parts. Of course, we need to modify the method itself:

```
def get_paths_through(self, small_cave_twice=False) -> set[tuple]:
""" Get all unique paths through from start to end, using a BFS
Args:
small_cave_twice (bool, optional): Whether we can
visit a small cave more than once. Defaults to False.
"""
start = (self.start, [self.start], False) # (starting cave, [path with only start], used twice)
queue = deque()
queue.append(start)
unique_paths: set[tuple] = set() # To store each path that goes from start to end
while queue:
# If we popleft(), we do a BFS. If we simply pop(), we're doing a DFS.
# Since we need to discover all paths, it makes no difference to performance.
cave, path, used_twice = queue.popleft() # current cave, paths visited, twice?
if cave == self.end: # we've reached the end of a desired path
unique_paths.add(tuple(path))
continue
for neighbour in self._get_adjacent_caves(cave):
new_path = path + [neighbour] # Need a new path object
if neighbour in path:
# big caves fall through and can be re-added to the path
if neighbour in (self.start, self.end):
continue # we can't revisit start and finish
if neighbour in self.small_caves:
if small_cave_twice and not used_twice:
# If we've visited this small cave once before
# Then add it again, but "use up" our used_twice
queue.append((neighbour, new_path, True))
continue
assert neighbour in self.large_caves
# If we're here, it's either big caves or neighbours not in the path
queue.append((neighbour, new_path, used_twice))
logger.debug(new_path)
return unique_paths
```

Here’s what’s changed:

- The frontier queue is no longer made up of
`tuples`

that contain`(cave, [path])`

. Now, the`tuple`

has a third value, which is whether we’ve used our`revisit`

option on this path yet. - Obviously, when we pop items off the queue, they will now unpack to three variables, not two, hence:
`cave, path, used_twice`

. - Now, when we’re looking at neighbouring caves and we’ve found a cave already in the path, we need to do some additionally checking:
- Is it a small cave, and are we allowed to revisit a small cave, and have we already used our revisit? If yes, yes, no, then add this adjacent cave to the queue. But when we add it, flag that we’re using up our
`revisit`

option.

- Is it a small cave, and are we allowed to revisit a small cave, and have we already used our revisit? If yes, yes, no, then add this adjacent cave to the queue. But when we add it, flag that we’re using up our

And to run it:

```
# Part 2
unique_paths = graph.get_paths_through(small_cave_twice=True)
logger.info("Part 2: unique paths count=%d", len(unique_paths))
```

The output for both parts looks like this:

```
09:41:30.114:INFO:__main__: Part 1: unique paths count=5228
09:41:30.801:INFO:__main__: Part 2: unique paths count=131228
09:41:30.817:INFO:__main__: Execution time: 0.7296 seconds
```

This solution uses NetworkX. This is a pre-built library that allows us to create, manipulate, interrogate and render graphs. It can take a lot of the pain out of building graphs. It also has cool methods that allow us to do things like: *find the shortest path from a to b*.

First, we need to **install NetworkX**:

```
py -m pip install networks
```

And now our Python setup:

```
import logging
from pathlib import Path
import time
from collections import deque
import matplotlib.pyplot as plt
import networkx as nx
logging.basicConfig(format="%(asctime)s.%(msecs)03d:%(levelname)s:%(name)s:\t%(message)s",
datefmt='%H:%M:%S')
logger = logging.getLogger(__name__)
logger.setLevel(level=logging.INFO)
SCRIPT_DIR = Path(__file__).parent
INPUT_FILE = "input/input.txt"
# INPUT_FILE = "input/sample_input.txt"
RENDER = True
OUTPUT_DIR = Path(SCRIPT_DIR / "output/")
OUTPUT_FILE = Path(OUTPUT_DIR / "cave_graph.png") # where we'll save the animation to
```

Notes:

- I’ve decided to use
`pathlib.Path`

to handle operating system folder/file locations, rather than the`os`

module. This was introduced in Python 3.4, and is considered a better way of manipulating paths, than the`os`

approach. Of note: you can use the`/`

operator for joining paths, which is much more intuitive. - I’ve imported
`networkx as nx`

. - I’ve imported
`matplotlib.pyplot as plt`

.**Matplotlib**is a library for creating static, animated and interactive visualisations in Python, including graphs, charts and 3D models.

Our `CaveGraph`

is a little shorter to implement:

```
class CaveGraph():
""" Stores pairs of connected caves, i.e. edges.
Determines all caves from supplied edges.
Determines which caves are small, and which are large.
Creates a lookup to obtain all caves linked to a given cave.
Finally, knows how to determine all unique paths from start to end,
according to the rules given. """
START = "start"
END = "end"
def __init__(self, edges:set[tuple[str, str]]) -> None:
""" Takes a set of edges, where each edge is in the form (a, b) """
self.start = CaveGraph.START
self.end = CaveGraph.END
self._graph = nx.Graph() # Internal implementation of the graph
self._graph.add_edges_from(edges) # Build the graph
self._small_caves: set[str] = set()
self._large_caves: set[str] = set()
self._categorise_caves() # populate the empty fields
assert self.start in self._graph, "Start needs to be mapped"
assert self.end in self._graph, "Finish needs to be mapped"
def _categorise_caves(self):
""" Build a set of all caves from the edges.
This will also initialise small_caves and large_caves """
for edge in self.edges:
for cave in edge:
if cave not in (self.start, self.end):
if cave.islower():
self._small_caves.add(cave)
else:
self._large_caves.add(cave)
@property
def edges(self) -> tuple[str,str]:
""" All the edges. An edge is one cave linked to another. """
return self._graph.edges
@property
def nodes(self):
return self._graph.nodes
@property
def small_caves(self):
""" Caves labelled lowercase. Subset of self.caves. """
return self._small_caves
@property
def large_caves(self):
""" Caves labelled uppercase. Subset of self.caves. """
return self._large_caves
def get_paths_through(self, small_cave_twice=False) -> set[tuple]:
""" Get all unique paths through from start to end, using a BFS.
Args:
small_cave_twice (bool, optional): Whether we can
visit a small cave more than once. Defaults to False.
"""
start = (self.start, [self.start], False) # (starting cave, [path with only start], used twice)
queue = deque()
queue.append(start)
unique_paths: set[tuple] = set() # To store each path that goes from start to end
while queue:
# If we popleft(), we do a BFS. If we simply pop(), we're doing a DFS.
# Since we need to discover all paths, it makes no difference to performance.
cave, path, used_twice = queue.popleft() # current cave, paths visited, twice?
if cave == self.end: # we've reached the end of a desired path
unique_paths.add(tuple(path))
continue
for neighbour in self._graph[cave]:
new_path = path + [neighbour] # Need a new path object
if neighbour in path:
# big caves fall through and can be re-added to the path
if neighbour in (self.start, self.end):
continue # we can't revisit start and finish
if neighbour in self.small_caves:
if small_cave_twice and not used_twice:
# If we've visited this small cave once before
# Then add it again, but "use up" our used_twice
queue.append((neighbour, new_path, True))
continue
assert neighbour in self.large_caves
# If we're here, it's either big caves or neighbours not in the path
queue.append((neighbour, new_path, used_twice))
logger.debug(new_path)
return unique_paths
```

Most of it is the same. But:

- We simply store the
`_graph`

as an`nx.Graph()`

object. - We can initialise the graph using
`nx.Graph.add_edges_from(edges)`

, which takes all the edges, and establishes*all*the nodes, and*all*the adjacencies. So, we don’t need:- Our own adjacency dictionary.
- Our own
`_determine_caves()`

method.

To solve for both parts, the code is nearly identical:

```
input_file = Path(SCRIPT_DIR, INPUT_FILE)
with open(input_file, mode="rt") as f:
edges = set(tuple(line.split("-")) for line in f.read().splitlines())
graph = CaveGraph(edges) # Create graph from edges supplied in input
logger.debug("Nodes=%s", graph.nodes)
logger.debug("Edges=%s", graph.edges)
# Part 1
unique_paths = graph.get_paths_through()
logger.info("Part 1: unique paths count=%d", len(unique_paths))
# Part 2
unique_paths = graph.get_paths_through(small_cave_twice=True)
logger.info("Part 2: unique paths count=%d", len(unique_paths))
```

Now let’s use **NetworkX**, in combination with **Matplotlib**, to render the output as an **image**.

Just add this to our `CaveGraph`

class:

```
def render(self, file):
_ = plt.subplot(121)
pos = nx.spring_layout(self._graph)
# set colours for each node in the array, in the same order as the nodes
colours = []
for node in self.nodes:
if node in (CaveGraph.START, CaveGraph.END):
colours.append("green")
elif node in self.large_caves:
colours.append("blue")
else:
colours.append("red")
nx.draw(self._graph, pos=pos, node_color=colours, with_labels=True)
dir_path = Path(file).parent
if not Path.exists(dir_path):
Path.mkdir(dir_path)
plt.savefig(file)
```

This code:

- Defines a Pyplot plot area
- Uses NetworkX’s built in
`spring_layout()`

method to create a set of node positions to be rendered, saved as`pos`

. - We then create a
`list`

of`colours`

, with the same number of items as`pos`

. We use this to colour the nodes, based on their type, e.g. start/end, small cave, and big cave. - We then render the plot, using
`nx.draw()`

, passing in the graph, the positions, the colours, and turning on labels. - Finally, we save the rendering as a file, creating the path if it doesn’t exist.

We call it like this:

```
if RENDER:
graph.render(OUTPUT_FILE)
```

I’m saving the file as a .ping. I’ve used the `RENDER`

constant so that I can easily turn rendering on and off.

The rendered graph looks like this for the sample data:

And it looks like this for actual data: