Dazbo's Advent of Code solutions, written in Python
Breadth First Search (BFS)pathlibgraphsnetworkxmatplotlib
A little bit tricky, this one.
I’ve created two solutions for this:
We’re told we’re navigating a cave system which contains large caves (denoted by uppercase labels) and small caves (denoted by lowercase labels).
The input data looks something like this:
start-A
start-b
A-c
A-b
b-d
A-end
b-end
And if we map out these connections, they can be visualised like this:
start
/ \
c--A-----b--d
\ /
end
We’re asked to find the number of distinct paths that start at start
and end at end
, and don’t visit any small caves more than once.
This is the no frills solution.
Whenever we need to find all the paths from one point to another, a BFS should immediately come to mind as a possible way to solve the problem. The BFS is an algorithm we can always use to find all paths between two points. The tricky addition to this problem is that the small caves are one-way gates.
Nothing new here:
import logging
import os
import time
from collections import defaultdict, deque
logging.basicConfig(format="%(asctime)s.%(msecs)03d:%(levelname)s:%(name)s:\t%(message)s",
datefmt='%H:%M:%S')
logger = logging.getLogger(__name__)
logger.setLevel(level=logging.INFO)
SCRIPT_DIR = os.path.dirname(__file__)
INPUT_FILE = "input/input.txt"
# INPUT_FILE = "input/sample_input.txt"
Now we need to build a graph. In computing (and maths), a graph is a finite set of vertices (aka nodes or points), which are connected by edges (aka links).
If the edges have a direction, they are represented as arrows, and the graph is a directed graph. If the direction doesn’t matter, then edges are represented as lines, and the graph is an undirected graph. An undirected graph looks something like this:
Here, it makes sense to represent our network of caves as an undirected graph, and then to find all the unique paths through it, given the “small caves” constraint.
We start by creating our CaveGraph
class:
class CaveGraph():
""" Stores pairs of connected caves, i.e. edges.
Determines all caves from supplied edges.
Determines which caves are small, and which are large.
Creates a lookup to obtain all caves linked to a given cave.
Finally, knows how to determine all unique paths from start to end,
according to the rules given. """
START = "start"
END = "end"
def __init__(self, edges:set[tuple[str, str]]) -> None:
""" Takes a set of edges, where each edge is in the form (a, b) """
self.start = CaveGraph.START
self.end = CaveGraph.END
self._edges: set[tuple[str, str]] = set(edges)
self._nodes: set[str] = set()
self._small_caves: set[str] = set()
self._large_caves: set[str] = set()
self._determine_caves() # populate the empty fields
# Create lookup (adjacency) dict to find all linked nodes for a node
self._node_map: dict[str, set[str]] = defaultdict(set)
for x,y in edges: # E.g. x, y
self._node_map[x].add(y)
self._node_map[y].add(x)
assert self.start in self._node_map, "Start needs to be mapped"
assert self.end in self._node_map, "Finish needs to be mapped"
@property
def edges(self):
""" All the edges. An edge is one cave linked to another. """
return self._edges
@property
def small_caves(self):
""" Caves labelled lowercase. Subset of self.caves. """
return self._small_caves
@property
def large_caves(self):
""" Caves labelled uppercase. Subset of self.caves. """
return self._large_caves
def _determine_caves(self):
""" Build a set of all caves from the edges.
This will also initialise small_caves and large_caves """
for edge in self._edges:
for cave in edge:
self._nodes.add(cave)
if cave not in (self.start, self.end):
if cave.islower():
self._small_caves.add(cave)
else:
self._large_caves.add(cave)
def _get_adjacent_caves(self, node: str) -> set[str]:
""" Returns the adjacent caves, given a cave input. """
return self._node_map[node]
Things to say about this:
__init__()
method:
set
of pairs of locations. I.e. a set of edges, where each edge is in the form of two nodes (which are caves)._edges
._determine_caves()
, which:
_caves
. We’re using sets, which only store unique values. So if we add the same cave more than once, it doesn’t matter._small_caves
and _large_caves
, respectively._
, where they are only intended to be used by the object, and are not intended to be accessed outside the object.adjacency dictionary
, which is basically a dictionary that maps each node (cave) to every other node (cave) that it is linked to. We do this using a defaultdict(set)
, i.e. a dict that it is initialised with an empty set
for each key. That way, whenever we come across a cave that is linked to some cave x
, we can simply add the new cave to the existing _node_map[x]
set.edges
, small_caves
, large_caves
.Now let’s add the method that will do the hard work. I.e. actually performs the BFS:
def get_paths_through(self) -> set[tuple]:
""" Get all unique paths through from start to end, using a BFS """
start = (self.start, [self.start]) # (starting cave, [path with only start])
queue = deque()
queue.append(start)
unique_paths: set[tuple] = set() # To store each path that goes from start to end
while queue:
# If we popleft(), we do a BFS. If we simply pop(), we're doing a DFS.
# Since we need to discover all paths, it makes no difference to performance.
cave, path = queue.popleft() # current cave, paths visited
if cave == self.end: # we've reached the end of a desired path
unique_paths.add(tuple(path))
continue
for neighbour in self._get_adjacent_caves(cave):
new_path = path + [neighbour] # Need a new path object
if neighbour in path:
# big caves fall through and can be re-added to the path
if neighbour in (self.start, self.end):
continue # we can't revisit start and finish
if neighbour in self.small_caves:
continue # we can't revisit small cave
assert neighbour in self.large_caves
# If we're here, it's either big caves or neighbours not in the path
queue.append((neighbour, new_path))
logger.debug(new_path)
return unique_paths
This method is a fairly standard BFS, as we’ve used previously:
start
tuple
, which has the first value set to our start
cave, and the second value set to a list
that represents the path taken so far. Initially, this list
only contains the start
cave.queue
, which will be our frontier, implemented as a deque
.start
tuple to the queue.We create a set
to store all unique paths found.
end
. If it is:
list
path to a tuple
path when we add it. We need to do this, because sets
only store hashable
objects; tuples
are hashable, but lists
are not.continue
. (We don’t break
, because we want to find all the paths.)We run it like this:
input_file = os.path.join(SCRIPT_DIR, INPUT_FILE)
with open(input_file, mode="rt") as f:
edges = set(tuple(line.split("-")) for line in f.read().splitlines())
graph = CaveGraph(edges) # Create graph from edges supplied in input
# Part 1
unique_paths = graph.get_paths_through()
logger.info("Part 1: unique paths count=%d", len(unique_paths))
Now we’re told we’re allowed to visit any single small cave twice for any given path. Implementing this requires some thinking!!
My approach was to modify the get_paths_through()
method to take a parameter small_cave_twice
which defaults to False
. If this is set to True
, then the method allows the single revisit. Thus, we can use the same method for path parts. Of course, we need to modify the method itself:
def get_paths_through(self, small_cave_twice=False) -> set[tuple]:
""" Get all unique paths through from start to end, using a BFS
Args:
small_cave_twice (bool, optional): Whether we can
visit a small cave more than once. Defaults to False.
"""
start = (self.start, [self.start], False) # (starting cave, [path with only start], used twice)
queue = deque()
queue.append(start)
unique_paths: set[tuple] = set() # To store each path that goes from start to end
while queue:
# If we popleft(), we do a BFS. If we simply pop(), we're doing a DFS.
# Since we need to discover all paths, it makes no difference to performance.
cave, path, used_twice = queue.popleft() # current cave, paths visited, twice?
if cave == self.end: # we've reached the end of a desired path
unique_paths.add(tuple(path))
continue
for neighbour in self._get_adjacent_caves(cave):
new_path = path + [neighbour] # Need a new path object
if neighbour in path:
# big caves fall through and can be re-added to the path
if neighbour in (self.start, self.end):
continue # we can't revisit start and finish
if neighbour in self.small_caves:
if small_cave_twice and not used_twice:
# If we've visited this small cave once before
# Then add it again, but "use up" our used_twice
queue.append((neighbour, new_path, True))
continue
assert neighbour in self.large_caves
# If we're here, it's either big caves or neighbours not in the path
queue.append((neighbour, new_path, used_twice))
logger.debug(new_path)
return unique_paths
Here’s what’s changed:
tuples
that contain (cave, [path])
. Now, the tuple
has a third value, which is whether we’ve used our revisit
option on this path yet.cave, path, used_twice
.revisit
option.And to run it:
# Part 2
unique_paths = graph.get_paths_through(small_cave_twice=True)
logger.info("Part 2: unique paths count=%d", len(unique_paths))
The output for both parts looks like this:
09:41:30.114:INFO:__main__: Part 1: unique paths count=5228
09:41:30.801:INFO:__main__: Part 2: unique paths count=131228
09:41:30.817:INFO:__main__: Execution time: 0.7296 seconds
This solution uses NetworkX. This is a pre-built library that allows us to create, manipulate, interrogate and render graphs. It can take a lot of the pain out of building graphs. It also has cool methods that allow us to do things like: find the shortest path from a to b.
First, we need to install NetworkX:
py -m pip install networks
And now our Python setup:
import logging
from pathlib import Path
import time
from collections import deque
import matplotlib.pyplot as plt
import networkx as nx
logging.basicConfig(format="%(asctime)s.%(msecs)03d:%(levelname)s:%(name)s:\t%(message)s",
datefmt='%H:%M:%S')
logger = logging.getLogger(__name__)
logger.setLevel(level=logging.INFO)
SCRIPT_DIR = Path(__file__).parent
INPUT_FILE = "input/input.txt"
# INPUT_FILE = "input/sample_input.txt"
RENDER = True
OUTPUT_DIR = Path(SCRIPT_DIR / "output/")
OUTPUT_FILE = Path(OUTPUT_DIR / "cave_graph.png") # where we'll save the animation to
Notes:
pathlib.Path
to handle operating system folder/file locations, rather than the os
module. This was introduced in Python 3.4, and is considered a better way of manipulating paths, than the os
approach. Of note: you can use the /
operator for joining paths, which is much more intuitive.networkx as nx
.matplotlib.pyplot as plt
. Matplotlib is a library for creating static, animated and interactive visualisations in Python, including graphs, charts and 3D models.Our CaveGraph
is a little shorter to implement:
class CaveGraph():
""" Stores pairs of connected caves, i.e. edges.
Determines all caves from supplied edges.
Determines which caves are small, and which are large.
Creates a lookup to obtain all caves linked to a given cave.
Finally, knows how to determine all unique paths from start to end,
according to the rules given. """
START = "start"
END = "end"
def __init__(self, edges:set[tuple[str, str]]) -> None:
""" Takes a set of edges, where each edge is in the form (a, b) """
self.start = CaveGraph.START
self.end = CaveGraph.END
self._graph = nx.Graph() # Internal implementation of the graph
self._graph.add_edges_from(edges) # Build the graph
self._small_caves: set[str] = set()
self._large_caves: set[str] = set()
self._categorise_caves() # populate the empty fields
assert self.start in self._graph, "Start needs to be mapped"
assert self.end in self._graph, "Finish needs to be mapped"
def _categorise_caves(self):
""" Build a set of all caves from the edges.
This will also initialise small_caves and large_caves """
for edge in self.edges:
for cave in edge:
if cave not in (self.start, self.end):
if cave.islower():
self._small_caves.add(cave)
else:
self._large_caves.add(cave)
@property
def edges(self) -> tuple[str,str]:
""" All the edges. An edge is one cave linked to another. """
return self._graph.edges
@property
def nodes(self):
return self._graph.nodes
@property
def small_caves(self):
""" Caves labelled lowercase. Subset of self.caves. """
return self._small_caves
@property
def large_caves(self):
""" Caves labelled uppercase. Subset of self.caves. """
return self._large_caves
def get_paths_through(self, small_cave_twice=False) -> set[tuple]:
""" Get all unique paths through from start to end, using a BFS.
Args:
small_cave_twice (bool, optional): Whether we can
visit a small cave more than once. Defaults to False.
"""
start = (self.start, [self.start], False) # (starting cave, [path with only start], used twice)
queue = deque()
queue.append(start)
unique_paths: set[tuple] = set() # To store each path that goes from start to end
while queue:
# If we popleft(), we do a BFS. If we simply pop(), we're doing a DFS.
# Since we need to discover all paths, it makes no difference to performance.
cave, path, used_twice = queue.popleft() # current cave, paths visited, twice?
if cave == self.end: # we've reached the end of a desired path
unique_paths.add(tuple(path))
continue
for neighbour in self._graph[cave]:
new_path = path + [neighbour] # Need a new path object
if neighbour in path:
# big caves fall through and can be re-added to the path
if neighbour in (self.start, self.end):
continue # we can't revisit start and finish
if neighbour in self.small_caves:
if small_cave_twice and not used_twice:
# If we've visited this small cave once before
# Then add it again, but "use up" our used_twice
queue.append((neighbour, new_path, True))
continue
assert neighbour in self.large_caves
# If we're here, it's either big caves or neighbours not in the path
queue.append((neighbour, new_path, used_twice))
logger.debug(new_path)
return unique_paths
Most of it is the same. But:
_graph
as an nx.Graph()
object.nx.Graph.add_edges_from(edges)
, which takes all the edges, and establishes all the nodes, and all the adjacencies. So, we don’t need:
_determine_caves()
method.To solve for both parts, the code is nearly identical:
input_file = Path(SCRIPT_DIR, INPUT_FILE)
with open(input_file, mode="rt") as f:
edges = set(tuple(line.split("-")) for line in f.read().splitlines())
graph = CaveGraph(edges) # Create graph from edges supplied in input
logger.debug("Nodes=%s", graph.nodes)
logger.debug("Edges=%s", graph.edges)
# Part 1
unique_paths = graph.get_paths_through()
logger.info("Part 1: unique paths count=%d", len(unique_paths))
# Part 2
unique_paths = graph.get_paths_through(small_cave_twice=True)
logger.info("Part 2: unique paths count=%d", len(unique_paths))
Now let’s use NetworkX, in combination with Matplotlib, to render the output as an image.
Just add this to our CaveGraph
class:
def render(self, file):
_ = plt.subplot(121)
pos = nx.spring_layout(self._graph)
# set colours for each node in the array, in the same order as the nodes
colours = []
for node in self.nodes:
if node in (CaveGraph.START, CaveGraph.END):
colours.append("green")
elif node in self.large_caves:
colours.append("blue")
else:
colours.append("red")
nx.draw(self._graph, pos=pos, node_color=colours, with_labels=True)
dir_path = Path(file).parent
if not Path.exists(dir_path):
Path.mkdir(dir_path)
plt.savefig(file)
This code:
spring_layout()
method to create a set of node positions to be rendered, saved as pos
.list
of colours
, with the same number of items as pos
. We use this to colour the nodes, based on their type, e.g. start/end, small cave, and big cave.nx.draw()
, passing in the graph, the positions, the colours, and turning on labels.We call it like this:
if RENDER:
graph.render(OUTPUT_FILE)
I’m saving the file as a .ping. I’ve used the RENDER
constant so that I can easily turn rendering on and off.
The rendered graph looks like this for the sample data:
And it looks like this for actual data: