Advent of Code 2018 - Day 6

Useful Links

Concepts and Packages Demonstrated

Manhattan Distance defaultdict sets NamedTuple Lambda Functions

Problem Intro
Part 1
Part 2
Performance Optimization
Results

Problem Intro

Your wrist device beeps urgently with a chronal interference warning and produces a list of coordinates. These coordinates might be dangerous! We need to work out which areas are safe and which aren’t.

The input data looks like this:

1, 1
1, 6
8, 3
3, 4
5, 5
8, 9

Each line represents an x, y coordinate on an infinite 2D grid. We’ll use Manhattan distance (also called taxicab distance) to measure distances between points. The Manhattan distance between two points is simply:

distance = |x1 - x2| + |y1 - y2|

This is the distance you’d travel if you could only move horizontally or vertically on a grid - like a taxi navigating city blocks.

Part 1

What is the size of the largest area that isn’t infinite?

An “area” for a coordinate is defined as all the grid locations that are closest to that specific coordinate using Manhattan distance. If a location is equidistant from multiple coordinates, it doesn’t belong to any area.

The Challenge: Infinite Areas

Some coordinates will have infinite areas because their influence extends forever beyond any reasonable boundary. The key insight is that if a coordinate’s area touches the edge of our bounding box, then it must be infinite.

Why? Because if the area reaches the edge of the box defined by the min/max coordinates, it would continue expanding beyond that box indefinitely.

Solution Strategy

Here’s my approach:

Define a bounding box using the min/max x and y values from all coordinates
For each point in the bounding box, find which coordinate it’s closest to
Group points by their closest coordinate to build areas
Identify infinite areas - any coordinate whose area touches the bounding box edge
Find the largest finite area

Let’s start with some helper functions. I’m using NamedTuple for points, which gives us readable .x and .y access while keeping performance high:

from typing import NamedTuple

class Point(NamedTuple):
    x: int
    y: int

def manhattan_distance(p1: Point, p2: Point) -> int:
    """Calculate Manhattan distance between two points."""
    return abs(p1.x - p2.x) + abs(p1.y - p2.y)

def parse_input(data: list[str]) -> list[Point]:
    """Converts a list of coordinate strings into a list of Point objects."""
    return [Point(*map(int, line.split(","))) for line in data]

def bounding_box(points: list[Point]) -> tuple[Point, Point]:
    """Returns the bounding box defined by the min/max x and y values of all points."""
    min_x = min(point.x for point in points)
    max_x = max(point.x for point in points)
    min_y = min(point.y for point in points)
    max_y = max(point.y for point in points)
    
    return Point(min_x, min_y), Point(max_x, max_y)

def on_bounding_box_edge(point: Point, tl: Point, br: Point) -> bool:
    """Returns True if the point is on the bounding box edge."""
    return point.x in [tl.x, br.x] or point.y in [tl.y, br.y]

A few things to note:

parse_input() uses tuple(map(int, line.split(","))) to convert each line like "1, 1" into a tuple (1, 1)
bounding_box() returns two tuples: top-left (min_x, min_y) and bottom-right (max_x, max_y)
on_bounding_box_edge() checks if a point’s x or y coordinate matches any edge of the box

Now for the main Part 1 logic:

def part1(data: list[str]):
    danger_points = parse_input(data)
    tl, br = bounding_box(danger_points)
    
    # Store the closest danger point and its distance for each point in the bounding box
    distances: dict[Point, tuple[Point | None, int]] = {}

    # Iterate over all points in the bounding box
    for y in range(tl.y, br.y + 1):
        for x in range(tl.x, br.x + 1):
            curr_point = Point(x, y)
            
            # Determine the closest danger coordinate for this point
            for danger_point in danger_points:
                dist = manhattan_distance(curr_point, danger_point)
                if curr_point not in distances:
                    distances[curr_point] = (danger_point, dist)
                else:
                    if dist < distances[curr_point][1]:
                        distances[curr_point] = (danger_point, dist)
                    elif dist == distances[curr_point][1]:
                        distances[curr_point] = (None, dist)  # Tied distance
    
    # Filter out all points associated with multiple danger points (marked as None)
    distances = {point: (danger_point, dist) for point, (danger_point, dist) in distances.items()
                    if danger_point is not None}
    
    # Build areas: group all points by their closest danger point
    points_in_dp_area: defaultdict[Point, set[Point]] = defaultdict(set)
    for point, (danger_point, _) in distances.items():
        points_in_dp_area[danger_point].add(point)
    
    # Identify danger points with infinite areas
    infinite_danger_points = set()
    for danger_point, area_points in points_in_dp_area.items():
        for point in area_points:
            if on_bounding_box_edge(point, tl, br):
                infinite_danger_points.add(danger_point)
                break  # No need to check other points for this danger point
    
    # Filter to only finite areas
    finite_areas = {dp: points for dp, points in points_in_dp_area.items() 
                    if dp not in infinite_danger_points}
    
    # Return the size of the largest finite area
    biggest_area = max(finite_areas.items(), key=lambda item: len(item[1]))
    return len(biggest_area[1])

Let me walk through the key parts:

Finding closest coordinates: For each grid point, we iterate through all danger points and track the closest one. If we find a tie (equal distances), we mark it as None since tied points don’t belong to any area.

Using defaultdict(set): This is perfect for grouping points by their danger coordinate. See my defaultdict guide for more details. Each danger point gets a set of all grid points closest to it.

Infinite area detection: This is the tricky bit! We iterate through each danger point’s area and check if ANY point in that area touches the bounding box edge. If so, that danger point has an infinite area. This is the key insight that makes the solution work correctly.

Finding the maximum: We use max() with a custom key function that extracts the length of the point set for each area. The key argument takes a function that transforms each element before comparison. Here, I’m using a lambda function lambda item: len(item[1]). This anonymous function takes an item (a key-value pair from the dictionary items), and returns the length of the value (the set of points). This tells max() to compare the areas based on their size.

Part 2

What is the size of the region containing all locations which have a total distance to all given coordinates of less than 10000?

Part 2 is actually simpler! Instead of finding areas closest to individual coordinates, we need to find all locations where the sum of distances to ALL coordinates is below a threshold (10000 for the real input, 32 for the sample).

def part2(data: list[str], max_distance: int = 10000):
    danger_points = parse_input(data)
    tl, br = bounding_box(danger_points)
    
    # Count points with total distance less than max_distance
    count = 0
    
    # Iterate over all points in the bounding box
    for y in range(tl.y, br.y + 1):
        for x in range(tl.x, br.x + 1):
            curr_point = Point(x, y)
            
            # Calculate the total distance to all danger points
            total_distance = sum(manhattan_distance(curr_point, dp) for dp in danger_points)
            
            if total_distance < max_distance:
                count += 1
    
    return count

This is much more straightforward:

For each point in the bounding box, calculate the Manhattan distance to every danger point
Sum all those distances
If the sum is less than our threshold, increment the counter

The sum() with a generator expression sum(manhattan_distance(curr_point, dp) for dp in danger_points) is a clean, Pythonic way to calculate the total.

Performance Optimization

Initially, this solution was quite slow - taking about 11.5 seconds to run both parts. But with a simple optimization, I got it down to 0.84 seconds - a 14x speedup!

The Problem: Point Objects vs Tuples

My original implementation used a custom Point dataclass from my common utilities:

@dataclass(frozen=True)
class Point:
    x: int
    y: int
    
    def manhattan_distance_from(self, other: Point) -> int:
        diff = self - other
        return Point.manhattan_distance(diff)

While this is clean and object-oriented, it has significant overhead:

Creating Point objects requires calling __init__
The manhattan_distance_from() method creates an intermediate Point object (the difference)
It then calls asdict() to convert the dataclass to a dictionary
All of this happens millions of times in the nested loops!

The Solution: NamedTuples

I experimented with three approaches:

Custom Dataclass: Clean and OOP, but very slow due to object overhead.
Plain Tuples: Extremely fast, but code is less readable (p[0] vs p.x).
NamedTuple: A great compromise!

By switching to NamedTuple, we get the readability of objects (point.x, point.y) with performance much closer to plain tuples.

Performance Comparison:

Implementation	Part 1 Time	Part 2 Time	Notes
Dataclass	~5.7s	~5.8s	High overhead
Plain Tuple	~1.46s	~0.80s	Fastest, but less readable
NamedTuple	~1.65s	~0.99s	Best Balance

While NamedTuple is slightly slower than raw tuples (about 13-24% slower in this case), it is still significantly faster than the dataclass approach (about 3-5x faster) and makes the code much nicer to work with.

The lesson: in performance-critical code with tight loops, NamedTuple is often a better choice than dataclass if you need immutable data structures, and it’s a worthy trade-off against raw tuples for the added readability.

Results

The output looks like this:

Part 1 soln=17
Execution time: 0.620 seconds
Part 2 soln=16
Execution time: 0.321 seconds

The solution is fast and efficient, handling the real input in under a second total!

‹

Learning Python with Advent of Code Walkthroughs