Dazbo's Advent of Code solutions, written in Python
Day 7: No Space Left On Device
classlist comprehensionlambdaassertionrecursiongenerator
This was the first challenge this year that I found a little bit tricky. It took me longer than it should have… Probably 90 minutes in total. I made a couple of errors along the way that I needed to debug. And I possibly overengineered it!
But oh well… You get bad days!
Today’s challenge is basically about writing a program that can create and navigate a file system directory tree.
The input data looks something like this:
$ cd /
$ ls
dir a
14848514 b.txt
8504156 c.dat
dir d
$ cd a
$ ls
dir e
29116 f
2557 g
62596 h.lst
We’re told:
$
are commands. These are the only commands we need to deal with:
$ cd <some_dir>
- change to the specified directory$ cd ..
- move up one directory$ ls
- list the current directory$
, then we must be listing a directory’s contents. I.e. the previous command was ls
. Directory listings only contain two different types of information:
dir <some_dir>
- A subdirectory, within this directory<size> <some_file>
- A file, and its size.Find all of the directories with a total size of at most 100000. What is the sum of the total sizes of those directories?
Okay, here’s my solution…
First, I create a dataclass to represent a File
object:
@dataclass(frozen=True)
class File:
"Has name and size"
name: str
size: int
Nothing new there!
Then I create a class to represent a Directory
:
class Directory:
""" Represents a file system directory object. Has parent dir (if not root), subdirs, and files.
Knows how to return ALL directories and subdirectories.
Knows how to return total size occupied by this directory and all subdirectories. """
def __init__(self, name: str) -> None:
self._name = name
self._files: list[File] = [] # files in this dir
self._dirs: list[Directory] = [] # directories in this dir
@property
def name(self):
return self._name
def add_file(self, a_file: File):
""" Add a File to this directory """
self._files.append(a_file)
def add_directory(self, a_dir: Directory):
""" Add a Directory to this directory. Set THIS directory to be its parent dir. """
self._dirs.append(a_dir)
a_dir.parent_dir = self
@property
def parent_dir(self):
""" Get the parent directory of this dir. """
return self._parent_dir
@parent_dir.setter
def parent_dir(self, a_dir: Directory):
self._parent_dir = a_dir
@property
def directories(self):
""" Return directories at THIS level. """
return self._dirs
def get_directory(self, name: str) -> Directory:
""" Return a directoy by name, at THIS level. """
return next(dir for dir in self.directories if dir.name == name)
def get_all_dirs(self) -> list[Directory]:
""" Get ALL directories at this level and below. """
all_dirs = []
for a_dir in self.directories:
all_dirs.extend(a_dir.get_all_dirs())
all_dirs.extend(self.directories)
return all_dirs
@property
def size(self):
""" The sum of the files in this dir, as well as the sum of all subdirs. """
return sum(file.size for file in self._files) + sum(dir.size for dir in self._dirs)
def __repr__(self) -> str:
return f"{self.__class__.__name__}(name={self.name}, size={self.size}, dirs={len(self.directories)})"
There’s quite a lot going on here!
__init__()
method sets the name of this directory, and creates empty lists to store any files and any subdirectores.name
property.File
to this Directory
. It simply adds the file to the self._files list
.Directory
. Whenever we create a new Directory
object, we’ll set the parent directory to the be the Directory
we’re currently in.Directory
to this Directory
.
Directory
to the self._dirs list
.Directory
, and sets it to be self
.Directory
object given a name. We’ll use this when we cd
from a directory to one of its subdirectories. This is actually done using a generator. A generator looks a lot like a list comprehension, except that we use ()
rather than []
. A generator behaves a lot like an iterator, except it returns one-item-at-a-time, on demand. Here, I’m using it, in conjunction with the next
keyword, to return just the first item of a generator. This is how I return a single Directory
from list
, having supplied a name.Directory
. I.e. all Directory
objects in the current directory, but also all Directory
objects in any subdirectories, and so on. Thus, this method is recursive.size
property. It works by adding up the size of all the File
objects in this Directory
, but then recurses into each subdirectory and adds up the files there too. Yes, more recursion.__repr__()
method that we can use to print the value of a given Directory
. This is really useful for debugging!Oh, one last thing. Because the Directory
class refers to the Directory
class in its own code (in various methods), we need to add this line to prevent the linter from moaning that Directory
has not yet been defined:
from __future__ import annotations
Next, we need to parse our list of command line instructions!
Here’s my code to do that:
def fs_parse(instructions: list[str]) -> Directory:
""" Processes instructions, builds directory tree, and returns the root directory.
Lines starting with $ are commands
$ cd ..
$ cd some_dir
$ ls
Else, lines are listings, which show either:
sz some_file
dir some_dir
"""
root_dir = Directory("/")
current_dir = root_dir
for line in instructions:
if line.startswith("$"): # this line is a command
cmd_line = line.split()
cmd = cmd_line[1]
if cmd == "ls":
continue # we're now in directory list mode, so skip to next line
if cmd == "cd": # Changing directory
arg = cmd_line[2]
if arg == "..": # Going back up a level
assert current_dir.parent_dir, "We're not at root"
current_dir = current_dir.parent_dir
else: # Change to named directory
if arg != '/': # Changing to named dir. We need find the directory in our list.
# It is possible to have multiple directories with the same name.
# But directory names are unique within the current directory.
current_dir = current_dir.get_directory(arg)
else: # Change to root. Only happens once at the top of the instructions.
current_dir = root_dir
else:
assert False, "There is no other valid command!"
else: # we must be dir listing
ls_line = line.split()
if ls_line[0] == "dir": # add a new directory
new_dir = Directory(ls_line[1])
current_dir.add_directory(new_dir) # add as subdirectory to current dir
else: # must be a file
file = File(ls_line[1], size=int(ls_line[0]))
current_dir.add_file(file)
return root_dir
The code is pretty well documented, so you should be able to follow it without much trouble. A couple of notes…
Directory
for the root
. All directories and files that we find will be - either directly or indirectly - under the root
. At the end of the function, we return just this root Directory
object.$
, or it is processing ls
output.Directory
to be named Directory
under current, or to be the parent of current. In either case, the current Directory
always holds a reference to the appropriate object.Directory
objects to an existing Directory
, I don’t have to do anything to set the new directory’s parent. That’s because this is taken care of in the Directory
class itself. As it should be!Okay, believe it or not, we’ve done all the hard work now! Solving the problem is now, finally, quite trivial!
# Part 1
all_dirs = root_dir.get_all_dirs()
print(f"All dirs count={len(all_dirs)}.")
# Find all the directories smaller than the target size, and add up their sizes
small_dirs = [a_dir for a_dir in all_dirs if a_dir.size <= MAX_SZ]
print(f"\nPart 1: Small dirs total = {sum(dir.size for dir in small_dirs)}")
How does this work?
sum()
function to add up the sizes of all these small directories.Phew!!
root
directory reports back.Find the smallest directory that, if deleted, would free up enough space on the filesystem to run the update. What is the total size of that directory?
Fortunately, because of the hard work we’ve already done, this is really easy to do:
# Part 2
unused_space = FS_SZ - root_dir.size # Total FS size, minus current used
extra_free_req = FREE_REQ - unused_space
print(f"\nCurrent nused space={unused_space}; extra space required={extra_free_req}")
# Find all directories that would liberate enough space
# Then we want the smallest that would be big enough.
dirs_big_enough = [a_dir for a_dir in all_dirs if a_dir.size >= extra_free_req]
smallest_big_dir = min(dirs_big_enough, key=lambda x: x.size)
print(f"Part 2: Smallest directory we can delete={smallest_big_dir.name}: {smallest_big_dir.size}")
min()
function, and setting the key of the min()
function to be size
property of our Directory
object. I do this with a lambda function.Here’s the final code:
from __future__ import annotations
from dataclasses import dataclass
from pathlib import Path
import time
SCRIPT_DIR = Path(__file__).parent
# INPUT_FILE = Path(SCRIPT_DIR, "input/sample_input.txt")
INPUT_FILE = Path(SCRIPT_DIR, "input/input.txt")
FS_SZ = 70000000
FREE_REQ = 30000000
MAX_SZ = 100000
@dataclass(frozen=True)
class File:
"Has name and size"
name: str
size: int
class Directory:
""" Represents a file system directory object. Has parent dir (if not root), subdirs, and files.
Knows how to return ALL directories and subdirectories.
Knows how to return total size occupied by this directory and all subdirectories. """
def __init__(self, name: str) -> None:
self._name = name
self._files: list[File] = [] # files in this dir
self._dirs: list[Directory] = [] # directories in this dir
@property
def name(self):
return self._name
def add_file(self, a_file: File):
""" Add a File to this directory """
self._files.append(a_file)
def add_directory(self, a_dir: Directory):
""" Add a Directory to this directory. Set THIS directory to be its parent dir. """
self._dirs.append(a_dir)
a_dir.parent_dir = self
@property
def parent_dir(self):
""" Get the parent directory of this dir. """
return self._parent_dir
@parent_dir.setter
def parent_dir(self, a_dir: Directory):
self._parent_dir = a_dir
@property
def directories(self):
""" Return directories at THIS level. """
return self._dirs
def get_directory(self, name: str) -> Directory:
""" Return a directoy by name, at THIS level. """
return next(dir for dir in self.directories if dir.name == name)
def get_all_dirs(self) -> list[Directory]:
""" Get ALL directories at this level and below. """
all_dirs = []
for a_dir in self.directories:
all_dirs.extend(a_dir.get_all_dirs())
all_dirs.extend(self.directories)
return all_dirs
@property
def size(self):
""" The sum of the files in this dir, as well as the sum of all subdirs. """
return sum(file.size for file in self._files) + sum(dir.size for dir in self._dirs)
def __repr__(self) -> str:
return f"{self.__class__.__name__}(name={self.name}, size={self.size}, dirs={len(self.directories)})"
def main():
with open(INPUT_FILE, mode="rt") as f:
data = f.read().splitlines()
root_dir = fs_parse(data)
print(root_dir)
# Part 1
all_dirs = root_dir.get_all_dirs()
print(f"All dirs count={len(all_dirs)}.")
# Find all the directories smaller than the target size, and add up their sizes
small_dirs = [a_dir for a_dir in all_dirs if a_dir.size <= MAX_SZ]
print(f"\nPart 1: Small dirs total = {sum(dir.size for dir in small_dirs)}")
# Part 2
unused_space = FS_SZ - root_dir.size # Total FS size, minus current used
extra_free_req = FREE_REQ - unused_space
print(f"\nCurrent nused space={unused_space}; extra space required={extra_free_req}")
# Find all directories that would liberate enough space
# Then we want the smallest that would be big enough.
dirs_big_enough = [a_dir for a_dir in all_dirs if a_dir.size >= extra_free_req]
smallest_big_dir = min(dirs_big_enough, key=lambda x: x.size)
print(f"Part 2: Smallest directory we can delete={smallest_big_dir.name}: {smallest_big_dir.size}")
def fs_parse(instructions: list[str]) -> Directory:
""" Processes instructions, builds directory tree, and returns the root directory.
Lines starting with $ are commands
$ cd ..
$ cd some_dir
$ ls
Else, lines are listings, which show either:
sz some_file
dir some_dir
"""
root_dir = Directory("/")
current_dir = root_dir
for line in instructions:
if line.startswith("$"): # this line is a command
cmd_line = line.split()
cmd = cmd_line[1]
if cmd == "ls":
continue # we're now in directory list mode, so skip to next line
if cmd == "cd": # Changing directory
arg = cmd_line[2]
if arg == "..": # Going back up a level
assert current_dir.parent_dir, "We're not at root"
current_dir = current_dir.parent_dir
else: # Change to named directory
if arg != '/': # Changing to named dir. We need find the directory in our list.
# It is possible to have multiple directories with the same name.
# But directory names are unique within the current directory.
current_dir = current_dir.get_directory(arg)
else: # Change to root. Only happens once at the top of the instructions.
current_dir = root_dir
else:
assert False, "There is no other valid command!"
else: # we must be dir listing
ls_line = line.split()
if ls_line[0] == "dir": # add a new directory
new_dir = Directory(ls_line[1])
current_dir.add_directory(new_dir) # add as subdirectory to current dir
else: # must be a file
file = File(ls_line[1], size=int(ls_line[0]))
current_dir.add_file(file)
return root_dir
if __name__ == "__main__":
t1 = time.perf_counter()
main()
t2 = time.perf_counter()
print(f"Execution time: {t2 - t1:0.4f} seconds")
And the output looks like this:
Directory(name=/, size=49199225, dirs=4)
All dirs count=203.
Part 1: Small dirs total = 1501149
Current nused space=20800775; extra space required=9199225
Part 2: Smallest directory we can delete=jhmvgjrr: 10096985
Execution time: 0.0064 seconds
I’m glad that’s over!!