Thinking about architecture
MP 117: How often do you write a new, moderately complex library from scratch?
Note: Our power was just restored yesterday afternoon, so we ended up going 12 days without power or running water. Internet service is still a week or two away. I've been able to get back to some writing and programming work on my laptop and phone, but I probably won't be able to finish the Django from first principles series for another week or two. Thank you for your patience if you've been looking forward to the conclusion of that series. I also hope to get back to publishing at the same time every Thursday starting next week.
I like to think of myself as a pretty competent programmer, but one area I'm admittedly weak in is software architecture. Most of my projects have one or more end users, but most of those users aren't developers, so the structure of the project is only ever visible to me.
My latest project is a more complex library than I'm used to building from scratch. It has evolved beyond its original scope, in a good way. But it's also outgrown its original proof-of-concept architecture. Now that I'm working towards a 1.0 release, I'm facing some interesting architecture choices that I don't usually have to deal with.
A project that acts on another developer's project
The project I'm working on relates to Django, but this post is not specific to Django at all. The project is called django-simple-deploy. When someone has a Django project that works on their own system, django-simple-deploy automatically configures their project for deployment to the hosting platform of their choice. A lot of people would like to see a stable version of this project, because deployment is a famously difficult problem in web development.
An interesting architectural challenge is that this project modifies another user's project, and then runs commands on that user's system to manage their project. So I'm writing code that aims to modify another developer's code. To add to the complexity, the project is evolving to support plugins. Soon, other developers will be able to write plugins that extend the behavior of django-simple-deploy. These plugins will add support for hosting platforms that I'm less familiar with, or offer different approaches to deployment for a given platform.
This means the project can be used in a fairly wide variety of ways. There are some aspects of the codebase that only I (or other contributors to the project) will see, and some aspects that plugin authors will interact with. For the purposes of this post, we'll focus on how plugin authors interact with the project.
Inside and outside, internal and public
The core of the django-simple-deploy codebase is a class called Command, in a file called simple_deploy.py. Originally almost all the code lived in the Command class. During initial development, when I wasn't sure if the idea of automating deployments would work at all, it was helpful for all functions in the project to have access to all other information in the project.
For example when I wanted to write a function for adding a new file to the user's Django project, it was easier if I could just access self.project_root:

    class Command(BaseCommand):

        def __init__(self):
            ...

        def add_file(self, filename, contents):
            path = self.project_root / filename
            path.write_text(contents)
            self.write_output(f"Added file: {path}")
But putting everything in one class becomes problematic after a while. For one thing, the class gets extremely long, and ends up being more difficult to reason about. It also gets much more challenging to test. A standalone function that doesn't depend on the project's state is much easier to test than methods that are tightly bound to a giant class. Much of my work in cleaning up this project has centered around moving pieces from inside the main Command class, to simpler structures outside that class.
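To make the testing point concrete, here is a minimal sketch. The names build_path() and BigCommand are hypothetical and do not come from django-simple-deploy; they only illustrate how much setup each style needs before a test can run.

```python
from pathlib import Path


def build_path(project_root, filename):
    """Standalone function: everything it needs arrives as arguments."""
    return Path(project_root) / filename


class BigCommand:
    """Hypothetical stand-in for a large class that carries a lot of state."""

    def __init__(self, project_root, options):
        # A real class might need many more setup steps than this
        # before any of its methods can be called.
        self.project_root = Path(project_root)
        self.options = options

    def build_path(self, filename):
        return self.project_root / filename


# Testing the standalone function needs no setup at all.
assert build_path("blog_project", "dockerfile") == Path("blog_project/dockerfile")

# Testing the method means constructing the whole object first.
cmd = BigCommand("blog_project", options={"platform": "fly_io"})
assert cmd.build_path("dockerfile") == Path("blog_project/dockerfile")
```

The two tests check the same behavior, but the second one grows every time the class needs more state to be constructed.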
What about functions that need some information from the class?
One of my current open questions is how to deal with functions that plugin authors will need to use, but which also need some information from the class. For example the add_file() method uses the value of project_root to build a path to the new file, and it calls the method write_output() to inform the user that the file was added. The method write_output() writes to the console when appropriate, and logs most actions unless the plugin author has suppressed logging for that action.
This approach worked great when everything was in one class. But let's look at how a plugin author focusing on deployment to Fly.io might add a file to the user's project:
    class FlyDeployer:

        def __init__(self, command_instance):
            self.cmd = command_instance
            ...

        def add_dockerfile(self):
            ...
            self.cmd.add_file("dockerfile", contents)
The plugin author creates a class called FlyDeployer. When creating an instance of FlyDeployer, the Command class passes a reference to itself, which is assigned to self.cmd in FlyDeployer. Then methods like add_file() can be called through self.cmd.
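The wiring described above can be sketched as follows. This is a simplified illustration, not the project's actual code: the real Command class inherits from Django's BaseCommand, and the post doesn't show where the plugin instance is created. Doing it in handle(), recording files in an added_files dictionary, and the placeholder Dockerfile contents are all assumptions made so the example runs on its own.

```python
class FlyDeployer:
    """Plugin class, following the example above."""

    def __init__(self, command_instance):
        self.cmd = command_instance

    def add_dockerfile(self):
        # Placeholder contents, just for this sketch.
        contents = "# placeholder Dockerfile contents"
        self.cmd.add_file("dockerfile", contents)


class Command:
    """Simplified stand-in for the real Command class."""

    def __init__(self):
        # Assumption: record files in memory instead of writing to disk.
        self.added_files = {}

    def add_file(self, filename, contents):
        self.added_files[filename] = contents

    def handle(self):
        # Command passes a reference to itself to the plugin.
        deployer = FlyDeployer(self)
        deployer.add_dockerfile()


cmd = Command()
cmd.handle()
```

After handle() runs, cmd.added_files holds the file the plugin asked for, which shows that the plugin reached back into Command through self.cmd.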
This approach works, but it's not very appealing. I don't really like that a plugin author has to interact with core django-simple-deploy code. There's a lot in the overall codebase that's hard to reason about when you're looking at it for the first time. I want to attract plugin authors who know a particular hosting platform well, without requiring them to develop a deep understanding of the django-simple-deploy codebase.
What I'd really like to do is let plugin authors import some utilities, and call simple utility functions:
    from utils import plugin_utils

    class FlyDeployer:

        def __init__(self):
            ...

        def add_dockerfile(self):
            ...
            plugin_utils.add_file("dockerfile", contents)
This would be great, but the solution isn't quite that simple. The utility function add_file() needs access to an instance of Command. It needs access to attributes like project_root, and it needs to be able to call methods like write_output(). It may also need access to things I'm not mentioning in this post such as what OS the target project is running on, and a number of CLI options that may have been passed by the user.
Key questions
I have a couple of important questions to answer. Getting these right means I have a project that's complex where it needs to be, and simple where it can be:
How do I write a utility function that has access to all the attributes of the Command class, without being part of the Command class?
How do I put all the utility functions that plugin authors will use into just one place, but still let the Command class use those functions?
I don't want plugin authors to have to read all of the django-simple-deploy codebase when they're working on new plugins. I do want plugin authors to be able to look at the code of the utility functions they're using. If the documentation is good, they shouldn't have to most of the time. But I definitely want them to be able to review the code, without diving into the architecture of the overall project.
My solution
What I've come around to is writing a module called plugin_utils.py. Any function that a plugin author might need to use is placed in this module. (If it gets too large this can become a directory called plugin_utils.)
The module is divided into two sections. The first contains functions that don't require an instance of the Command class. Here's an example:
    from django.template import Context, Engine

    def get_template_string(template_path, context):
        """Given a template and context, return contents as a string."""
        my_engine = Engine()
        template = my_engine.from_string(template_path.read_text())
        return template.render(Context(context))
This function takes a path to a template, and a context dictionary. It returns a single string, containing the rendered template. Most importantly, this function doesn't depend on state at all. It's easy for plugin authors to read, and it's easy for me to test. There's no complex logic here, other than understanding how Django's templating system works.
The second section contains functions that do require an instance of the Command class. Here's the current version of add_file():
    def add_file(sd_command, filename, contents):
        """Add a new file to the project."""

        # Check if file already exists.
        path = sd_command.project_root / filename
        ...

        # File does not exist, or we are free to overwrite it.
        path.write_text(contents)

        msg = f"\n Wrote {path.name} to {path}"
        sd_command.write_output(msg)
This example shows why a utility function like add_file() is needed. It checks if the file already exists; if it does, it gets the user's permission before overwriting the existing file. It then writes output to the appropriate places documenting what was added to the user's project.
With regard to architecture, this function requires that the plugin author pass an instance of the Command class to the function as its first argument. Here's what that looks like in an actual plugin:
    class FlyDeployer:
        ...

        def add_dockerfile(self):
            ...
            plugin_utils.add_file(self.cmd, "dockerfile", contents)
Passing around instances of Command isn't ideal, but from everything I've considered it seems like a worthwhile level of complexity, with an acceptable amount of boilerplate. Plugin authors just have to be aware that self.cmd provides access to a bunch of information about the project, but they don't have to dig into the code in Command. They can easily look at the code for plugin_utils.add_file(), and understand exactly what django-simple-deploy is doing each time they ask it to add a file to the user's project.
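One side benefit of taking the command as an explicit argument is that a function like this can be exercised without a real Command instance. The sketch below is an assumption about how that might look, not the project's actual test suite. It uses a simplified add_file() with the overwrite check left out, and a hypothetical make_stub_command() helper that provides only the two things the function touches.

```python
import tempfile
from pathlib import Path
from types import SimpleNamespace


def add_file(sd_command, filename, contents):
    """Simplified version of the utility above; the overwrite check is omitted."""
    path = sd_command.project_root / filename
    path.write_text(contents)
    sd_command.write_output(f"\n Wrote {path.name} to {path}")


def make_stub_command(project_root):
    """Hypothetical test helper: supplies only what add_file() uses."""
    messages = []
    return SimpleNamespace(
        project_root=Path(project_root),
        write_output=messages.append,  # Collect output instead of printing it.
        messages=messages,
    )


with tempfile.TemporaryDirectory() as tmp_dir:
    stub_cmd = make_stub_command(tmp_dir)
    add_file(stub_cmd, "dockerfile", "FROM python:3.12\n")

    # Read results while the temporary directory still exists.
    written = (Path(tmp_dir) / "dockerfile").read_text()
    logged = list(stub_cmd.messages)

assert written == "FROM python:3.12\n"
assert "Wrote dockerfile" in logged[0]
```

The stub stands in for self.cmd, so the test covers what the utility does to the project without constructing the full Command class.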
Conclusions
There are a number of other possible approaches to organizing this code. For example, I could keep functions like add_file() in the Command class, and put all public functions that the plugins use in one part of the class. I could take that a step further and pull all the public functions into a separate class, and use a mix of inheritance and composition to build the Command class. That way all plugin-focused code would be in one module, separate from the Command class.
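As a rough illustration of the separate-class idea, here is one way it could be sketched with inheritance. The PluginAPI name is hypothetical, and Command is a simplified stand-in; in the real project it would also need to inherit from Django's BaseCommand. To avoid touching the filesystem, this version of add_file() only reports what it would do.

```python
from pathlib import Path


class PluginAPI:
    """Hypothetical class holding only the methods plugins may call."""

    def add_file(self, filename, contents):
        # Relies on project_root and write_output() being defined
        # by whichever class inherits from PluginAPI.
        path = self.project_root / filename
        # A real version would write contents to disk here.
        self.write_output(f"Would add file: {path}")
        return path


class Command(PluginAPI):
    """Simplified stand-in for the real Command class."""

    def __init__(self, project_root):
        self.project_root = Path(project_root)
        self.output = []

    def write_output(self, msg):
        self.output.append(msg)


cmd = Command("blog_project")
new_path = cmd.add_file("dockerfile", "")
```

The public methods now live in one place, but they depend on attributes that are defined elsewhere, so a reader of PluginAPI still has to look at Command to see where project_root and write_output() come from.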
An approach like this might be considered more elegant. But my goal is to come up with a structure that works, and makes sense to people who aren't already familiar with the entire django-simple-deploy codebase. The curious developer in me wants to try an approach like this. But the practical maintainer and product designer in me is much more interested in selecting an approach that works for the project's full range of users.
I'm not claiming that my approach is the best one you could take with this project. But I am making the case that it's a reasonable approach for this project, balancing a number of different goals and constraints. My highest priorities are making it easy for plugin authors to write and maintain plugins, and making it as simple as possible for Django developers to use the overall project. My own needs as a maintainer are secondary to both of these.
Project architecture is not my strongest area as a programmer, so if you have questions or feedback please share it. I'm quite open to suggestions in the pre-1.0 phase of this project.