Newsletter

OOP in Python, part 10: Organizing classes

MP 50: With so many different kinds of methods, how do you keep classes organized?

Eric Matthes

Sep 7, 2023 — 10 min read

Note: This is the tenth post in a series about OOP in Python. The previous post discussed the use of helper methods in classes. The next post focuses on inheritance.

We've discussed a number of different kinds of methods so far in this series. With the variety of types of methods you can write, and the complexity of problems that people tackle in programming projects, classes can grow quite large. How do you keep a growing set of methods in a class organized?

Python allows you to put your methods in any order you like, so there are no hard rules. There are a number of common approaches people use to organize larger classes, however. In this post we'll discuss some of these possibilities, and look at how a class in a real-world project is organized.

Organizational principle: Reading your code

When considering how to organize the methods in a class, it’s helpful to think about the different reasons people might look at the code you’re working on. You should use an organization system that makes your class easy for you to work with, but be aware of how your choices will affect others as well.

People will read your code in order to:

Develop it further. Some people want to read your code in order to help you develop the project further. They may be coworkers, or they may be fellow contributors in an open project.
Fix bugs. Again, whether in paid work or in an open project, people will look at your code if they find a bug and think they might be able to identify the source.
Understand what they can do with it. You should have documentation for your project that tells people how to use it. But documentation in many projects isn’t always thorough, especially for newer projects. Early adopters tend to be people willing to look into a codebase to figure out how it works.
Understand how it works. Many people are just plain curious, and one of the best ways to learn is to read the code in the projects we use.
Decide how much to trust your project. People want to know that your project will do what it claims to, and not do things it doesn’t claim to. For example, people want to ensure that your project won’t accidentally (or intentionally) corrupt their files.
Assess your skill as a programmer. This is especially true for people who are considering hiring you. Not all of our projects fully represent our skills and abilities, but if you’re pointing people to one of your projects while job hunting, you’ll want it to be well-organized and easy to read.

However you decide to organize your classes, your overall codebase should be approachable to people with all these different motivations.

Organizational principle: Documentation

It’s also important to recognize that many sources of auto-generated documentation will retain the order of your methods when they parse your codebase. This isn’t just limited to calling help() on a class. Tools like Sphinx and Read the Docs will also show your methods in the same order they appear in the source file. IDEs that show documentation when you hover on an instance will often do so as well.

Start with `init()`

One of the most consistent organizational patterns in Python classes is placing __init__() at the top of the class. The first thing people usually want to know about a class is how to make an instance from the class, and __init__() governs that. If your class happens to override __new__(), that method may appear before __init__().

Public methods followed by supporting helper methods

A large class typically has a number of public methods, and a number of helper methods. Some people like to place a public method first, followed by any helper methods that are called by that public method.

This can be helpful during the developmental phase of a project, when you’re still actively working on all these methods. It can be nice to not have to scroll back and forth in a long file to work on related methods, or have multiple panes in an editor open to different sections of the same file.

One drawback to this approach is that helper methods can be called by multiple public methods. This makes it unclear where to put some helper methods, and it means you end up scrolling or opening multiple editor panes to see relevant parts of the class at the same time.

Public methods first, then helper methods

Some people like to group all the public methods first, and then put the helper methods below all the public methods.

This approach puts the focus on the public interface for the class. People who want to focus on implementation details for a public method will then have to look for the helper methods that the public method calls out to. This can be quite reasonable, especially if you start to have helper methods that are called by multiple public methods.

Group public methods by functionality

Regardless of how you organize your public and helper methods, it’s often a good idea to group your public methods by functionality. For example a fully-built version of the ChessBoard class from the previous post might have all the setup-related methods in one section, all the methods that validate positions and moves in another section, and so forth.

This approach makes good sense for documentation purposes, as end users are likely to be most interested in specific ways of using your class. It also makes sense for people who want to read and contribute to your codebase, because they’re likely to be working on one kind of functionality at a time.

Consider using dividers

If you like, you can visually divide a class into sections based on functionality or method type. Dividers typically look something like this:

# --- Setup methods ---

The programming community is a little divided on whether section headers like these are useful or not. Some people find them helpful when navigating a codebase, and some people consider them visual clutter. I tend to like them, although I try to use them sparingly. If you use dividers like this, try to keep them to one line so they don’t take up more space than they really need.1

Sublime Text screenshot showing dividers for "Display methods" and "Setup methods" — A pseudocode version of `ChessBoard`, with section headers dividing groups of related methods.

Write meaningful docstrings

Once a class has been developed enough that others are likely to look at your source code, make sure you have appropriate docstrings for the different parts of your codebase. I tend to omit docstrings in newsletter posts, because of space considerations. But in professional codebases, they’re quite important.

Module-level docstrings: Write a docstring at the top of the module. At the very least, write a brief description of what’s in the module, and its purpose. Start with a one-line description, which will appear in some auto-generated documentation. Then write enough to help people make sense of the module, and know what to look for in it. This is a good place to include any notes about how the module and the classes in it are organized.

Class-level docstrings: Each class should have its own docstring. The docstring should describe what the class is for and share anything users need to know about how to use it. Consider mentioning the most important attributes and public methods as well.

Method-level docstrings: Each method should also have a docstring. Describe briefly what the method is used for. Then list its arguments and return values, and document any exceptions that the method can raise.

If you’re interested in writing thorough docstrings, make sure to read through the brief PEP 8 section on docstrings, and the much more detailed PEP 257. To see what this can look like in a more specific context, the pandas documentation has its own docstring guide that’s worth skimming through.

It’s okay to have more than one class in a module

In a lot of resources, we tend to focus on one class at a time. Fully-developed projects, however, often contain many classes. You should certainly spread your code across multiple modules in order to keep it organized, but you don’t need to put every class in its own module.

Classes that are related can be placed in the same module. If some classes grow exceptionally large, they can be moved to their own modules if it helps with the overall organization of the project.

A real-world example

The pathlib module is one of my favorite newer parts of the Python standard library, and it’s a module I keep returning to when I want to see an example of how a standard library module is structured. You should read a wide variety of professional codebases to get a sense of how good code is structured, but it’s also helpful to keep revisiting a few specific libraries so you can start to know them in greater detail.

Module-level docstring

Let’s look at some sections of pathlib, in order. This should give us a good sense of how the module, and a class within it, are organized. Here’s the module-level docstring:

"""Object-oriented filesystem paths.

This module provides classes to represent abstract paths and concrete
paths with operations that have semantics appropriate for different
operating systems.
"""

Notice the one-line description at the very beginning of the module. This is one of the shorter module-level docstrings I’ve seen, but it still gives people a clear sense of what to expect in this module.

Class-level docstring

Let’s focus on the Path class, as that’s one of the most common classes in this module that people want to create instances of. It’s almost 700 lines long, so I’m curious to see how a class that long is organized.

Here’s the definition of Path, and the class-level docstring:

class Path(PurePath):
    """PurePath subclass that can make system calls.

    Path represents a filesystem path but unlike PurePath,
    also offers methods to do system calls on path objects.
    Depending on your system, instantiating a Path will
    return either a PosixPath or a WindowsPath object.
    You can also instantiate a PosixPath or WindowsPath directly,
    but cannot instantiate a WindowsPath on a POSIX system
    or vice versa.
    """

I’ve adjusted the line endings to fit Substack’s code blocks, but the actual docstring is written as a single-line description, followed by a single paragraph describing the kinds of things you can do with Path objects.

Notice that this docstring doesn’t follow the recommendations I shared earlier; it doesn’t mention any specific attributes or methods. It’s important to recognize that even well-used codebases don’t follow every recommendation, and you shouldn’t expect yourself to either. I think it’s pretty reasonable to keep this docstring brief, as there are a lot of public methods in this class. Running help(Path) will show you all the public methods, and the documentation for pathlib.Path covers this information quite thoroughly as well.

First methods listed

Interestingly, Path does not place __init__() as the first method:

class Path(PurePath):
    ...

    def stat(self, *, follow_symlinks=True):
        ...
        return os.stat(self, follow_symlinks=follow_symlinks)

    def lstat(self):
        ...
        return self.stat(follow_symlinks=False)

The Path class starts with two public methods, stat() and lstat(). These are public methods, but they’re also called in specific ways by many of the methods that follow.

class Path(PurePath):
    ...

    # Convenience functions for querying the stat results

    def exists(self, *, follow_symlinks=True):
        ...

    def is_dir(self, *, follow_symlinks=True):
        ...

    def is_file(self, *, follow_symlinks=True):
        ...

Next we see a single divider comment, letting us know that the following methods are meant to make working with paths more “convenient”. This seems accurate; for people unfamiliar with the technical aspects of file systems, methods like path.exists() and path.is_file() seem a lot more intuitive than methods like path.stat() and path.lstat().

Helper methods

Most of the methods in Path are public. The few helper methods seem to be tied to specific public methods, so they simply follow those methods. Here are a couple public methods that appear later in the class, and the helper method that follows immediately after:

class Path(PurePath):
    ...

    def glob(self, pattern, *, case_sensitive=None,
            follow_symlinks=None):
        """Iterate over this subtree and yield all existing files
       (of any kind, including directories) matching the given
       relative pattern.
        """
        sys.audit("pathlib.Path.glob", self, pattern)
        return self._glob(pattern, case_sensitive, follow_symlinks)

    def rglob(self, pattern, *, case_sensitive=None,
            follow_symlinks=None):
        """Recursively yield all existing files (of any kind,
        including directories) matching the given relative pattern,
        anywhere in this subtree.
        """
        sys.audit("pathlib.Path.rglob", self, pattern)
        return self._glob(f'**/{pattern}', case_sensitive,
                follow_symlinks)

    def _glob(self, pattern, case_sensitive, follow_symlinks):
        ...

There are two public methods here, glob() and rglob(). These methods are used to find all the files and directories within a specific path that match a given pattern. For example, you can use path.glob("*.png") to find all the .png image files in a given path. Both of these methods are shown in their entirety; they each consist of just two lines of code.

The helper method _glob(), however, is about 70 lines long. This approach of having two short public methods followed by one long helper method makes the public interface clear, while keeping implementation details tucked away in a helper method. Since this helper method is only called by these two public methods, it makes sense to place it immediately after those public methods.

The rest of the class consists of a bunch more public methods, with __init__() and __new__() in the middle of the class. I’m not sure why this is the case. I do know that Path inherits from a more abstract PurePath class, and that other OS-specific classes such as PosixPath and WindowsPath inherit from Path. This may have something to do with where __init__() and __new__() are placed in this class.

Conclusions

With no hard rules about how classes need to be organized, it’s up to developers to come up with an organization system that makes sense for each project. You don’t need to reinvent the wheel when figuring out how to organize your classes. Keep the ideas discussed here in mind, and as you read through the codebases of mature projects, keep an eye out for how the classes are organized overall.

Choose an organization system for each class you write, and be consistent about using that system. Consider stating the organizational principles you’re prioritizing in the module- or class-level docstring. Also, don’t be afraid to reorganize your codebase as it evolves. What made sense for an early version of your project might not make sense as it starts to gain a wider set of users and contributors.

I think people tend to get annoyed by dividers that are too heavy visually, like this:
```
#########################
#                       #
# --- Setup methods --- #
#                       #
#########################
```
I’ve seen some projects where the dividers are ASCII art. It’s cute the first time you see it, but can easily get distracting and annoying when you actually work in the codebase on a regular basis. ↩

Debugging in Python, part 5: Working through multiple bugs

Debugging in Python, part 4: Bugs in multi-file projects

Validating a new project

Know two ways