OOP in Python, part 5: Class methods

MP 42: Dealing with data that applies to multiple instances.

Note: This is the fifth post in a series about OOP in Python. The previous post discussed static methods. The next post covers the __str__() and __repr__() methods.


So far in this series we’ve talked about the __init__() method and static methods. There’s one more special kind of method you should know about: class methods. Class methods are useful when you want to work with information that’s associated with a class, but not associated with one specific instance.

Bonsai trees

Let’s consider a class that represents Bonsai trees:

class BonsaiTree:

    def __init__(self, name, description=""):
        self.name = name
        self.description = description

    def describe_tree(self):
        msg = f"{self.name}: {self.description}"
        print(msg)

tree = BonsaiTree("Winged Elm")
tree.description = "tall, solid trunk"
tree.describe_tree()

When you create an instance of BonsaiTree, you need to provide a name, and you can optionally provide a description as well.

Here’s the output for a single tree:

Winged Elm: tall, solid trunk
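The call BonsaiTree("Winged Elm") passes only a name, which works only if description has a default value. Here’s a minimal sketch that assumes an empty-string default and shows both ways of supplying a description: setting it after the instance is created, or passing it in at creation time.

```python
class BonsaiTree:

    def __init__(self, name, description=""):
        # description is optional; it defaults to an empty string.
        self.name = name
        self.description = description

    def describe_tree(self):
        msg = f"{self.name}: {self.description}"
        print(msg)

# Set the description after creating the instance...
tree = BonsaiTree("Winged Elm")
tree.description = "tall, solid trunk"
tree.describe_tree()

# ...or pass it in when the instance is created.
tree_2 = BonsaiTree("Winged Elm", "tall, solid trunk")
tree_2.describe_tree()
```

Both calls to describe_tree() print the same line.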

This class is what we typically think of in basic OOP: the class lets us create instances, and the entire focus is on the kinds of things we might need to do with individual instances.

But what if there’s some information we want to work with that applies to more than one instance?

Winged Elm Bonsai tree.
The North Carolina Arboretum has a really interesting collection of Bonsai trees.

More than one tree

Many Bonsai trees are part of a collection, so let’s add a couple more trees:

class BonsaiTree:
    ...

trees = []

tree = BonsaiTree("Winged Elm")
tree.description = "tall, solid trunk"
trees.append(tree)

tree = BonsaiTree("El Arbol Murcielago")
tree.description = "short, open trunk"
trees.append(tree)

tree = BonsaiTree("Mt. Mitchell")
tree.description = "small mountain forest"
trees.append(tree)

for tree in trees:
    tree.describe_tree()

We make three instances of BonsaiTree, and append each one to the list trees. Now we have three trees:

Winged Elm: tall, solid trunk
El Arbol Murcielago: short, open trunk
Mt. Mitchell: small mountain forest

Bonsai tree named "El Arbol Murcielago".
The variety of trunk shapes and foliage patterns in the overall collection is particularly fun to see.
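If the specimen data already lives in a list, a loop can build the collection instead of repeating the same three lines for each tree. This is a minimal sketch; the tree_data list of (name, description) pairs is a hypothetical data source, not part of the original code.

```python
class BonsaiTree:

    def __init__(self, name, description=""):
        self.name = name
        self.description = description

    def describe_tree(self):
        msg = f"{self.name}: {self.description}"
        print(msg)

# Hypothetical source data: one (name, description) pair per tree.
tree_data = [
    ("Winged Elm", "tall, solid trunk"),
    ("El Arbol Murcielago", "short, open trunk"),
    ("Mt. Mitchell", "small mountain forest"),
]

# Build one BonsaiTree instance per pair.
trees = [BonsaiTree(name, description) for name, description in tree_data]

for tree in trees:
    tree.describe_tree()
```

The output is the same three lines shown above.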

Tracking instances

Imagine you’re actually using this class to track specimens in an exhibit. You might want to keep track of how many instances of BonsaiTree have been created.

The number of trees that have been added to the collection is information that’s relevant to the class, but it’s not associated with any one specific instance. When you have information like this, it should be stored in an attribute that’s associated with the overall class, not any instance.

Here’s how we do that:

class BonsaiTree:

    num_trees = 0

    @classmethod
    def count_trees(cls):
        msg = f"We have {cls.num_trees} trees in the collection."
        print(msg)

    def __init__(self, name, description=""):
        self.name = name
        self.description = description

        BonsaiTree.num_trees += 1

    def describe_tree(self):
        ...

trees = []

tree = BonsaiTree("Winged Elm")
...

for tree in trees:
    tree.describe_tree()

BonsaiTree.count_trees()

We first add an attribute called num_trees. This attribute is defined in the body of the class, outside of the __init__() method. Notice that num_trees doesn’t have a prefix; it’s not called self.num_trees. An attribute assigned in the class body like this is a class attribute: it belongs to the class itself, and it refers to a single value shared by all instances. Attributes prefixed with self are instance attributes; each instance has its own, independent value for them.
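To see the difference, here’s a small sketch. The direct assignment to BonsaiTree.num_trees is only for illustration; it shows that a change to the class attribute is visible through every instance, while each instance’s own namespace, inspected with the built-in vars(), holds only its instance attributes.

```python
class BonsaiTree:

    # Class attribute: one value, stored on the class itself.
    num_trees = 0

    def __init__(self, name, description=""):
        # Instance attributes: each instance gets its own values.
        self.name = name
        self.description = description

elm = BonsaiTree("Winged Elm")
mitchell = BonsaiTree("Mt. Mitchell")

# Changing the class attribute is visible through every instance.
BonsaiTree.num_trees = 2
print(elm.num_trees, mitchell.num_trees)  # 2 2

# The instance's own namespace holds only its instance attributes.
print(vars(elm))  # {'name': 'Winged Elm', 'description': ''}

# num_trees lives in the class's namespace instead.
print("num_trees" in vars(BonsaiTree))  # True
```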

To write a class method, use the @classmethod decorator. When you call a method decorated this way, Python automatically passes the class itself as the first argument. By convention, class methods use cls as the name of that first parameter.1 Inside the method, you can use cls.attribute_name to access any class attribute, such as num_trees. Here, we use cls.num_trees to compose a single sentence informing people how many trees have been added to the collection.2
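To confirm what the decorator passes in, here’s a minimal sketch. The describe_class() method is hypothetical and exists only to inspect cls; note that it’s called on the class, without creating any instance.

```python
class BonsaiTree:

    num_trees = 0

    @classmethod
    def describe_class(cls):
        # cls is the class object itself, not an instance.
        print(cls is BonsaiTree)  # True
        print(cls.__name__)       # BonsaiTree
        print(cls.num_trees)      # 0
        return cls

# No instance is needed to call a class method.
result = BonsaiTree.describe_class()
```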

To make use of this, we need to increment the value of num_trees whenever a new instance of BonsaiTree is created. We can do that in __init__():

    def __init__(self, name, description=""):
        ...
        BonsaiTree.num_trees += 1

The __init__() method doesn’t receive a cls argument, so it needs to access num_trees through the name of the class. Now, whenever __init__() is called as a new instance is being made, num_trees will be increased by 1.

Finally, we call this method outside the class using the class name:

BonsaiTree.count_trees()

The output shows how many instances have been created:

Winged Elm: tall, solid trunk
El Arbol Murcielago: short, open trunk
Mt. Mitchell: small mountain forest

We have 3 trees in the collection.

A single Bonsai exhibit consisting of multiple trees, including two standing dead trunks.
Some Bonsai “trees” are really collections of plants, meant to represent an ecosystem. This one is inspired by forests near the top of Mt. Mitchell.
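For reference, here’s a sketch of the full listing with the elided parts filled in from the earlier examples. It assumes description defaults to an empty string, so each tree can be created with just a name.

```python
class BonsaiTree:

    # Class attribute: how many instances have been created so far.
    num_trees = 0

    @classmethod
    def count_trees(cls):
        msg = f"We have {cls.num_trees} trees in the collection."
        print(msg)

    def __init__(self, name, description=""):
        self.name = name
        self.description = description

        # Update the class attribute through the class name, not self.
        BonsaiTree.num_trees += 1

    def describe_tree(self):
        msg = f"{self.name}: {self.description}"
        print(msg)

trees = []

tree = BonsaiTree("Winged Elm")
tree.description = "tall, solid trunk"
trees.append(tree)

tree = BonsaiTree("El Arbol Murcielago")
tree.description = "short, open trunk"
trees.append(tree)

tree = BonsaiTree("Mt. Mitchell")
tree.description = "small mountain forest"
trees.append(tree)

for tree in trees:
    tree.describe_tree()

BonsaiTree.count_trees()
```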

A variety of approaches

The syntax used in the previous listing is the most appropriate for this small example. However, there are a variety of ways to work with class attributes and methods. This gives you a lot of flexibility in how you work with the information in a class, but it also creates some things to watch out for.

Using the name of the class in a class method

A class method automatically gets a reference to the class, but you can also use the name of the class to access class attributes. For example, this would work:

    @classmethod
    def count_trees(cls):
        msg = f"We have {BonsaiTree.num_trees} trees in the collection."
        print(msg)

There’s no reason to do this, but it’s good to be aware that this syntax won’t cause an error.

Calling a class method through an instance

You can access class methods through individual instances. For example, this code runs:

tree = BonsaiTree("Winged Elm")
tree.description = "tall, solid trunk"
tree.count_trees()

This will generate the same output as calling BonsaiTree.count_trees().

You might need to use this approach if the code you’re working on receives an instance of a class, but the module you’re working in doesn’t have direct access to the overall class. There’s no need to import the class in the module; you can just call the class method through the instance you’ve received.
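Here’s a minimal sketch of that situation. The report() function is hypothetical; it stands in for code in another module that only ever receives an instance. It calls the class method through the instance, and the built-in type() recovers the class itself if that code ever needs it.

```python
class BonsaiTree:

    num_trees = 0

    @classmethod
    def count_trees(cls):
        msg = f"We have {cls.num_trees} trees in the collection."
        print(msg)

    def __init__(self, name, description=""):
        self.name = name
        self.description = description
        BonsaiTree.num_trees += 1

def report(tree):
    # Hypothetical helper: it receives an instance, never the class itself.
    # Calling the class method through the instance still passes BonsaiTree as cls.
    tree.count_trees()
    # type() returns the class the instance was made from.
    return type(tree)

tree = BonsaiTree("Winged Elm", "tall, solid trunk")
tree_class = report(tree)
tree_class.count_trees()
```

Both calls print the same sentence, because both end up running count_trees() with BonsaiTree as cls.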

This kind of flexibility is good, because when people are working with an instance, they shouldn’t have to think about how the class was written. They don’t need to keep track of which methods are instance methods, static methods, or class methods. They just need to know what they can do with objects, and leave the implementation details up to the library maintainers.

Accessing class attributes from within regular methods

You can access class attributes from within regular methods.3 For example, we might expand the describe_tree() method to emphasize that this is one tree in a larger collection:

    def describe_tree(self):
        msg = f"{self.name}: {self.description}"
        msg += f"\n  This is one of {BonsaiTree.num_trees} trees."
        print(msg)

This is a regular instance method, receiving the self argument. It needs self, because it needs access to the current tree’s name and description. But we can also grab the value of num_trees in this method by using the class name syntax.

The output describes the individual tree, and reports how large the overall collection is as well:

Winged Elm: tall, solid trunk
  This is one of 3 trees.
...

This flexibility allows you to grab whatever information you need about an instance, or the overall class, when working inside a method.
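One detail worth knowing: the class attribute is read each time the method runs, not when the instance is created. In this sketch, the first tree is described once before the other trees exist and once after. The method also returns its message, an addition made only so the result can be checked.

```python
class BonsaiTree:

    num_trees = 0

    def __init__(self, name, description=""):
        self.name = name
        self.description = description
        BonsaiTree.num_trees += 1

    def describe_tree(self):
        msg = f"{self.name}: {self.description}"
        # num_trees is looked up on the class every time this method runs.
        msg += f"\n  This is one of {BonsaiTree.num_trees} trees."
        print(msg)
        # Also return the message so the caller can inspect it.
        return msg

elm = BonsaiTree("Winged Elm", "tall, solid trunk")
first = elm.describe_tree()   # ...This is one of 1 trees.

BonsaiTree("El Arbol Murcielago", "short, open trunk")
BonsaiTree("Mt. Mitchell", "small mountain forest")
later = elm.describe_tree()   # ...This is one of 3 trees.
```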

Some things to watch out for

All of this flexibility is helpful for modeling complex real-world things, but it can make for some confusing behavior as well. For example, you can sometimes read class variables through self, but it’s usually not a good idea.

Reading class variables using self

In a regular instance method, we’ve seen that you can read the information from a class variable using the name of the class. However, in our example, this code also works:

    def describe_tree(self):
        msg = f"{self.name}: {self.description}"
        msg += f"\n  This is one of {self.num_trees} trees."
        print(msg)

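This generates the same output as the previous example, even though num_trees is being accessed through self. This works because the instance doesn’t have an attribute of its own named num_trees. When Python can’t find an attribute on an instance, it looks for that attribute on the class, so self.num_trees ends up reading the class attribute.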
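You can watch this fallback happen by checking the instance’s own namespace with the built-in vars(). In this sketch, num_trees is absent from the instance, yet reading it through the instance still succeeds, because the lookup continues on to the class.

```python
class BonsaiTree:

    num_trees = 0

    def __init__(self, name, description=""):
        self.name = name
        self.description = description
        BonsaiTree.num_trees += 1

tree = BonsaiTree("Winged Elm", "tall, solid trunk")

# The instance stores only name and description...
print("num_trees" in vars(tree))  # False

# ...so reading tree.num_trees falls back to the class attribute.
print(tree.num_trees)                          # 1
print(tree.num_trees == BonsaiTree.num_trees)  # True
```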

Writing data through self creates a distinct instance variable

Here’s where the confusion comes in: when you write to a variable using self, you create an instance variable if one doesn’t already exist. This version of the class does not do what we want:

class BonsaiTree:

    num_trees = 0

    @classmethod
    def count_trees(cls):
        msg = f"We have {cls.num_trees} trees in the collection."
        print(msg)

    def __init__(self, name, description=""):
        self.name = name
        self.description = description

        self.num_trees += 1

    def describe_tree(self):
        msg = f"{self.name}: {self.description}"
        msg += f"\n  This is one of {self.num_trees} trees."
        print(msg)

trees = []

tree = BonsaiTree("Winged Elm")
...

for tree in trees:
    tree.describe_tree()

BonsaiTree.count_trees()

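Here we’re trying to use self.num_trees when incrementing the counter in __init__(). But the line self.num_trees += 1 is shorthand for self.num_trees = self.num_trees + 1. Python first reads self.num_trees; since the new instance has no attribute by that name yet, the lookup falls back to the class attribute, which is still 0. Python then adds 1 and assigns the result to self.num_trees, which creates a new instance attribute attached to self. The class attribute num_trees is never modified. Now every instance of BonsaiTree has its own num_trees, separate from the one associated with the overall class.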

That sounds confusing, and it is. Here’s the output:

Winged Elm: tall, solid trunk
  This is one of 1 trees.
El Arbol Murcielago: short, open trunk
  This is one of 1 trees.
Mt. Mitchell: small mountain forest
  This is one of 1 trees.

We have 0 trees in the collection.

Every instance’s version of num_trees is 1, and the overall class attribute is never modified from its initial value of 0.
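You can confirm this directly with the built-in vars(), which shows the attributes stored on a single object. Here’s a minimal sketch of the buggy class, trimmed to just the parts that matter:

```python
class BonsaiTree:

    num_trees = 0

    def __init__(self, name, description=""):
        self.name = name
        self.description = description
        # Bug: reads the class value (0), then stores 1 on the instance.
        self.num_trees += 1

trees = [
    BonsaiTree("Winged Elm", "tall, solid trunk"),
    BonsaiTree("El Arbol Murcielago", "short, open trunk"),
    BonsaiTree("Mt. Mitchell", "small mountain forest"),
]

for tree in trees:
    # Each instance now has its own num_trees in its namespace.
    print(tree.name, vars(tree)["num_trees"])

# The class attribute was never changed.
print(BonsaiTree.num_trees)  # 0
```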

Rather than explaining this in even more detail, here’s a simple takeaway that will help you avoid this confusion: when you’re working with a class attribute inside a class, always access it through cls in a class method, or through the class name in regular methods. And whenever possible, avoid giving class attributes and instance attributes the same name.4

Real-world example: pathlib.py

If you’ve worked with files in Python at all, you’re probably somewhat familiar with pathlib. (If you’re still using strings to represent paths, set aside some time to read about pathlib.Path objects; your work with files and paths will be much nicer.)

The pathlib library is implemented in a file called pathlib.py. At the time of writing, that file contains four class methods. Here’s one of them:

    @classmethod
    def _parse_path(cls, path):
        if not path:
            return '', '', []
        sep = cls.pathmod.sep
        ...
        return drv, root, parsed

The leading underscore in _parse_path() indicates that it’s a helper method, used by other pathlib code to parse paths. One piece of information the method needs is sep, the path separator, which it reads through cls.pathmod. The separator depends on the kind of path the class represents, not on any single path, so it makes sense to access it through the class. For example, the path separator is a forward slash for macOS and Linux paths, and a backslash for Windows paths.

When given a path, this method splits it into pieces and returns them as drv, root, and parsed: the drive, the root, and the remaining parts of the path.
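The pattern behind _parse_path() is easier to see in a toy version. In this minimal sketch, ToyPath and its parse_path() method are hypothetical and far simpler than pathlib’s real code: the separator is stored as a class attribute, and the class method reads it through cls, with no instance needed. The second half uses pathlib’s public drive, root, and parts attributes to show the kind of pieces the real method produces.

```python
from pathlib import PurePosixPath, PureWindowsPath

class ToyPath:
    # Hypothetical class attribute: the separator shared by every ToyPath.
    sep = "/"

    @classmethod
    def parse_path(cls, path):
        # Simplified sketch: split on the class's separator and drop empty
        # pieces. There's no handling of drives, roots, or other edge cases.
        if not path:
            return []
        return [part for part in path.split(cls.sep) if part]

print(ToyPath.parse_path("bonsai/elm/notes.txt"))
# ['bonsai', 'elm', 'notes.txt']

# pathlib's public API exposes the drive, root, and parts of a path.
windows_path = PureWindowsPath(r"C:\bonsai\elm\notes.txt")
print(windows_path.drive, windows_path.root, windows_path.parts)
# C: \ ('C:\\', 'bonsai', 'elm', 'notes.txt')

posix_path = PurePosixPath("/bonsai/elm/notes.txt")
print(repr(posix_path.drive), posix_path.root, posix_path.parts)
# '' / ('/', 'bonsai', 'elm', 'notes.txt')
```

The Pure* classes don’t touch the filesystem, so both halves run the same way on any OS.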

Conclusions

There’s a tremendous amount of flexibility in how you model things with code using OOP. That flexibility can be overwhelming at times, but it’s also what gives OOP the power that it has. If you use class methods where appropriate, your code will do what you need it to and the purpose of each method will be clear.

When the flexibility of OOP feels overwhelming, remember the following points:

  • If you’re working with data that’s associated with individual instances, write a regular method with self as the first argument. If you need to work with a class attribute, access it using the name of the class.
  • If you’re working with data that’s only associated with the overall class, write a class method and access the data through the cls argument.
  • If you’re writing a method that doesn’t need any data associated with individual instances or the overall class, write a static method (see MP #41).
  • When calling class methods, use the name of the class unless you only have access to an existing instance of the class.
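Here’s a compact sketch that puts all three kinds of methods side by side in one class. The is_valid_name() static method is hypothetical, added only for contrast.

```python
class BonsaiTree:

    num_trees = 0

    def __init__(self, name, description=""):
        self.name = name
        self.description = description
        BonsaiTree.num_trees += 1

    # Instance method: works with one tree's data through self.
    def describe_tree(self):
        msg = f"{self.name}: {self.description}"
        print(msg)

    # Class method: works with data shared by the whole class through cls.
    @classmethod
    def count_trees(cls):
        msg = f"We have {cls.num_trees} trees in the collection."
        print(msg)

    # Static method: needs neither instance data nor class data.
    @staticmethod
    def is_valid_name(name):
        return bool(name.strip())

tree = BonsaiTree("Winged Elm", "tall, solid trunk")
tree.describe_tree()                             # Winged Elm: tall, solid trunk
BonsaiTree.count_trees()                         # We have 1 trees in the collection.
print(BonsaiTree.is_valid_name("Mt. Mitchell"))  # True
```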

And as always, focus on code that works for your current situation, without getting lost in an attempt to write “perfect” code. Pick an approach that works for your use case, start to write tests when things are working, and be ready to refine your architecture as you and your project evolve.

Resources

You can find the code files from this post in the mostly_python GitHub repository.


  1. Just as self could be called potato, there’s nothing special about the name cls. It’s used because class is already a keyword. You could use potato here as well, as long as you were consistent in your usage. But please don’t do that. :)

  2. To clarify, the code shown here counts how many instances have been created. It doesn’t track deletions, and it doesn’t actively count the number of instances in memory.

  3. By “regular methods”, we’re really talking about instance methods. These are standard methods that receive the self variable as the first argument, and act on instance attributes.

  4. If you want a little more clarification, this is a great example to run through Python Tutor. If you do so, you’ll see four references to num_trees in the final visualization. One belongs to the overall class, and there is one reference to num_trees for each of the three instances. All of these are distinct values, pointing to four different places in memory.