OOP in Python, part 2: What's so special about self?

MP 37: It's here, it's there, it's everywhere in a class.

Note: This is the second post in a series about OOP in Python. The first post discussed what OOP is, and why it’s important. The next post focuses on the __init__() method.


One of the things people ask about most often when discussing OOP in Python is self. People get a quick sense that self is something you attach your attributes to, and then they become available in all the methods in the class. But how does that happen? And why do you have to include self in all the method definitions?

In this post we’ll peek under the hood of a simple class and see exactly what self is, and how it works. Then we’ll take a quick look at a world where self doesn’t exist. After looking at a simple class, we’ll look at a real-world example and see why all this matters in much larger projects.

What is self?

I think people often skip over defining self in favor of focusing on what it does: it lets you access values from anywhere in a class. As long as self.variable_name has been assigned before a method tries to use it, that value is available in any method in the class.1

But what is self? I’ve often heard it described as “a reference to the current instance of the class”. But what does that really mean? Let’s focus on a simple example in order to see this. We’ll use the same Robot class we saw in the last post, to keep the focus on OOP and not the context for now:

class Robot:
    """A class representing simple robots."""

    def __init__(self, name):
        self.name = name

    def say_hello(self):
        print(f"Hi, I'm {self.name}!")

my_robot = Robot("William")

When you run this code it doesn’t display any output. It does, however, create a single instance of the Robot class.2

Let’s see what self is, by simply printing it:

    def __init__(self, name):
        self.name = name
        print(self)

Now when we make a new Robot instance, we’ll see exactly what self is:

<__main__.Robot object at 0x1006b0d90>

In this output, __main__ is the name Python gives to the file that’s being run directly as the main program. This says that self is an object built from the Robot class defined in that file, and that it’s stored at the memory address 0x1006b0d90.
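If you want to connect that address to the object itself, here’s a small check. It assumes CPython, where id() returns an object’s memory address and the default repr() shows that same address; other Python implementations may behave differently.

```python
class Robot:
    """A class representing simple robots."""

    def __init__(self, name):
        self.name = name


my_robot = Robot("William")

# The default repr looks like <__main__.Robot object at 0x...>.
text = repr(my_robot)
# Pull out the hex address after " at " and drop the closing ">".
address = text.rsplit(" at ", 1)[1].rstrip(">")

print(text)
# On CPython, id() is the memory address, so the two numbers match.
print(int(address, 16) == id(my_robot))  # True on CPython
```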

This becomes clearer if we add a second print() call:

class Robot:
    """A class representing simple robots."""

    def __init__(self, name):
        self.name = name
        print(self)

    def say_hello(self):
        print(f"Hi, I'm {self.name}!")

my_robot = Robot("William")
print(my_robot)

We’re still printing self in __init__(), but we’re also printing the instance my_robot. The output is quite enlightening, if you haven’t run these kinds of diagnostics before:

<__main__.Robot object at 0x104488d90>
<__main__.Robot object at 0x104488d90>

The self in __init__() refers to exactly the same thing that my_robot does. This is what we mean when we say that “self is a reference to the current instance of the class”.
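Printed addresses are easy to misread, so here’s a sketch that lets Python do the comparison. The module-level list seen is a hypothetical helper for this demo; it records whatever self refers to inside __init__(), and the is operator then checks whether that’s the very same object as my_robot.

```python
seen = []  # hypothetical helper list, used only for this demo


class Robot:
    """A class representing simple robots."""

    def __init__(self, name):
        self.name = name
        # Record the object that self refers to inside __init__().
        seen.append(self)

    def say_hello(self):
        print(f"Hi, I'm {self.name}!")


my_robot = Robot("William")

# `is` checks identity: are both names bound to the same object?
print(seen[0] is my_robot)  # True
```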

self is the same thing throughout a class

What about the self in say_hello()?

class Robot:
    """A class representing simple robots."""

    def __init__(self, name):
        self.name = name
        print(self)

    def say_hello(self):
        print(f"Hi, I'm {self.name}!")
        print(self)

my_robot = Robot("William")
print(my_robot)

my_robot.say_hello()

We add a call to print(self) in the say_hello() method. As you might expect, it’s the same Robot object, at the same place in memory:3

<__main__.Robot object at 0x101234d90>
<__main__.Robot object at 0x101234d90>
Hi, I'm William!
<__main__.Robot object at 0x101234d90>

Every occurrence of self in the class points to the same thing.
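Here’s a sketch that checks this without comparing addresses by eye. The attribute id_in_init is hypothetical, added only for this demo, and this version of say_hello() returns the result of a comparison in addition to printing its greeting.

```python
class Robot:
    """A class representing simple robots."""

    def __init__(self, name):
        self.name = name
        # Hypothetical demo attribute: remember id(self) as seen in __init__().
        self.id_in_init = id(self)

    def say_hello(self):
        print(f"Hi, I'm {self.name}!")
        # True when this method's self is the object __init__() received.
        return id(self) == self.id_in_init


my_robot = Robot("William")
same_object = my_robot.say_hello()
print(same_object)
```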

Multiple instances

If what you just read is new to you, you might want to sit with those thoughts for a little bit. At least, don’t be surprised if the next part is a little confusing.

self becomes more interesting when considering multiple instances. Here’s a brief listing that creates a small army of Robot instances:

class Robot:
    """A class representing simple robots."""

    def __init__(self, name):
        self.name = name
        print(self)

    def say_hello(self):
        print(f"Hi, I'm {self.name}!")

# Robot() requires a name, so give each robot a numbered one.
my_army = [Robot(f"Robot {i}") for i in range(3)]

And here’s the output:

<__main__.Robot object at 0x100970e50>
<__main__.Robot object at 0x100970f50>
<__main__.Robot object at 0x100970f90>

These are three different Robot objects, each stored at a different memory address. self can have different values, depending on which instance you’re focusing on.

That idea of focus is really helpful. It’s easy for us as programmers to focus on what the code looks like. We see a class, and we see the word self everywhere. It looks like it’s just one thing. But remember, a class is a blueprint for making objects. If you used one blueprint to make three different houses, the front door on the blueprint would represent three different front doors in the real world.

If we start to think of the code in a class as being executed for a specific instance, the idea of self having different values at different times starts to make more sense. It always refers to the current instance that we’re focusing on.
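One way to see the “current instance” idea in action is to call the same method on several robots. In this sketch the robot names are made up, and say_hello() returns its greeting instead of printing it so the results are easy to collect. The method body is identical for every call, but self refers to a different object each time.

```python
class Robot:
    """A class representing simple robots."""

    def __init__(self, name):
        self.name = name

    def say_hello(self):
        # self is whichever instance this method was called on.
        return f"Hi, I'm {self.name}!"


names = ["William", "Ada", "Grace"]  # hypothetical robot names
my_army = [Robot(name) for name in names]

greetings = [robot.say_hello() for robot in my_army]
for greeting in greetings:
    print(greeting)

# Three live objects, so three distinct ids.
distinct_objects = len({id(robot) for robot in my_army})
print(distinct_objects)  # 3
```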

Why is self so important?

While it’s helpful to think of a class as a blueprint for making objects, it’s also helpful to think of it as a container for information and actions. It’s really useful to be able to reach any information associated with an object, and any action (method) associated with it, from anywhere in the class.

To see this, let’s expand the demo class just a little, so there’s more information being passed around:

class Robot:
    """A class representing simple robots."""

    def __init__(self, name):
        self.name = name
        self.type = "drone"
        self.mass_grams = 249

    def say_hello(self):
        print(f"Hi, I'm {self.name}!")
        print(f"I'm a {self.type}.")
        print(f"I have a mass of {self.mass_grams}g.")

my_robot = Robot("William")
my_robot.say_hello()

This listing adds two more pieces of information, a type of robot and its mass in grams. It also displays that information when say_hello() is called. Just from this short example we can see why people ask about self. In this tiny class, self appears eight times! Here’s the output:

Hi, I'm William!
I'm a drone.
I have a mass of 249g.

As an experienced programmer, it’s easy for me to say that this is a fairly simple way of doing things. But that’s only because I’ve used OOP a lot, and I have a sense of what it might take to achieve similar functionality without the use of self. Let’s see what that world would look like.

A world without self

Without self, we have to pass all the information associated with an object around ourselves. In fact, the idea of objects is tied so closely to the idea of self, that we can’t really speak about objects in a world without self.

The best way I can think of to achieve something similar without classes is using a dictionary:

def make_robot(name):
    """Make a robot."""
    robot_dict = {
        "name": name,
        "type": "drone",
        "mass_grams": 249,
    }

    return robot_dict

def say_hello(robot_dict):
    """Make a robot say hello."""
    print(f"Hi, I'm {robot_dict['name']}!")
    print(f"I'm a {robot_dict['type']}.")
    print(f"I have a mass of {robot_dict['mass_grams']}g.")

my_robot = make_robot("William")
say_hello(my_robot)

This version of the program has a function called make_robot(), which returns a dictionary containing all the information relating to a robot. The function say_hello() accepts a robot dictionary and prints the information it contains. The last two lines make a robot and then pass it to say_hello().

This code produces the same output as the OOP code:

Hi, I'm William!
I'm a drone.
I have a mass of 249g.

In a simple case you might prefer to write code like this, perhaps because you understand dictionaries and functions better than you understand classes. But there are issues with this approach that become more problematic as the codebase grows. The dictionary-based syntax is less clean. We have to hop back and forth between focusing on the information, my_robot, and the action, say_hello():

my_robot = make_robot("William")
say_hello(my_robot)

In the OOP version, we’re always working from the perspective of the object:

my_robot = Robot("William")
my_robot.say_hello()

This syntax is more consistent. It makes it easier to reason about what the code is doing.

More importantly, in the OOP version of the code there’s a connection between the information and the associated actions. The variables that refer to the robot’s information are connected to the methods that define the robot’s behavior. In the non-OOP version, there’s no explicit connection between the information and the related actions. The function say_hello() works for a dictionary about a robot, but there’s nothing that ties it specifically to that dictionary. The function would work for any dictionary that has the keys name, type, and mass_grams. As a codebase grows, nothing in the code tells you which functions are meant to work with which dictionaries, and that makes the project harder to maintain.
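Here’s a small sketch of that loose connection. It’s a variant of the dictionary version in which say_hello() returns its text instead of printing it, so the result is easy to inspect; the not_a_robot dictionary is a made-up example. The function accepts a dictionary describing a cat without complaint, because all it needs is the right keys.

```python
def make_robot(name):
    """Make a robot, represented as a plain dictionary."""
    return {"name": name, "type": "drone", "mass_grams": 249}


def say_hello(robot_dict):
    """Build a robot's greeting. Variant: returns the text instead of printing."""
    return "\n".join([
        f"Hi, I'm {robot_dict['name']}!",
        f"I'm a {robot_dict['type']}.",
        f"I have a mass of {robot_dict['mass_grams']}g.",
    ])


robot_greeting = say_hello(make_robot("William"))

# Nothing ties say_hello() to dictionaries made by make_robot():
# any dictionary with the right keys is accepted.
not_a_robot = {"name": "Whiskers", "type": "cat", "mass_grams": 4000}
cat_greeting = say_hello(not_a_robot)

print(robot_greeting)
print(cat_greeting)
```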

Why does self always come first?

This is another question people often ask about self: Why does it always come first in each method’s list of arguments?

When you call a method, Python automatically inserts the self argument when it makes the actual call. Consider this line:

my_robot.say_hello()

Notice that the parentheses here are empty; this method call doesn’t seem to be passing any information to say_hello(). However, look back at the say_hello() method:

    def say_hello(self):
        print(f"Hi, I'm {self.name}!")
        print(f"I'm a {self.type}.")
        print(f"I have a mass of {self.mass_grams}g.")

The say_hello() method clearly needs some information to work with. Python automatically adds the self object as the first argument in any method call. In the end, this means each method needs to accept this argument.
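You can make that hidden argument visible by calling the method through the class and passing the instance yourself. In this sketch say_hello() returns its greeting instead of printing it, so the two calls can be compared. Looking up say_hello on the instance produces a bound method, which keeps a reference to the instance in its __self__ attribute; that’s the object Python passes in as self.

```python
class Robot:
    """A class representing simple robots."""

    def __init__(self, name):
        self.name = name

    def say_hello(self):
        # Variant for this demo: return the greeting instead of printing it.
        return f"Hi, I'm {self.name}!"


my_robot = Robot("William")

implicit = my_robot.say_hello()       # Python supplies my_robot as self
explicit = Robot.say_hello(my_robot)  # we supply it ourselves
print(implicit == explicit)           # True

# A bound method remembers which object it will pass in as self.
bound = my_robot.say_hello
print(bound.__self__ is my_robot)     # True
```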

What happens if you leave the self parameter out of the method definition?

    def say_hello():
        print(f"Hi, I'm {self.name}!")
        ...

If you do this, you’ll get an error about the missing argument:

Traceback (most recent call last):
  File "robot.py", line 11, in <module>
    my_robot.say_hello()
TypeError: Robot.say_hello() takes 0 positional arguments but 1 was given

This traceback is complaining that say_hello() doesn’t accept any arguments, but one argument is being provided. That argument is the self object, and it’s being provided automatically by Python.
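If you’d like to reproduce this error without crashing the program, you can catch it. This is a sketch; the exact wording of the message varies slightly between Python versions (newer versions include the class name, as in the traceback above), but the core of the message is the same.

```python
class Robot:
    """A class representing simple robots."""

    def __init__(self, name):
        self.name = name

    def say_hello():  # self deliberately left out
        print(f"Hi, I'm {self.name}!")


my_robot = Robot("William")

message = None
try:
    # Python still passes my_robot as the first argument here.
    my_robot.say_hello()
except TypeError as error:
    message = str(error)

print(message)
```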

Some languages don’t require methods to explicitly accept the current instance as a parameter; it’s available if you need it (often under the name this), but you don’t list it in the method’s parameters. We’ll look at this a little more closely when we focus on other kinds of methods you can write in Python.

The name self is not important

Many people are surprised to learn that there’s nothing special about the name self. Check out this version of the Robot class:

class Robot:
    """A class representing simple robots."""

    def __init__(potato, name):
        potato.name = name

    def say_hello(potato):
        print(f"Hi, I'm {potato.name}!")

my_robot = Robot("William")
my_robot.say_hello()

In this version, all occurrences of self have been replaced by potato. This code runs perfectly well. Python doesn’t pass the word self around; it passes objects around. We just happen to call that object self by convention. Many other languages also use self, and some use this.
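Because Python simply passes the instance in as the first argument, even a plain function defined outside the class can act as a method once it’s attached to the class. This sketch is only meant to show that the parameter’s name doesn’t matter; attaching methods after a class is defined isn’t something you’d normally do in a real project. Here say_hello() returns its greeting instead of printing it.

```python
def say_hello(potato):
    """A plain function; its first parameter happens to be named potato."""
    return f"Hi, I'm {potato.name}!"


class Robot:
    """A class representing simple robots."""

    def __init__(self, name):
        self.name = name


# Attach the function to the class; it now behaves like any other method.
Robot.say_hello = say_hello

my_robot = Robot("William")
greeting = my_robot.say_hello()  # my_robot is passed in as potato
print(greeting)
```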

You can see what’s in self

The last thing I’ll share about self is that you can easily see what’s in it. By default, every instance has a dictionary associated with it, and you can reach that dictionary through self. Let’s print what’s in that dictionary:

class Robot:
    """A class representing simple robots."""

    def __init__(self, name):
        self.name = name
        self.type = "drone"
        self.mass_grams = 249
        print(self.__dict__)

    def say_hello(self):
        ...

my_robot = Robot("William")

Python stores the values associated with self in a special dictionary named __dict__. Here’s what’s in that dictionary:

{'name': 'William', 'type': 'drone', 'mass_grams': 249}

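These are the names and values of all the attributes that have been defined for the object. When you write code like self.name or my_robot.type, Python looks for the corresponding value in this dictionary. If the name isn’t there, Python goes on to check the class, which is where methods such as say_hello() are stored.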
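A few quick experiments make the connection between attributes and __dict__ concrete. In this sketch, color and top_speed_kph are hypothetical attributes added only for the demo. The built-in vars() returns the same dictionary object as __dict__, and changes flow in both directions.

```python
class Robot:
    """A class representing simple robots."""

    def __init__(self, name):
        self.name = name
        self.type = "drone"
        self.mass_grams = 249

    def say_hello(self):
        print(f"Hi, I'm {self.name}!")


my_robot = Robot("William")

# vars() returns the very same dictionary object as __dict__.
print(vars(my_robot) is my_robot.__dict__)  # True

# Assigning an attribute adds a key to the instance dictionary...
my_robot.color = "gray"
print(my_robot.__dict__["color"])           # gray

# ...and adding a key to the dictionary makes a new attribute available.
my_robot.__dict__["top_speed_kph"] = 40
print(my_robot.top_speed_kph)               # 40

# Methods live on the class, not in each instance's dictionary.
print("say_hello" in my_robot.__dict__)     # False
print("say_hello" in Robot.__dict__)        # True
```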

Since self and my_robot both point to the same object in memory, you’ll see the same dictionary if you add one more line to this program:

my_robot = Robot("William")
print(my_robot.__dict__)

Dictionaries are used all over the place in Python internals. We’ll be seeing more of them as we continue to dig into Python’s OOP implementation.

A real-world self.__dict__

Let’s step away from oversimplified robots, and look at a real-world example. Here’s a short program that uses Matplotlib to generate a scatter plot of the first 100 square numbers:

import matplotlib.pyplot as plt

x_vals = list(range(100))
squares = [x**2 for x in x_vals]

plt.style.use('classic')
fig, ax = plt.subplots()
ax.scatter(x_vals, squares)

plt.show()

Here’s the plot that this example generates:

[Figure: scatter plot of the first 100 square numbers]
Even in a very simple plot, there’s a lot of OOP work going on behind the scenes.

The documentation for the subplots() function states that it returns two items. With the default single subplot, these are an instance of the Figure class and an instance of the Axes class. We’re capturing these two items in this line:

fig, ax = plt.subplots()

Let’s print out the self.__dict__ dictionary associated with the ax instance. We could go into our copy of Matplotlib’s source code, but since self in the Axes class is the same thing as ax, we can just print the value of ax.__dict__. Here’s the code to do that:

...
plt.show()

# These lines run after you close the plot window.
print(type(ax))
print(ax.__dict__)

We’ll print the type of ax to verify it’s an instance of the Axes class, and then print the __dict__ associated with the ax instance. Here’s the output:

<class 'matplotlib.axes._axes.Axes'>
{
    '_stale': True,
    'stale_callback': <function _stale_figure_callback at 0x11ca6f240>,
    '_axes': <Axes: >,
    'figure': <Figure size 640x480 with 1 Axes>,
    ...
}

This is quite a long dictionary; the full listing shows 81 key-value pairs. This is the real value of OOP structure—if we didn’t have the self mechanism to work with, we’d have to manage this dictionary ourselves.

You can learn a lot about how a library works internally by looking at things like the __dict__ associated with various objects. For example, I wasn’t aware that the ax instance has an internal reference to the Figure object it’s associated with. Looking through the keys and values in the dictionary also shows a lot about how Matplotlib keeps track of all the information needed to specify a plot.4
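If you want to explore objects this way, a small helper can save some scrolling. This is a sketch; summarize_attributes() is a hypothetical name, not part of Python or Matplotlib. It reports how many entries an object’s __dict__ holds and lists a few of their names. It’s shown here on the Robot class so it runs without Matplotlib installed.

```python
def summarize_attributes(obj, limit=5):
    """Return the number of instance attributes and up to `limit` of their names.

    Hypothetical helper for exploring objects; it only looks at obj.__dict__.
    """
    attributes = vars(obj)              # the same dictionary as obj.__dict__
    names = sorted(attributes)[:limit]  # alphabetical, so the output is stable
    return len(attributes), names


class Robot:
    """A class representing simple robots."""

    def __init__(self, name):
        self.name = name
        self.type = "drone"
        self.mass_grams = 249


count, names = summarize_attributes(Robot("William"))
print(count, names)  # 3 ['mass_grams', 'name', 'type']
```

Calling summarize_attributes(ax) on the Axes instance from the plotting example should work the same way; the exact count you see will depend on your Matplotlib version.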

Conclusions

It’s accurate to say that self provides a way of accessing information that’s needed throughout a class, without passing a bunch of different values around in method definitions and arguments. But leaving it at that makes self seem like a bit of magic. You can also think of self as the instance itself, carrying a dictionary of values, which Python passes automatically to each method in the class. You could pass that information around yourself, but it would be quite messy and extremely repetitive. So in the end, self is a nice bit of magic that Python provides for us.

In the next post, we’ll take a closer look at another bit of OOP magic: the __init__() method.

Resources

You can find the code files from this post in the mostly_python GitHub repository.


  1. There are some special kinds of methods that don’t include self. We’ll get to those methods in a later post.

  2. Note that I am using object and instance interchangeably.

  3. You’ll see different memory addresses each time you run the program. For any one run, you’ll see the same address for all occurrences of self and the instance created from the class.

  4. The first line of this listing is:

    <class 'matplotlib.axes._axes.Axes'>

    This is the output of print(type(ax)). It shows that ax is an instance of the Axes class, but it also shows us where we can find the source code for that class.

    If Matplotlib is installed in a virtual environment, the path to third-party packages you’ve installed will be something like .venv/lib/python3.11/site-packages. In that site-packages directory, look for the name of the package you installed, in this case matplotlib. Then follow the path we see in the output: matplotlib.axes._axes.Axes. This means we should find the source code for the Axes class in a file called _axes.py, in a folder called axes, in the matplotlib folder. On my system, the path to this file is

    .venv/lib/python3.11/site-packages/matplotlib/axes/_axes.py

    The Matplotlib source code is hosted on GitHub. The overall repository has some resources that are not included in what gets installed to every user’s system, so you have to dig a little to find the source code. The matplotlib folder is in lib. From there, you follow the same path that we see in the output and in the virtual environment. Look for a folder called axes, and a file called _axes.py. Here’s that file, and here’s the start of the source code for the Axes class. If you haven’t looked at library code before, it’s pretty interesting to see. The Axes class is over 8000 lines long!