Python Lists: A closer look, part 11

MP #21: Copying lists, and some final takeaways

Note: The previous post in this series explained why to avoid using an empty list as a default argument, and how to remove all occurrences of an element from a list. This is the last post in the series.

In this final installment of the series focusing on lists, we’ll look at one last interesting behavior: what happens, exactly, when you copy a list? We’ll compare making an ordinary (shallow) copy of a list with making a deep copy. We’ll also consider some final takeaways from the series as a whole.

Copying lists

What happens when you copy a list? In English, the word copy implies that you get a separate, independent duplicate of the thing you started with. So if I give you a copy of my resume and you mark it up, my original should not be affected by the changes you made.

Copying string values

Copying simple values like strings works like this in Python. Let’s say I always greet people with a hearty “Hello!” You have the same habit, so you copy my greeting:

>>> my_greeting = "Hello!"
>>> your_greeting = my_greeting

But at some point I become more terse and start shortening my greetings to “Hi!”

>>> my_greeting = "Hi!"

Clearly the value of my_greeting has changed. But what’s the value of your_greeting? Is it still “Hello!”, or is it tied to the current value of my_greeting?

Let’s find out:

>>> your_greeting
'Hello!'

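Strings are immutable values; they can’t be changed in place. When you assign one string variable to another, both names start out referring to the same string, but there’s no ongoing relationship between the variables. Assigning “Hi!” to my_greeting just points that name at a different string object; your_greeting still refers to the original “Hello!”, so it behaves as if it had its own copy.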
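
To see what’s going on behind the scenes, you can use the is operator, which reports whether two names refer to the same object. This is a small sketch that reuses the variable names from the example above; same_before and same_after are names introduced only for this illustration:

```python
my_greeting = "Hello!"
your_greeting = my_greeting

# Right after the assignment, both names refer to the same string object.
same_before = my_greeting is your_greeting

# Assigning a new string rebinds my_greeting; it does not modify "Hello!".
my_greeting = "Hi!"
same_after = my_greeting is your_greeting

print(same_before)     # True
print(same_after)      # False
print(your_greeting)   # Hello!
```

Because a string can never be modified in place, sharing one string object between two names is harmless. The only way to “change” my_greeting is to point it at a different string.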

Copying numerical values

Numerical values work this way as well. Let’s say we have the same number of cats and dogs, but then we get another dog:

>>> num_dogs = 3
>>> num_cats = num_dogs
>>> num_dogs += 1

How many dogs are there now, and how many cats?

>>> num_dogs
4
>>> num_cats
3

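Like strings, numbers are immutable. The line num_dogs += 1 doesn’t change the number 3; it points num_dogs at a new value, 4, while num_cats still refers to 3. In effect, each variable behaves as if it had its own “copy” of the value. Once they’re defined, they have no ongoing connection.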

Copying lists

Let’s try the same thing with a list. Imagine I’m familiar with a number of programming languages, and you know those same languages. So we’ll just assign my languages to you:

>>> my_languages = ['Python', 'Java', 'C']
>>> your_languages = my_languages

Now I go off and learn Rust:

>>> my_languages.append('Rust')

What happens to your languages?

>>> your_languages
['Python', 'Java', 'C', 'Rust']

This is similar to the issues we saw in part 8 and part 10 of this series. The assignment your_languages = my_languages doesn’t create a new list; the variables my_languages and your_languages both refer to the same list object in memory. Because append() modifies that list in place, the change shows up through both names. If you want each variable to refer to its own list, you have to explicitly make a copy of the list:

>>> my_languages = ['Python', 'Java', 'C']
>>> your_languages = my_languages[:]
>>> my_languages.append('Rust')
>>> your_languages
['Python', 'Java', 'C']

You can also use the list.copy() method:

>>> my_languages = ['Python', 'Java', 'C']
>>> your_languages = my_languages.copy()
>>> my_languages.append('Rust')
>>> your_languages
['Python', 'Java', 'C']
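
Here’s a short sketch that puts plain assignment and the copying approaches side by side. Besides the slice and .copy() shown above, it also uses the list() constructor and the copy() function from the copy module; those two aren’t covered in this post, but they make the same kind of copy. The variable names are just for illustration:

```python
from copy import copy

my_languages = ['Python', 'Java', 'C']

alias = my_languages                  # no copy: a second name for the same list
sliced = my_languages[:]              # slice copy
via_method = my_languages.copy()      # list.copy()
via_constructor = list(my_languages)  # list() constructor
via_copy_module = copy(my_languages)  # copy.copy()

my_languages.append('Rust')

# The alias sees the change, because it is the same list object.
print(alias is my_languages)   # True
print(alias)                   # ['Python', 'Java', 'C', 'Rust']

# Each copy is a separate list, so none of them picked up 'Rust'.
for new_list in (sliced, via_method, via_constructor, via_copy_module):
    print(new_list is my_languages, new_list)   # False ['Python', 'Java', 'C']
```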

When copying a list doesn’t work

When you’re trying to copy a list, the .copy() method works as you’d expect if all the items are immutable values, such as strings or numbers. But if any of the items are mutable, .copy() can behave in a surprising way. One of the clearest ways to see this is to make a list of objects from a custom class, and then copy that list.

Here’s a simple class called Location, where objects have two attributes: an x and a y value. It might be used, for example, to keep track of the locations of characters in a game:

# locations.py

class Location:

    def __init__(self, x=0, y=0):
        self.x = x
        self.y = y

    def __repr__(self):
        return f"({self.x}, {self.y})"

If you haven’t used __repr__() before, it’s a method that’s called when Python needs to display an object. If we print a location, or a sequence of locations, we’ll see strings that look like this.
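
As a quick illustration of what __repr__() gives us, here’s a minimal sketch that prints a single location and a list of locations. The name home is made up for this example:

```python
class Location:

    def __init__(self, x=0, y=0):
        self.x = x
        self.y = y

    def __repr__(self):
        return f"({self.x}, {self.y})"

home = Location(3, 4)

# print() falls back to __repr__() when a class doesn't define __str__().
print(home)                  # (3, 4)

# A list displays each of its items using that item's __repr__().
print([home, Location()])    # [(3, 4), (0, 0)]
```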

Let’s make a few locations, and store them in a list. We’ll also make a copy of these initial locations, so we can keep track of the original locations:

class Location:
    ...

location_1 = Location(0, 10)
location_2 = Location(0, 20)
location_3 = Location(0, 30)

locations = [location_1, location_2, location_3]
original_locations = locations.copy()

print(f"current locations:\t{locations}")
print(f"original locations:\t{original_locations}")

The backup list seems to be working so far:

current locations:  [(0, 10), (0, 20), (0, 30)]
original locations: [(0, 10), (0, 20), (0, 30)]

Now let’s update the locations, and see what happens:

...
# Shift all locations to the right.
location_1.x += 10
location_2.x += 10
location_3.x += 10

print(f"current locations:\t{locations}")
print(f"original locations:\t{original_locations}")

We add 10 to every location’s x value. Here’s the new output:

current locations:  [(0, 10), (0, 20), (0, 30)]
original locations: [(0, 10), (0, 20), (0, 30)]

current locations:  [(10, 10), (10, 20), (10, 30)]
original locations: [(10, 10), (10, 20), (10, 30)]

The original locations are changing along with the current locations!

This happens because even though we created a copy of the list, we didn’t make a copy of each of the items in the list. There are two separate lists, but the items in each list point to the same objects:

When you copy a list with list_name[:] or list_name.copy(), complex objects inside the list are not copied. Instead, the new list gets a reference to the same objects that the original list refers to.

When any one of the individual location objects changes, the corresponding item in the list “changes” as well, because that item is really just a reference back to the individual location object.
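
You can confirm this sharing directly with the is operator. The following sketch uses a shorter list than the example above, but the same Location class:

```python
class Location:

    def __init__(self, x=0, y=0):
        self.x = x
        self.y = y

    def __repr__(self):
        return f"({self.x}, {self.y})"

locations = [Location(0, 10), Location(0, 20)]
original_locations = locations.copy()

# There are two distinct list objects...
print(locations is original_locations)         # False
# ...but their items are the very same Location objects.
print(locations[0] is original_locations[0])   # True

# Replacing an item only changes one slot in one list.
locations[0] = Location(99, 99)
print(original_locations[0])                   # (0, 10)

# Modifying a shared object is visible through both lists.
locations[1].x += 10
print(original_locations[1])                   # (10, 20)
```

This is why a copy made with [:] or .copy() protects you from appends, removals, and replaced items, but not from changes made to the objects themselves.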

Using deepcopy()

In order to make a list where every item is a copy of the original, we need to use the deepcopy() function from the copy module. Rather than just copying the references to any objects inside a list, the deepcopy() function copies all of the objects found in a list. The resulting list is completely separate from the original list:

When you copy a list using deepcopy(), the new list gets its own copies of all objects in the original list.

This is a small change to the code we wrote earlier:

# locations_deepcopy.py

from copy import deepcopy

class Location:
    ...

location_1 = Location(0, 10)
location_2 = Location(0, 20)
location_3 = Location(0, 30)

locations = [location_1, location_2, location_3]
original_locations = deepcopy(locations)

print(f"current locations:\t{locations}")
print(f"original locations:\t{original_locations}")

# Shift all locations to the right.
location_1.x += 10
...

We import deepcopy(), and use it to generate original_locations. Now when the individual location objects are changed, the items in original_locations are not affected:

current locations:  [(0, 10), (0, 20), (0, 30)]
original locations: [(0, 10), (0, 20), (0, 30)]

current locations:  [(10, 10), (10, 20), (10, 30)]
original locations: [(0, 10), (0, 20), (0, 30)]
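
If you want to verify that the deep copy really has its own objects, a sketch like the following can help. It deliberately puts the same object in the list twice (the name start is just for illustration) to show one more detail: deepcopy() remembers what it has already copied, so a repeated object is copied once and still appears twice in the new list.

```python
from copy import deepcopy

class Location:

    def __init__(self, x=0, y=0):
        self.x = x
        self.y = y

    def __repr__(self):
        return f"({self.x}, {self.y})"

start = Location(0, 10)
locations = [start, Location(0, 20), start]   # start appears twice
original_locations = deepcopy(locations)

# No item in the deep copy is shared with the original list.
print(any(a is b for a, b in zip(locations, original_locations)))   # False

# The repeated object was copied once, so the copy repeats it too.
print(original_locations[0] is original_locations[2])               # True

start.x += 10
print(locations)            # [(10, 10), (0, 20), (10, 10)]
print(original_locations)   # [(0, 10), (0, 20), (0, 10)]
```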

Conclusions

When the items in a list are simple objects like strings or numbers, you can make a copy using slice notation or the .copy() method:

new_list = my_list[:]
new_list = my_list.copy()

If the items in your list include more complex objects such as instances of a class, and you want to keep the lists entirely separate, use deepcopy():

from copy import deepcopy
new_list = deepcopy(my_list)

You should also consider using deepcopy() when you’re working with a list that has nested sequences inside it, such as a list of lists or a list of dictionaries.
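
Here’s a small sketch of the nested case, using a hypothetical list of lists called grid. The inner lists are mutable, so they behave just like the Location objects did:

```python
from copy import deepcopy

grid = [[0, 0], [0, 0]]
shallow = grid.copy()
deep = deepcopy(grid)

# Change a value inside one of the inner lists.
grid[0][0] = 1

# The shallow copy shares its inner lists with grid, so it sees the change.
print(shallow)   # [[1, 0], [0, 0]]

# The deep copy has its own inner lists, so it does not.
print(deep)      # [[0, 0], [0, 0]]
```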

Final takeaways

This is the last post in the series, so let’s consider some final takeaways from the series as a whole:

  • Python lists are simple and elegant. If using them in simple ways does what you need, keep your code simple until you have a specific reason to change your approach.
  • Keep in mind the underlying memory model of a list: as it grows, it typically reserves a little more space at the end than you’re currently using. When a program involving lists starts to slow down, understanding these underlying principles can help you identify a more efficient approach.
  • If you haven’t already done so, make time to become comfortable with comprehensions. They’re concise and efficient, and they form the foundation of working with other kinds of sequences as well.
  • Remember that tuples can be used just about anywhere a list can, if that list won’t need to be modified. If you know you’re not going to modify a list, try using a tuple instead. Tuples are also used quite often in Python library code, because they’re efficient for passing small collections of values between parts of a program.
  • When your programs slow down and you’re forced to do some optimization work, make sure you profile your project first. Intuition is important in optimization, but nothing takes the place of concretely identifying which parts of a program are the true bottlenecks.
  • If you’re working with long sequences, or doing frequent enough operations on your sequences that the program is slowing down, consider using NumPy arrays. With the right kind of data and processing approaches, using arrays can drastically speed up your program’s execution. Keep in mind that working with NumPy usually introduces a new level of complexity, so avoid using arrays when they’re not necessary.
  • Remember that sets are collections where every item must be unique. If all the items in a collection are unique and the order doesn’t matter, consider using a set. If you need just the unique items from a list, consider using set(list_name).
  • At some point you’ll likely need to pass a list to a function. When you do so, keep in mind that any changes the function makes to the list will persist after the function finishes its work. If you don’t want this to happen, consider passing the function a copy of the list instead.
  • If you need to modify a list, don’t do it inside a simple for loop over that same list. When Python sets up a for loop, it counts on the list not changing throughout the duration of the loop. If you need to modify a list, the most common approach is to use a comprehension, which generates the list you want based on the list you have.
  • When you’re writing a function, don’t use an empty list as a default argument. Instead, use None as the default and create a new list inside the function when necessary.
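
A few of these takeaways are easier to remember with code in front of you. The sketch below touches on four of them: using None as a default, passing a copy to a function, building a new list with a comprehension, and using a set to get unique items. The function and variable names are made up for this illustration:

```python
def add_language(language, languages=None):
    """Append to the given list, or to a new list if none is given."""
    if languages is None:
        languages = []
    languages.append(language)
    return languages

# Each call without a list gets its own new list.
first = add_language('Python')
second = add_language('Rust')
print(first, second)     # ['Python'] ['Rust']

def add_rust(languages):
    """Modify the list that was passed in."""
    languages.append('Rust')

my_languages = ['Python', 'Java', 'C']
add_rust(my_languages[:])    # the function only sees a copy
print(my_languages)          # ['Python', 'Java', 'C']
add_rust(my_languages)       # the function modifies the original
print(my_languages)          # ['Python', 'Java', 'C', 'Rust']

# Build the list you want instead of removing items during a for loop.
scores = [3, 0, 5, 0, 8]
nonzero_scores = [score for score in scores if score != 0]
print(nonzero_scores)        # [3, 5, 8]

# A set keeps one of each unique item; sorted() gives a predictable order.
print(sorted(set(scores)))   # [0, 3, 5, 8]
```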

I hope you’ve learned something from this series. If you have any questions or feedback, please feel free to leave a comment here, or on any of the posts in the series.

Resources

You can find the code files from this post in the mostly_python GitHub repository. Also, you may want to look at the documentation for deepcopy().