Python Lists: A closer look

MP #2: Why lists in Python are so clean

Note: This post is the first in a series about lists, and how they can help people develop a deeper understanding of Python as a whole.

When non-programmers ask what programming is really like, I always focus the conversation on lists. Lists let you quickly discuss variables, working with arbitrary amounts and kinds of data, loops, and many other fundamental concepts. A working understanding of lists also serves as a bridge to more complex data structures and operations, and to the underlying concepts of how programming languages work. In this first post we'll look at some basic aspects of Python lists, and how they compare to similar structures in other languages.

What is a list?

Most readers have probably worked with lists to some degree, but it’s still worth answering this question. A list is a sequence of values. It’s sometimes referred to more generally as a collection, but sequence is a better term because a list is an ordered collection.
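To make the distinction concrete, here's a short sketch comparing a list with a set, Python's built-in unordered collection. The titles are just sample data; `collections.abc.Sequence` is the standard-library abstract base class that ordered, indexable collections like lists belong to.

```python
from collections.abc import Sequence

books = ["Python Crash Course", "Serious Python", "Fluent Python"]

# A list is an ordered collection: the same items in a different order
# make a different list.
reordered = ["Fluent Python", "Serious Python", "Python Crash Course"]
print(books == reordered)            # False

# A set is an unordered collection: order doesn't affect equality.
print(set(books) == set(reordered))  # True

# The standard library's abstract base classes reflect the same distinction.
print(isinstance(books, Sequence))       # True
print(isinstance(set(books), Sequence))  # False
```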

How we name something is important, because an accurate name tells us what we can do with that thing. What would you expect to be able to do with a sequence? You'd probably expect to be able to create a sequence, grab items from it, add more items to it, remove some items, change the order of the items, and more. In Python, you can do all of these things with a list. Immutable sequences such as tuples and strings support only some of them: you can create them and grab items from them, but you can't change them in place.
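Here's a minimal sketch of those operations on a small list of titles. The methods used (`append()`, `insert()`, `remove()`, `sort()`) are standard list methods; the tuple lines at the end show that an immutable sequence lets you read items but doesn't offer methods like `append()`.

```python
books = ["Serious Python", "Fluent Python"]

# Grab an item by position; indexes start at 0.
first_book = books[0]

# Add items, at the end or at a specific position.
books.append("Mastering Regular Expressions")
books.insert(0, "Python Crash Course")

# Remove an item by value.
books.remove("Mastering Regular Expressions")

# Change the order of the items in place (alphabetical here).
books.sort()

print(first_book)  # Serious Python
print(books)       # ['Fluent Python', 'Python Crash Course', 'Serious Python']

# A tuple is an immutable sequence: you can read items, but not add any.
shelf_tuple = tuple(books)
print(shelf_tuple[0])                  # Fluent Python
print(hasattr(shelf_tuple, "append"))  # False
```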

Thinking about lists

Good metaphors are important in understanding abstract things like programming languages. An accurate metaphor helps you learn a new concept. It helps you make predictions about behaviors you haven’t specifically learned yet, and it helps you build a deeper understanding of abstract concepts.

I like to think of a list as a shelf. A shelf can hold items, it has left and right ends, and you can place items on a shelf in a variety of ways. You can stand books side by side on a shelf, but you can also put stacks of books anywhere on a shelf. You can use bookends, and you can display other items, such as LEGO models, on a shelf alongside your books.

[Photo: A bookshelf with a number of programming books, a box of flash cards, and a LEGO car on it. Just a small section of my technical book collection.]

For this first discussion, we’ll consider a single bookshelf with books standing side by side. Here’s an actual Python list, representing this bookshelf:

books = [
    "Python Crash Course",
    "Serious Python",
    "Fluent Python",
    "Mastering Regular Expressions",
    "Fundamentals of Data Visualization",
]

A list, as you may know, is indicated by square brackets with individual items separated by commas. This post doesn’t aim to teach people everything about lists; the focus is on understanding lists at a deeper level. So let’s just loop over the list and print each book’s title:

for book in books:
    print(book)

Output:

Python Crash Course
Serious Python
Fluent Python
Mastering Regular Expressions
Fundamentals of Data Visualization

Why lists are so awesome

Python lists are one of my all-time favorite programming structures. In just a few relatively simple lines of code we have a collection of items that we can easily work with. If this “bookshelf” actually represented a warehouse belonging to a large bookseller, the list itself could have millions of items. But the code for looping over the list would still be just two lines long. That concept felt like magic to me the first time I wrote a loop over a list, and it still feels magical when I stop to think about it.
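Here's a rough illustration of that point. The `warehouse` list below is a hypothetical stand-in, filled with a million generated titles rather than real inventory. Instead of printing every title, the loop just counts them, but the loop itself is the same two lines it would be for five books.

```python
# Hypothetical warehouse inventory: a million made-up titles.
# Only the size of the list matters here.
warehouse = [f"Book {n}" for n in range(1_000_000)]

# The loop is still just two lines, whatever the size of the list.
titles_seen = 0
for book in warehouse:
    titles_seen += 1

print(titles_seen)  # 1000000
```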

Also, the phrase for book in books is really powerful. In pseudocode, we might read this as "for every book in the collection of books." The names make it clear when we're working with a single book, and when we're working with the entire collection of books.

Python doesn’t care about these names; you could just as well write for x in books, and as long as you use x inside the loop when you want to refer to a single book, your code would run. We don’t name things carefully just to avoid syntax errors, though. Naming things thoughtfully helps us reason about our code more effectively. I don’t often see people write for x in books, but I’ve seen many people come up with loops like for books in book_collection. Then they end up confused when something called books only refers to a single book.
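Here's a quick demonstration of both points, using a shortened version of the bookshelf. The first two loops behave identically. The third loop reuses the name `books` for a single title, and because a loop variable keeps its last value after the loop ends, `books` no longer refers to the whole collection afterward.

```python
books = ["Python Crash Course", "Serious Python", "Fluent Python"]

# Python runs both of these loops the same way; only the names differ.
for book in books:
    print(book)

for x in books:
    print(x)

# A misleading name: on each pass, `books` holds just one title.
book_collection = books
for books in book_collection:
    print(books)

# After the loop, `books` refers only to the last title assigned to it.
print(books)  # Fluent Python
```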

What loops look like in C (and similar languages)

If you’ve only ever used Python, you may not fully appreciate the simplicity of Python’s lists. Here’s one way to build the same collection of books in C:

#include <stdio.h>

int main(void) {

    // Declare an array of 5 pointers to read-only string literals.
    const char *books[5] = {
        "Python Crash Course",
        "Serious Python",
        "Fluent Python",
        "Mastering Regular Expressions",
        "Fundamentals of Data Visualization"
    };
    
    // Loop over the array, and print each element.
    for (int i=0; i<5; i++) {
        printf("%s\n", books[i]);
    }

    return 0;
}

If you haven't worked with C before, I'll briefly explain this code. The first line includes stdio.h, the standard header that declares input and output functions such as printf. Every C program needs a function called main, which is where execution begins. In this simple example, main doesn't take any arguments (void), and it returns an integer.

Indentation helps make this code more readable, but indentation isn't required in C; instead, curly braces mark the beginning and end of code blocks. Inside the main function, we declare an array of five pointers; a pointer is a reference to a location in memory. Here each pointer refers to a string literal, a sequence of characters ending in a null character, which holds the title of one book.

To make a loop, we have to declare a counter variable, which keeps track of where we are in the array. The counter starts at zero and is incremented (i++) after each pass through the loop; the loop keeps running as long as the counter is less than five. The body of the loop prints each book's title, followed by a newline. Finally, main returns 0, indicating that no errors were encountered.
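To see what the C version asks you to manage, here's a sketch of the same counter-based loop written in Python, next to the loop Python lets you write instead. The while version is shown only for comparison, not as recommended style. The last loop uses the built-in enumerate(), which tracks the position for you when you do need an index.

```python
books = [
    "Python Crash Course",
    "Serious Python",
    "Fluent Python",
    "Mastering Regular Expressions",
    "Fundamentals of Data Visualization",
]

# A C-style loop written in Python: we manage the counter ourselves.
i = 0
while i < len(books):
    print(books[i])
    i += 1

# The idiomatic Python loop: no counter, no bounds to get wrong.
for book in books:
    print(book)

# If you need each item's position, enumerate() keeps track of it.
for i, book in enumerate(books):
    print(i, book)
```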

We’ll consider the structure of arrays in C more in later posts (with no more C code, though), in order to better understand what Python is doing for us when it loops over a list.

Note: I haven’t written C code in a long time, so there may be some slight inaccuracies in this description. The larger points are valid, and will serve later discussions about Python well.

Why Python loops look so much cleaner

Python was originally developed at a time when most languages required you to use structures similar to what you just saw if you wanted to work with a collection of items. For experienced programmers, writing loops was not particularly difficult. It did get tedious, however, and there were many ways to inadvertently introduce bugs. Some of these bugs would cause obvious errors that you'd have to fix right away. Others were subtle and would only show up in specific circumstances, which made them hard to debug. Structures like these kept many beginners, and many people who wanted to solve real-world problems that weren't centered on programming, out of the programming world.

When Python was first developed, one of the key ideas was to automate things like looping over a collection. When you write a simple for loop over a list in Python, Python handles the kind of bookkeeping you saw in the previous section for you: the list supplies an iterator that keeps track of the current position, and the loop stops on its own when the items run out. This saves you time, and lets you focus on the problems you're trying to solve instead of on counters and on how collections are implemented in lower-level languages. You also end up with much cleaner code that's easier to read and maintain over time.
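Python's for loop is built on the iterator protocol: it calls iter() on the list to get an iterator, then calls next() on that iterator until it raises StopIteration. The sketch below spells that out by hand. It's a simplified model for understanding, not the code the interpreter literally executes.

```python
books = ["Python Crash Course", "Serious Python", "Fluent Python"]

# Roughly what `for book in books: print(book)` does behind the scenes.
iterator = iter(books)         # Ask the list for an iterator.
while True:
    try:
        book = next(iterator)  # The iterator tracks the current position.
    except StopIteration:      # Raised when there are no items left.
        break
    print(book)
```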

What’s next?

In the next post we’ll look at what happens when lists break down. What happens when a list is so large that your program starts to slow down noticeably? What happens when you modify a list so frequently that your program slows down? There are almost always relatively straightforward ways to address these situations, if you have a clear understanding of what Python does for you when managing lists. We’ll build an understanding of why lists sometimes break down, and what you can do when it happens.

Resources

You can find the code files from this post in the mostly_python GitHub repository.