Reading code

MP 38: An important but seldom-discussed skill.

Reading code is an important skill, but one that’s not discussed very often. I think there’s an assumption that if you learn to write code reasonably well, you’ll automatically learn to read code efficiently as well. But reading code involves different skills than writing code, so I don’t think that’s a safe assumption.

A reader recently wrote to ask about one of the exercise solutions for Python Crash Course. They said their own solution was much simpler than the one I posted, and that they had a hard time reading through the posted solution. I think this is a perfect example of how strategies for reading code efficiently are not at all obvious. In this post I’ll show the program we were discussing, along with a couple of strategies for making sense of code more efficiently than just reading a file from beginning to end.

The exercise

The exercise we were discussing is broken into two parts. Here’s the first part:

Lottery

Make a list or tuple containing a series of 10 numbers and 5 letters. Randomly select 4 numbers or letters from the list and print a message saying that any ticket matching these 4 numbers or letters wins a prize.

This exercise is intended to give people the chance to practice using code from the Python standard library, such as the random.choice() function.
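Here’s a minimal sketch of the function this first exercise is built around; the variable names are my own. random.choice() returns one randomly selected item from a non-empty sequence. Each call is independent, so four calls can return the same item more than once. For comparison, I’m also showing random.sample(), a different standard-library function that isn’t part of the exercise: it returns k items drawn from distinct positions in the sequence, so it won’t repeat an item when the list itself has no duplicates.

```python
from random import choice, sample

# 10 numbers and 5 letters, as the exercise describes.
possibilities = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 'a', 'b', 'c', 'd', 'e']

# Four independent calls to choice(); the same item may show up twice.
pulls = [choice(possibilities) for _ in range(4)]
print(f"Four pulls with choice(): {pulls}")

# sample() picks 4 distinct positions from the list, so there are no repeats.
winning_ticket = sample(possibilities, k=4)
print(f"Any ticket matching these 4 numbers or letters wins a prize: {winning_ticket}")
```

The posted solution below sticks with choice() and handles repeats itself, which is closer to the spirit of practicing with that one function.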

Here’s the followup exercise:

Lottery Analysis

You can use a loop to see how hard it might be to win the kind of lottery you just modeled. Make a list or tuple called my_ticket. Write a loop that keeps pulling numbers until your ticket wins. Print a message reporting how many times the loop had to run to give you a winning ticket.

This is an interesting exercise to pose at a point where people have started to learn about sequences, loops, functions, and classes. There are many ways to structure a solution to this exercise, and it’s good to apply what you’ve been learning to a specific but open-ended task like this.

The posted solution

The posted solution is a little long if you’re not used to reading code, but here it is in its entirety:

from random import choice

def get_winning_ticket(possibilities):
    """Return a winning ticket from a set of possibilities."""
    winning_ticket = []

    # We don't want to repeat winning numbers or letters, so we'll use a
    #   while loop.
    while len(winning_ticket) < 4:
        pulled_item = choice(possibilities)

        # Only add the pulled item to the winning ticket if it hasn't
        #   already been pulled.
        if pulled_item not in winning_ticket:
            winning_ticket.append(pulled_item)

    return winning_ticket

def check_ticket(played_ticket, winning_ticket):
    # Check all elements in the played ticket. If any are not in the 
    #   winning ticket, return False.
    for element in played_ticket:
        if element not in winning_ticket:
            return False

    # We must have a winning ticket!
    return True

def make_random_ticket(possibilities):
    """Return a random ticket from a set of possibilities."""
    ticket = []
    # We don't want to repeat numbers or letters, so we'll use a while loop.
    while len(ticket) < 4:
        pulled_item = choice(possibilities)

        # Only add the pulled item to the ticket if it hasn't already
        #   been pulled.
        if pulled_item not in ticket:
            ticket.append(pulled_item)

    return ticket

possibilities = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 'a', 'b', 'c', 'd', 'e']
winning_ticket = get_winning_ticket(possibilities)

plays = 0
won = False

# Let's set a max number of tries, in case this takes forever!
max_tries = 1_000_000

while not won:
    new_ticket = make_random_ticket(possibilities)
    won = check_ticket(new_ticket, winning_ticket)
    plays += 1
    if plays >= max_tries:
        break

if won:
    print("We have a winning ticket!")
    print(f"Your ticket: {new_ticket}")
    print(f"Winning ticket: {winning_ticket}")
    print(f"It only took {plays} tries to win!")
else:
    print(f"Tried {plays} times, without pulling a winner. :(")
    print(f"Your ticket: {new_ticket}")
    print(f"Winning ticket: {winning_ticket}")

For people who are just learning to program, this looks like a lot of code. It’s 68 lines if you include the blank lines and comments.

I think most people, without a lot of experience or explicit teaching, tend to read through a file like this from top to bottom while trying to make sense of everything. There’s a much better approach.

Reading strategy: Ignore function definitions

One of the best strategies when reading through a new codebase is to ignore the function definitions, at least at first. Look how much smaller this file is without them:

...
possibilities = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 'a', 'b', 'c', 'd', 'e']
winning_ticket = get_winning_ticket(possibilities)

plays = 0
won = False

# Let's set a max number of tries, in case this takes forever!
max_tries = 1_000_000

while not won:
    new_ticket = make_random_ticket(possibilities)
    won = check_ticket(new_ticket, winning_ticket)
    plays += 1
    if plays >= max_tries:
        break

if won:
    print("We have a winning ticket!")
    print(f"Your ticket: {new_ticket}")
    print(f"Winning ticket: {winning_ticket}")
    print(f"It only took {plays} tries to win!")
else:
    print(f"Tried {plays} times, without pulling a winner. :(")
    print(f"Your ticket: {new_ticket}")
    print(f"Winning ticket: {winning_ticket}") 

We could start reading this code more closely now, but there’s another strategy that reduces the clutter a little further.
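If you want to try this strategy mechanically rather than by eye, here’s a rough sketch using the standard library’s ast module. It parses a source string and returns only the top-level statements, replacing each run of function definitions with a single ... as in the listing above. The SOURCE string is a shortened, hypothetical stand-in for the lottery program, with function bodies reduced to docstrings; it’s only parsed, never run. One caveat: ast discards comments and blank lines, so this gives a quick skim, not a replacement for code folding in an editor.

```python
import ast

# A shortened, hypothetical stand-in for the lottery program.
# It is only parsed, never executed.
SOURCE = '''\
from random import choice

def get_winning_ticket(possibilities):
    """Return a winning ticket from a set of possibilities."""

def check_ticket(played_ticket, winning_ticket):
    """Return True if played_ticket matches winning_ticket."""

possibilities = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 'a', 'b', 'c', 'd', 'e']
winning_ticket = get_winning_ticket(possibilities)
plays = 0
'''


def skim_without_functions(source):
    """Return top-level code, showing each run of function definitions as '...'."""
    tree = ast.parse(source)
    parts = []
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            # Collapse consecutive function definitions into a single '...'.
            if not parts or parts[-1] != "...":
                parts.append("...")
        else:
            # get_source_segment() (Python 3.8+) returns this node's source text.
            parts.append(ast.get_source_segment(source, node))
    return "\n".join(parts)


print(skim_without_functions(SOURCE))
```

For this stand-in source, the output is the import line, a single ..., and the three top-level assignments.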

Reading strategy: Simplify repetitive blocks

Before reading from top to bottom, there’s still some benefit to skimming the codebase and seeing if there’s anything you can simplify, either mentally or explicitly. If you see some repetitive blocks, you can often get a quick sense of what those blocks are doing, and ignore the details of those blocks.

At the bottom of this file, there are two blocks that consist of nothing more than print() calls. When I’m reading code like this, I often ignore those calls mentally, so here’s what I actually see:

possibilities = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 'a', 'b', 'c', 'd', 'e']
winning_ticket = get_winning_ticket(possibilities)

plays = 0
won = False

# Let's set a max number of tries, in case this takes forever!
max_tries = 1_000_000

while not won:
    new_ticket = make_random_ticket(possibilities)
    won = check_ticket(new_ticket, winning_ticket)
    plays += 1
    if plays >= max_tries:
        break

if won:
    ...  # Print message about winning.
else:
    ...  # Print message about not winning.

This is a much smaller body of code to make sense of than the original file.
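If you’d rather make this simplification explicit in the code instead of only in your head, one option is to notice that both print blocks share the same two lines. Here’s a sketch that wraps the output in a hypothetical report_results() function and prints the shared lines only once. It produces the same messages, in the same order, as the two original blocks.

```python
def report_results(won, plays, new_ticket, winning_ticket):
    """Print the lottery results (hypothetical refactor of the two print blocks)."""
    if won:
        print("We have a winning ticket!")
    else:
        print(f"Tried {plays} times, without pulling a winner. :(")

    # Both of the original branches printed these two lines.
    print(f"Your ticket: {new_ticket}")
    print(f"Winning ticket: {winning_ticket}")

    if won:
        print(f"It only took {plays} tries to win!")


# Example call with made-up values.
report_results(False, 3, [1, 2, 3, 4], ['a', 'b', 'c', 'd'])
```

Whether this is easier to read than two separate blocks is a judgment call: checking won twice is the cost of removing the duplicated lines.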

Making sense of the code

Now we can finally focus on actually reading code. Knowing that this program is about finding winning lottery tickets, here’s how I would read the code that remains:

  • There’s a list called possibilities. These must be the numbers and letters that can appear on a lottery ticket.

  • The variable winning_ticket seems to refer to a winning set of numbers and letters, pulled from possibilities. I’m not going to read the body of get_winning_ticket() at this point. For now I’ll assume the function is written correctly, and it returns a random set of choices from the possible letters and numbers.

  • We see two new variables defined, plays and won. plays is set to 0, so maybe that’s the number of times the lottery was played? won is a Boolean, set to False initially. I would guess that nobody has won the lottery yet.

  • A comment explains that max_tries is an upper bound on how many times we’ll try to generate a winning ticket.

  • The while loop seems like the most important part of the code that we’re reading. Here’s what it seems to do:

    • The condition while not won implies we’ll keep looping until a winning ticket is pulled.

    • As long as a winning ticket hasn’t been pulled, we’ll keep making new tickets, assigning each one to new_ticket.

    • Every time we make a new ticket, we’ll call check_ticket() to see if it’s a winning ticket.

    • We increment the number of plays on each pass through the loop.

    • We’ll break out of the loop once plays reaches max_tries, even if no winning ticket has been pulled.

  • After the loop finishes, we’ll display an appropriate message about the results.

That’s the core of this entire program. The rest of the code is implementation details, which we can choose to read through if we want to know more about how it works.
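One way to check this reading of the loop is to run it with predictable inputs. The sketch below copies the loop into a hypothetical run_lottery() function that takes the ticket-making and ticket-checking steps as arguments. That lets you swap in simple stand-ins for make_random_ticket() and check_ticket() and watch how won, plays, and max_tries interact.

```python
def run_lottery(make_ticket, is_winner, max_tries):
    """Run the posted while loop with pluggable steps (hypothetical helper).

    Returns a tuple: (won, plays, last_ticket).
    """
    plays = 0
    won = False
    new_ticket = None

    while not won:
        new_ticket = make_ticket()
        won = is_winner(new_ticket)
        plays += 1
        if plays >= max_tries:
            break

    return won, plays, new_ticket


# Stand-in that never wins: the loop stops once plays reaches max_tries.
print(run_lottery(lambda: ['a', 1, 2, 3], lambda ticket: False, max_tries=5))
# (False, 5, ['a', 1, 2, 3])

# Stand-in tickets that produce a winner on the third play.
tickets = iter([[1, 2, 3, 4], [5, 6, 7, 8], ['a', 'b', 'c', 'd']])
print(run_lottery(lambda: next(tickets), lambda ticket: 'a' in ticket, max_tries=5))
# (True, 3, ['a', 'b', 'c', 'd'])
```

One detail this makes visible: if the winning ticket shows up on exactly the last allowed try, the loop breaks with won already set to True, so the final if won: block still reports a win.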

Using an IDE to read code

Once you’ve practiced this enough, you can skip around code files and only “see” the most important parts of the codebase. This is how many people read code efficiently. When people read code much faster than you think is possible, it’s usually because they’re not reading all the code in the file. Rather, they’ve learned to quickly spot which sections of code to focus on.

Your IDE or editor can help you learn these kinds of skills. For example, here’s what this file looks like in Sublime Text, with the function definitions and the print() blocks hidden:

Sublime Text window, with function bodies collapsed. Only about 25 lines of code and comments are visible.

As you’re examining a file, collapsing parts you either understand or recognize as less significant can leave you with only the code you really need to focus on reading.1

Writing readable code

Knowing how people tend to read code can help you write better code. For example, breaking a program into multiple files in a meaningful way makes it easier for you and others to make sense of your code.

In this example, moving the functions to a separate file would make the main program file look like the simpler version we focused on in this post. This also separates the overall program logic from the implementation details of specific tasks, such as checking whether an individual ticket is a winner.

This is also a reminder that naming things well really is an important part of programming. If people might not read your function bodies, your function names should clearly communicate what they do. You also want your variable names to be short but descriptive, and accurate.2

Conclusions

Reading code is an important skill for programmers of all experience levels. Knowing how to read code efficiently helps you review code related to the problem you’re currently working on. This includes solutions to exercises when you’re first learning, but it goes well beyond initial learning. If you can read code efficiently, you’ll have a better sense of how other people write code and how they approach a variety of problems. Reading code is also becoming more important as AI tools are more widely used. You can’t just blindly accept the code that AI tools suggest, so being able to read their output quickly and accurately is tremendously helpful.

You’ll also be able to review your own code more efficiently, especially when you come back to code you wrote a while back. You’ll be able to participate more effectively in code reviews, and you can feel more comfortable jumping into new codebases at work or in your own projects. You can learn a great deal by reading through the code from third-party libraries and frameworks you use, and there’s way too much code in these projects to read through all of it. Learning what to focus on makes these codebases much more approachable.

The skills and strategies involved in reading code are not often taught explicitly. People tend to learn these skills through experience, or in conversation with other programmers. I hope this post has helped you see that others aren’t just reading code faster than you; they’re using strategies that you can practice and learn to use efficiently as well.

Note: A later post will discuss strategies for reading through larger codebases, such as the source for significant third-party libraries and frameworks. If there are open source codebases you’d like to see covered, please reply to this email or reach out and let me know which ones you’re interested in.


  1. There’s an automated way to do this as well. With the file lottery_analysis.py open in Sublime Text, go to Edit > Code Folding > Fold All. This will collapse all the blocks, including classes, functions, loops, and conditional blocks. If you hover over the line number column, you’ll see little triangles wherever some code has been collapsed.

    The program file will appear much simpler. Now you can read through the file from top to bottom, only unfolding the blocks that you want to focus on. In this example I would leave the functions collapsed, and unfold the while not won block, and then the final blocks with the print() calls as well.

    Most editors and IDEs provide a way to do this, and it can make your experience of reading code feel much different.

  2. Sometimes this is really simple. I’ve seen many people fail to recognize a mismatch between a singular name and a plural object, and I’ve seen AI tools make this same mistake. For example, I recently saw a variable named cli_option, which actually contained four options. That name was used because the class was called CLIOption. A much better name for the class is CLIOptions, and the variable should be called cli_options. This small change communicates clearly that the object represents multiple command line options.

    This is an easy mistake to make when writing exploratory code, but it’s something that’s absolutely worth refactoring as a project matures. It will help you continue to make sense of your own code as you work with it over a longer timeframe, and it will help others make sense of your code when they first read through it as well.