Birthdays and lotteries

MP 120: I rarely play the lottery, but when I do I like to model it afterwards.

I've written recently about surviving a pretty serious medical event and then making it through a hurricane with some life interruptions, but no significant harm. When I was in the hospital, I told myself that I'd been lucky in enough ways that I should buy a lottery ticket when I got out.

I don't buy lottery tickets very often, so when I do I like to go all in. I bought a ticket for the North Carolina Powerball Lottery, which currently has a top prize of around $400m. That's life-changing money! I've bought about three lottery tickets in my life, so I don't have any established approach to picking numbers. My family got me through these recent hard times, so I chose numbers that came from our birthdays. Later, I started wondering if limiting my choices to the values 1-31 would affect our chances of winning, since you can pick numbers from 1 through 69.

In this post I'll walk through a quick, simple check of that question. It feels like only choosing numbers between 1 and 31 should roughly cut our chances of winning in half. But when I think more rationally it seems like any set of numbers should have the same chance of winning as any other.

Lotteries make me feel uncomfortable

I've only gambled a handful of times in my life, and I really dislike state-sponsored gambling. I've stood in convenience store checkout lines many times, watching people mindlessly scratching off small-stakes lotto tickets. It's pretty sad to see people make a small pile of "winning" tickets, only to trade that pile in for a smaller batch of new tickets. It's pretty clearly addictive, yet we still fund so many programs with lottery proceeds.

Snippet of NC Lottery website titled "No longer just a game?", with phone text and chat links to talk about gambling addiction.
Juxtaposed against a shiny website and messaging that encourages you to play, there are numerous links that suggest playing the lottery can be addictive. I'd be happy to see all state-sponsored gambling disappear from society.

When I decided to buy a ticket, I had to read a bunch before I could figure out how to play. It's interesting to see all the warnings about how this can become an addiction. This post is not an endorsement of gambling; if everyone went through the process of modeling lotteries I think they'd be a bit less popular.

Modeling the lottery

The NC Powerball lottery has a number of ways you can win. A lot of it is tied to the "Powerball", a specific ball that increases most of the prizes if you guess it correctly. To keep this simple, I'll ignore the Powerball and just focus on the core aspect of the lottery: trying to match 5 numbers between 1 and 69.

Let's write a function to get a ticket:

from random import randint

def get_ticket(ticket_size=5, max_value=69):
    """Get a ticket of ticket_size numbers.
    Numbers are picked from 1 through max_value.
    """
    ticket = []

    while len(ticket) < ticket_size:
        pulled_number = randint(1, max_value)

        # Don't reuse numbers.
        if pulled_number in ticket:
            continue

        ticket.append(pulled_number)

    ticket.sort()
    return ticket

breakpoint()
nc_powerball.py

The randint() function returns a random integer between the two values provided. It's important to note that this function can return the starting and ending numbers, as well as every number in between.
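A quick way to confirm that both endpoints are included is to draw many values from a tiny range and collect the distinct results. This is only a sanity check, not part of nc_powerball.py:

```python
from random import randint

# Draw many values from a small range and keep the distinct ones.
# With 1,000 draws it's overwhelmingly likely that every value shows up.
seen = {randint(1, 3) for _ in range(1_000)}
print(sorted(seen))
```

Both 1 and 3 should appear in the output. That's different from range(1, 3), which stops before 3.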

The function get_ticket() takes two arguments. The ticket_size argument specifies how many numbers should be generated for a specific ticket. The max_value argument specifies the largest number you can choose for the given lottery. The default values here match the rules for the NC Powerball lottery. Lottery tickets typically show numbers in ascending order, rather than the order that numbers are pulled, so the list representing the ticket is sorted before being returned.

When doing exploratory work like this, I like to put a breakpoint at the end of the file, and play around with the code before moving on. Let's run this program, and make a few tickets in the debugging session:

$ python nc_powerball.py 
-> breakpoint()
(Pdb) quick_pick = get_ticket()
(Pdb) quick_pick
[19, 20, 21, 55, 69]
(Pdb) birthday_ticket = get_ticket(5, 31)
(Pdb) birthday_ticket
[7, 9, 14, 23, 24]
(Pdb) winning_draw = get_ticket()
(Pdb) winning_draw
[9, 13, 20, 31, 40]

Here I'm using get_ticket() in three ways. First, the lottery lets you choose a "Quick Pick" ticket, where the machine generates a random ticket for you. This can be simulated by calling get_ticket() with no arguments.

For birthday_ticket I'm limiting the numbers to 1-31, which generates a ticket someone might play if they use people's birthdays to pick numbers. To get a set of winning numbers, assigned to winning_draw, the default arguments again suffice.

Checking tickets

We need to be able to check if a ticket is a winner. Here's a function that accepts a ticket, and compares it against the winning numbers that were drawn:

def check_ticket(ticket, winning_draw):
    """Find out how many numbers matched."""
    matches = 0
    for num in ticket:
        if num in winning_draw:
            matches += 1

    return matches

I like to start with the simplest solution to a problem, especially when writing about code. The function check_ticket() first sets the number of matches to zero. It then loops over each num in the ticket that was played, and checks whether that number is in the winning draw. If it is, the function increments the number of matches for this ticket. After it runs through all the numbers in the ticket, it returns the number of matches.

If you're new to Python, that's a perfectly reasonable way of checking a ticket. However, we can write a much shorter function using a comprehension:

def check_ticket(ticket, winning_draw):
    """Find out how many numbers matched."""
    matches = [num for num in ticket if num in winning_draw]
    return len(matches)

This version of check_ticket() generates a list called matches. That list is built by looping over the numbers in the played ticket (num for num in ticket), but only keeping the values that are in the winning draw (if num in winning_draw). We then return the length of matches.

We can even reduce the body of the function to a single return statement:

def check_ticket(ticket, winning_draw):
    """Find out how many numbers matched."""
    return len(
        [num for num in ticket if num in winning_draw])
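Since a ticket never repeats a number, another option is to treat both lists as sets and count the overlap. This is a sketch of an alternative, not the version used in the rest of this post; the name check_ticket_sets() is made up for illustration:

```python
def check_ticket_sets(ticket, winning_draw):
    """Find out how many numbers matched, using set intersection."""
    # The & operator keeps only the values present in both sets.
    return len(set(ticket) & set(winning_draw))

# Values taken from the debugging session shown earlier.
birthday_ticket = [7, 9, 14, 23, 24]
winning_draw = [9, 13, 20, 31, 40]
print(check_ticket_sets(birthday_ticket, winning_draw))
```

Only the number 9 appears in both lists, so this prints 1.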

Now that we have a way of checking played tickets against winning draws, we can simulate a drawing.

Simulating my chances

Let's make multiple drawings, and see how often a given birthday ticket would win. There are multiple ways to win, so I'll show how many times one number matched, how many times two numbers matched, and so forth.

The NC lottery lets you play a ticket in just one drawing, or keep playing the same numbers for up to 30 drawings. I was only buying one ticket, so I chose to play it for the full 30 drawings, just to extend the fun for a while. To model this, we'll generate one ticket, and then check that same ticket against a series of drawings.

Here's a simulation of playing a single "birthday ticket" in 30 consecutive drawings:

from random import randint

def get_ticket(ticket_size=5, max_value=69):
    ...

def check_ticket(ticket, winning_draw):
    ...

bd_ticket = get_ticket(5, 31)
print(f"Birthday ticket: {bd_ticket}\n")

# Simulate a number of drawings.
results = {0:0, 1:0, 2:0, 3:0, 4:0, 5:0}
for _ in range(30):
    winning_draw = get_ticket()
    matches = check_ticket(bd_ticket, winning_draw)
    results[matches] += 1

# Show results.
for num, num_matches in results.items():
    print(f"Match {num}: {num_matches}")
nc_powerball.py

We first generate a "birthday ticket", assigned to bd_ticket. We then set up a dictionary called results, with every count starting at zero. The keys represent the number of matches, and the values are the number of drawings where the ticket had that many matches. For example, if you want to know how many drawings matched three numbers on the ticket, you'd use the code results[3].

We then make a loop that runs 30 times. On each pass through the loop we draw a set of winning numbers, assigned to winning_draw. We get the number of matches for that drawing by calling check_ticket(), using the same ticket (bd_ticket) for each drawing. If there are no matches, we increment results[0]. If there's one match, we increment results[1]. This is represented generally by incrementing results[matches].

Finally, we show the results by looping over the results dictionary, printing the number of matches for each key in the dictionary. Here's a sample run:

$ python nc_powerball.py
Birthday ticket: [3, 6, 12, 16, 24]

Match 0: 25
Match 1: 5
Match 2: 0
Match 3: 0
Match 4: 0
Match 5: 0

In this example, which was typical over multiple runs, there were 25 drawings where bd_ticket matched no numbers at all. There were 5 drawings where one of the numbers from bd_ticket matched.

In the actual NC Powerball lottery, it's likely that none of these 5 drawings would have won anything at all. You get a small prize ($4) for matching one number, but only if you also match the Powerball. The Powerball can be any number from 1 through 26. I'm not going to model the Powerball, but it would reduce the number of winners at the "Match 1" level by a factor of 26.

Prize table for NC Powerball lottery. Most relevant items from table discussed in text.
You can win some money without matching the Powerball, but most prizes are much smaller if you don't match the Powerball. The only meaningful prize without matching the Powerball comes from matching all 5 numbers.

You get something for matching 3 or 4 numbers without matching the Powerball, but not much. For example, you'd win $100 for matching 4 numbers without the Powerball, but you'd get $50,000 for matching 4 numbers and the Powerball.

The only significant outcome without matching the Powerball is for matching 5 numbers. If you match 5 without the Powerball, you get $1 million. Matching 5 with the Powerball wins the jackpot, which is currently almost $500 million.

Birthday tickets vs Quick Pick tickets

Now let's go back to the original question that inspired this whole post:

Does playing a "birthday ticket" reduce your chances of winning when there are more than 31 choices for each number in the ticket?

Let's find out by making the same birthday ticket we've been making, but also make a Quick Pick ticket. We'll then print the results for both tickets, and see if there's a significant difference:

...
bd_ticket = get_ticket(5, 31)
quick_pick = get_ticket()
print(f"Birthday ticket: {bd_ticket}")
print(f"Quick Pick: {quick_pick}")

# Simulate a number of drawings.
bd_results = {0:0, 1:0, 2:0, 3:0, 4:0, 5:0}
qp_results = {0:0, 1:0, 2:0, 3:0, 4:0, 5:0}
for _ in range(30):
    winning_draw = get_ticket()

    matches = check_ticket(bd_ticket, winning_draw)
    bd_results[matches] += 1

    matches = check_ticket(quick_pick, winning_draw)
    qp_results[matches] += 1

# Show results.
print("\n\t\tBirthday\tQuick Pick")
for num in bd_results.keys():
    print(f"Match {num}:\t{bd_results[num]}\t\t{qp_results[num]}")

The logic here hasn't changed significantly from the previous listing. We make two tickets instead of one. We make two dictionaries for results as well, bd_results and qp_results. In the loop, we check the number of matches for each ticket, for every drawing in the loop. When it's time to show the results, we create a table showing how many matches each ticket has.

Here's the output for the first run:

$ python nc_powerball.py
Birthday ticket: [3, 10, 17, 20, 22]
Quick Pick: [16, 27, 38, 55, 60]

		Birthday	Quick Pick
Match 0:	21		20
Match 1:	7		10
Match 2:	2		0
Match 3:	0		0
Match 4:	0		0
Match 5:	0		0

With only 30 draws, there doesn't seem to be a whole lot of difference between the two tickets. If we took into account the Powerball, it's likely that none of these tickets would have won anything at all.

Simulating many drawings

Now let's increase the number of drawings, and see if the pattern holds for larger runs. I'm going to increase the number of drawings until the program starts to slow down on my system.

At about one million drawings, the program takes just over 2 seconds to run on my M2 MacBook Air:

$ time python nc_powerball.py
Birthday ticket: [15, 16, 25, 27, 30]
Quick Pick: [25, 31, 42, 46, 54]

Results for 1,000,000 drawings:
            Birthday    Quick Pick
Match 0:    678,602     678,211
Match 1:    282,430     282,665
Match 2:    37,188      37,288
Match 3:    1,744       1,808
Match 4:    36          28
Match 5:    0           0

python nc_powerball.py  2.32s user 0.02s system 96% cpu 2.420 total

I modified the code slightly to show the number of drawings, and format the results better. These results are enough to convince me that limiting your choices to numbers that appear in birthdays doesn't actually affect the chances of winning. If there were a difference, I'd expect it to show up in a simulation with this many drawings. In particular, I'd expect to see a significant difference in the number of drawings matching 1 through 4 numbers.
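One way to make those changes is to wrap the loop in a function that takes the number of drawings, and use the `,` format specifier for thousands separators. This is a sketch under my own assumptions rather than the code from the repository; simulate() and show_results() are hypothetical names, and the drawing count is kept small so it runs quickly:

```python
from random import randint

def get_ticket(ticket_size=5, max_value=69):
    """Get a sorted ticket of ticket_size unique numbers."""
    ticket = []
    while len(ticket) < ticket_size:
        pulled_number = randint(1, max_value)
        if pulled_number not in ticket:
            ticket.append(pulled_number)
    return sorted(ticket)

def check_ticket(ticket, winning_draw):
    """Find out how many numbers matched."""
    return len([num for num in ticket if num in winning_draw])

def simulate(tickets, num_drawings):
    """Check every ticket against the same series of drawings.

    tickets maps a label to a ticket. Returns a dict mapping each
    label to its own {matches: count} dictionary.
    """
    results = {label: dict.fromkeys(range(6), 0) for label in tickets}
    for _ in range(num_drawings):
        winning_draw = get_ticket()
        for label, ticket in tickets.items():
            results[label][check_ticket(ticket, winning_draw)] += 1
    return results

def show_results(results, num_drawings):
    """Print one column of match counts per ticket."""
    print(f"\nResults for {num_drawings:,} drawings:")
    print(" " * 12 + "".join(f"{label:<12}" for label in results))
    for num in range(6):
        row_label = f"Match {num}:"
        row = "".join(f"{counts[num]:<12,}" for counts in results.values())
        print(f"{row_label:<12}{row}")

num_drawings = 10_000
tickets = {"Birthday": get_ticket(5, 31), "Quick Pick": get_ticket()}
for label, ticket in tickets.items():
    print(f"{label}: {ticket}")

results = simulate(tickets, num_drawings)
show_results(results, num_drawings)
```

Keeping the tickets in a dictionary means a third ticket can be compared by adding one entry, without adding another results dictionary by hand.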

What about the ticket [1, 2, 3, 4, 5]?

I like how modeling real-world situations often lets you explore new questions that come up as you're working. One thing I started wondering about was what happens if you take this idea of narrowing your choices to the extreme: What if you only played the numbers 1, 2, 3, 4, 5? I expect many people would think "That ticket would never win, the lottery would never come up with those five numbers!" But thinking about it rationally for a moment, those numbers seem just as likely to win as any other set of five specific numbers.

We can answer this question by changing one line of code:

bd_ticket = [1, 2, 3, 4, 5]

Instead of calling get_ticket() to get a random ticket based on birthdays, we assign the list [1, 2, 3, 4, 5] to bd_ticket. Here's an example set of results with this one change:

$ python nc_powerball.py
Birthday ticket: [1, 2, 3, 4, 5]
Quick Pick: [19, 30, 36, 39, 46]

Results for 1,000,000 drawings:
            Birthday    Quick Pick
Match 0:    675,142     674,013
Match 1:    285,231     285,721
Match 2:    37,716      38,376
Match 3:    1,887       1,852
Match 4:    24          38
Match 5:    0           0

It turns out the ticket [1, 2, 3, 4, 5] doesn't perform any worse than any other ticket over a million drawings. It doesn't necessarily make intuitive sense right away, but it makes sense mathematically.
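The math can be checked directly. A drawing picks 5 of the 69 numbers, so for any fixed ticket the chance of matching exactly k numbers is C(5, k) × C(64, 5 - k) / C(69, 5), where C(n, r) counts the ways to choose r items from n. Nothing in that formula depends on which five numbers are on the ticket. Here's a short sketch using math.comb(), which needs Python 3.8 or later; match_probability() is a hypothetical helper, not part of nc_powerball.py:

```python
from math import comb

def match_probability(k, ticket_size=5, max_value=69):
    """Chance that a fixed ticket matches exactly k numbers in one drawing."""
    # Choose which k ticket numbers are drawn, then fill the rest of
    # the drawing from the numbers that aren't on the ticket.
    ways = comb(ticket_size, k) * comb(max_value - ticket_size, ticket_size - k)
    return ways / comb(max_value, ticket_size)

for k in range(6):
    expected = match_probability(k) * 1_000_000
    print(f"Match {k}: {expected:,.1f} expected per 1,000,000 drawings")
```

The expected counts come out to roughly 678,000, 283,000, 37,000, 1,800, and 28 for zero through four matches, which lines up with both columns in the simulated results. A full five-number match has a 1 in 11,238,513 chance, so seeing none in a million drawings is what you'd expect.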

Conclusions

I'm not a fan of gambling, but it seems to be part of human nature in some ways. A little familiarity with programming concepts lets you model many lotteries and gambling-related games, which can give you an empirical sense of how things work. People's intuitions about random events aren't always accurate, which is part of why lotteries and casinos make so much money. Checking your intuitions by modeling games of chance can be a great way to find out whether your intuition about a particular game is accurate or not.

This was quick exploratory code. If you find this investigation interesting, there are many ways it could be refactored and optimized. For example, get_ticket() can be reduced to one line of code using the sample() function from the random module. I don't think that approach makes the code more efficient, but it does make it shorter. I suspect using sets, tuples, and arrays instead of lists would start to improve the performance of the code. If you're interested in improving this code, make sure you do some profiling so you're focusing on the sections of code that are actual bottlenecks.
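For reference, here's roughly what the sample() version could look like. sample() picks unique items from a population, so the duplicate check goes away. This is a sketch, and I haven't profiled it against the original:

```python
from random import sample

def get_ticket(ticket_size=5, max_value=69):
    """Get a sorted ticket of unique numbers from 1 through max_value."""
    # sample() never returns the same item twice.
    return sorted(sample(range(1, max_value + 1), ticket_size))

print(get_ticket())
print(get_ticket(5, 31))
```

The range() call needs max_value + 1 because range() excludes its upper bound, unlike randint().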

One last note: naming things really is hard sometimes! I was using the name my_ticket for a while, before coming up with the more specific name birthday_ticket (and the shorter bd_ticket). Also, I was using winning_ticket for a while, but kept feeling like there were too many things called "ticket". I realized I was talking about drawings, and started using winning_draw instead. These are small changes, but they affect the way we think about what we're modeling. If you come up with a more specific name for something in your code, don't be afraid to take the time to switch to the better name. That's an especially good habit if you're in the early stages of a project, when it's easier to make those kinds of changes.

Resources

You can find the code from this post in the mostly_python GitHub repository.