Debugging in Python, part 14: Resolving recursion issues
MP 158: Fixing a recursion bug that only shows up after many consecutive games.
Note: This post is part of an ongoing series about debugging in Python. The posts in this series will only be available to paid subscribers for the first 6 weeks. After that they will be available to everyone. Thank you to everyone who supports my ongoing work on Mostly Python.
In the last post, we finished implementing the basic logic to run through an entire Go Fish game. At the end of a game, we let the user choose whether to play again. There's one last significant bug that I've become aware of, which only shows up after a number of consecutive games. In this post, we'll dig into this cross-game bug. We'll add an option to automate game play in order to help replicate, diagnose, and address the bug.
An automated player
When playing a bunch of partial games during development work, I noticed longer tracebacks than I'd expect to see from quitting mid-game, or quitting a debugging session. I wanted a way to automatically play through an entire game, and see the state of the game after a full game or several full games. I was curious to see if these tracebacks grow as the number of games played increases. I was also curious to see if the program would crash after a certain number of consecutive games.
One approach that helps with debugging is to automate your program's inputs in a way that brings you to the point of execution that you want to focus on. This kind of work also tends to make writing tests easier. There are only two kinds of inputs during an active Go Fish game. The player makes guesses, and they press Enter after pauses. We can automate an entire game by guessing for the player, and skipping all pauses during a game.
A more formal CLI
I used some simple sys.argv values to implement a minimal CLI in earlier posts, but now I want some formal named CLI arguments. I want to be able to automatically play through one game, or an arbitrary number of games. So I want to be able to run the game in these two ways:
$ python go_fish.py --auto-play $ python go_fish.py --auto-play --num-games 10
The first will automatically play through a single game, and then ask whether you want to play again. The second will automatically play through 10 games, and then ask if you want to play again.
In a small project like this, it's pretty straightforward to add a CLI by adding a cli.py file that looks like this:
import argparse parser = argparse.ArgumentParser() parser.add_argument( "-v", "--verbose", action="store_true", help="Verbose output, including showing computer cards.", ) parser.add_argument( "--seed", action="store_true", help="Seed the random number generator.", ) cli_args = parser.parse_args()
This file defines two arguments. There's the -v flag that we've been using to show the computer's hand and avoid clearing the screen on each turn. Then there's the --seed argument that makes a repeatable game by seeding the random number generator.
The most important line here is the last one. This line parses the CLI arguments, and stores all the options for the current run of the program in cli_args. We can import this anywhere we want in the project, and we'll have access to all the CLI arguments for any given run of the game. There's no need to pass cli_args around to different methods and modules; we'll just import it at the top of any file that needs it, and then use cli_args.<option_name> wherever we need to access an option.
For example, here's what the clear_terminal() function looks like with cli.py in use:
... from cli import cli_args ... def clear_terminal(): """Clear the terminal.""" # Don't clear terminal in verbose mode. if cli_args.verbose: return if sys.platform == "win32": subprocess.run("cls") else: subprocess.run("clear")
We're not importing the whole cli module; all we need access to is cli_args. The check for if "verbose" in sys.argv has been replaced by the simpler boolean check if cli_args.verbose. Other checks for sys.argv values have been updated to refer to cli_args options as well.
Automating the game
Now we can automate the game. The first step is to add the two necessary CLI arguments:
... parser.add_argument( "--auto-play", action="store_true", help="Automatically play through games.", ) parser.add_argument( "--num-games", type=int, default=1, help="When using --auto-play, how many games to play.", ) cli_args = parser.parse_args()
The --auto-play argument is an optional boolean flag, which stores True when it's included, and defaults to False when it's omitted. The --num-games flag defaults to 1.
We need to track the number of games, and modify the code that handles a finished game:
... from cli import cli_args class GoFish: def __init__(self): # Set a random seed if requested. if cli_args.seed: random.seed(42) self.completed_games = 0 ... def check_finished(self): """Check if the game is over.""" # If both hands have at least one card, game is not over. if self.player_hand.cards and self.computer_hand.cards: return # Game is complete. self.completed_games += 1 # Announce the winner. if not self.player_hand.cards and not self.computer_hand.cards: msg = "\nTie game!\n\n" if not self.player_hand.cards: msg = "\nYou won!!!\n\n" else: msg = "\nThe computer won. :/\n\n" go_fish_utils.pause(msg) # If auto-playing, play through specified number of games. if cli_args.auto_play and self.completed_games < cli_args.num_games: self.start_game() # Play again? play_again = input("Do you want to play again? (y/n) ") ...
We import cli_args in go_fish.py as well. When starting the program, we initialize self.completed_games to 0. When a game is finished, we increment the value of self.completed_games. Finally, before prompting about whether to play again, we check if --auto-play has been passed, and if we've completed the requested number of games. If we haven't, we skip the prompt and go straight to calling start_game() again.
In go_fish_utils.py, we make a few changes:
... import random def get_player_guess(player_hand): """Get a valid guess from the player.""" if cli_args.auto_play: return random.choice(player_hand.cards).rank ... def clear_terminal(): """Clear the terminal.""" # Don't clear terminal in verbose mode, or auto-play. if cli_args.verbose or cli_args.auto_play: return ... def pause(message): """Pause for player to see an update.""" if cli_args.auto_play: print(message) return message += " Press Enter to continue." input(message)
In get_player_guess(), we return a random rank from the player's hand whenever it's the player's turn. We don't clear the terminal during automated games, and we skip all calls to pause() as well.
Now when you run go_fish.py, you can skip right to the end of a complete game:
$ python go_fish.py --auto-play ... The computer asked for a 6. The computer's guess was correct! Player pairs: 2 Computer pairs: 5 Player hand: 4♣ 8♣ A♥ Computer hand: The computer won. :/ Do you want to play again? (y/n)
This is really helpful. You can also use the --num-games flag, but we'll focus on that in the next section. You can see this version of the project here, with the full argparse-based CLI.
A repeatable bug
As always when debugging overall game play, I want to make a bug repeatable. For this bug, I don't just want to make the game crash reliably, I want it to crash reliably under the same exact conditions.
An unusually long traceback
Let's see how this bug first surfaced. A common workflow during development is to play through part of a game, examine the state, and then cancel the run by pressing Ctrl-C. Here's what that looks like:
$ python go_fish.py ... Player pairs: 4 Computer pairs: 1 Player hand: 2♠ 10♠ A♦ Computer hand: X X X X X X X What card would you like to ask for? ^C Traceback (most recent call last): File "go_fish.py", line 169, in <module> gf_game.start_game() File "go_fish.py", line 40, in start_game self.player_turn() File "go_fish.py", line 49, in player_turn self.check_player_guess(requested_card) File "go_fish.py", line 77, in check_player_guess self.computer_turn() File "go_fish.py", line 85, in computer_turn self.check_computer_guess(requested_card) File "go_fish.py", line 120, in check_computer_guess self.player_turn() ... File "go_fish.py", line 47, in player_turn requested_card = go_fish_utils.get_player_guess( self.player_hand) File "go_fish_utils.py", line 16, in get_player_guess requested_card = input(msg).upper() KeyboardInterrupt
I played a few rounds of guesses, and then pressed Ctrl-C. Pressing Ctrl-C raises a KeyboardInterrupt exception, and usually results in a traceback. This was only part of the traceback; I cut out several repetitions of the same block of lines, hopping between lines 49, 77, 85, and 120. There were five references to line 49 in player_turn(), and four references to line 85 in computer_turn().
That doesn't really look right. A traceback should show recent calls to a function like player_turn(), but it shouldn't show every call to that function that's been made during the program's execution.
I'm pretty sure playing through some complete games and then quitting would show an even longer traceback:
$ python go_fish.py --auto-play --num-games 10 ... The computer won. :/ Do you want to play again? (y/n) ^C
The --auto-play flag worked; I saw output from ten full games. Instead of entering n, I hit Ctrl-C to cancel the run. I got a traceback that was just shy of 2,000 lines! It included 160 references to line 49 in player_turn(), and 157 references to line 85 in computer_turn()!
I've also hit a recursion error at times, which I assume is related to the extremely long tracebacks I'm seeing when pressing Ctrl-C. I think I can recreate that by auto-playing enough games:
$ python go_fish.py --seed --auto-play --num-games 15 Player pairs: 13 Computer pairs: 11 Player hand: Traceback (most recent call last): File "go_fish.py", line 169, in <module> gf_game.start_game() File "go_fish.py", line 40, in start_game self.player_turn() File "go_fish.py", line 49, in player_turn self.check_player_guess(requested_card) ... File "card_models.py", line 54, in show hand_string = " ".join([str(card) for card in self.cards]) RecursionError: maximum recursion depth exceeded
I'm using the --seed argument to make this bug repeatable, but a sufficiently high number of games will generate the same exception for all runs. With --seed, playing through 15 games is enough to generate the recursion error. Even if you're not going to play 15 games in a row, the program is clearly doing more work than it should during every game. This is also a good example of a problem that might never cause a crash during local development, but would absolutely show up if you put it into production and many people used the project in a variety of ways.
This traceback is just over 3,000 lines long, and includes 248 calls to player_turn()!
Diagnosing the recursion error
To diagnose this error, let's first look at the lines that repeat in the traceback. The problematic lines are 49, 77, 85, and 120. We should see a recursive pattern that needs to be made non-recursive.
Here's line 49, in player_turn():
def player_turn(self): ... self.check_player_guess(requested_card)
Okay, that's a line calling check_player_guess(). Let's see that method:
def check_player_guess(self, guessed_rank): computer_ranks = [c.rank for c in self.computer_hand.cards] if guessed_rank in computer_ranks: ... self.player_turn() else: ... self.computer_turn()
Line 77 is the call to computer_turn(). We'll look there next, but I also notice that the if block, which isn't executed in the current sequence, would recursively call back to player_turn().
Here's computer_turn():
def computer_turn(self): ... self.check_computer_guess(requested_card)
The line that matters here is line 85, the call to check_computer_guess():
def check_computer_guess(self, guessed_rank): ... if guessed_rank in player_ranks: ... self.computer_turn() else: ... self.player_turn()
Line 120 calls back to player_turn(), which brings us back to the start of this recursive loop. I also noticed that the if block would have called computer_turn(), which would branch into its own little recursive chain of method calls.
Fixing a recursion bug
My takeaway from this diagnostic exercise is that this is a case of accidentally applying a recursive approach where it isn't appropriate. When we start a game, we do some setup work and then call player_turn(), which makes sense. But we never go back to an overall game management method; the entire game proceeds as a highly nested group of calls back and forth between player_turn() and computer_turn(), with some calls to a few helper methods mixed in.
If you're curious to try coming up with a fix yourself, this is a good place to stop reading and try your own hand at fixing the issue. You can grab a copy of the project, and run it with this command:
$ python go_fish.py --seed --auto-play --num-games 15
You should see a RecursionError, with a really long traceback. See if you can get this to run without generating a recursion error. Then try an even longer set of games, such as --num-games 100. I think an actual fix should allow a fairly large number of consecutive games.
The main way to fix this is to make player_turn() return a result, rather than directing the ongoing flow of the game. We can do this by having start_game() call a new method that manages the overall game, rather than calling player_turn() directly. The new method, which I'll call manage_game(), will direct the overall flow of the game. It will call player_turn() and computer_turn(), which should remove most or all of the recursion issues.
Managing turns
Managing a game like Go Fish centers on keeping track of whose turn it is. In a game that's only ever going to involve two players, you can use a single boolean flag. We'll set self.player_active = True to indicate it's the player's turn, and toggle that flag to False when it's the computer's turn.
Here's manage_game():
def manage_game(self): # Manage the overall flow of a single game. self.game_active = True self.player_active = True while self.game_active: if self.player_active: self.player_turn() else: self.computer_turn()
We start with an active game, and again let the player go first. The game itself is managed by a while loop that does one of two things. If it's the player's turn, it calls player_turn(). If it's the computer's turn, it calls computer_turn().
That should be it! We need to make some other changes to use this overall manager, but that should be all there is to managing a game.
Now let's use this method:
... class GoFish: ... def start_game(self): ... self.manage_game() def manage_game(self): ... def check_player_guess(self, guessed_rank): ... if guessed_rank in computer_ranks: ... # Player gets to go again. go_fish_utils.pause("\nYour guess was correct!") else: ... self.player_active = False def check_computer_guess(self, guessed_rank): ... if guessed_rank in player_ranks: ... # Computer gets to go again. go_fish_utils.pause("\nThe computer's guess was correct!") else: ... self.player_active = True
After the setup work in start_game(), we call manage_game() instead of calling player_turn() directly. There are no changes needed to player_turn() or computer_turn().
In check_player_guess(), we don't need to call player_turn() in the if block that manages a correct guess. As long as we don't change the value of self.player_active, manage_game() will keep doing the work of calling player_turn(). When the player makes a wrong guess, the else block no longer calls computer_turn() directly. Instead, it toggles self.player_active to False, which causes the while loop to start giving the computer its turns.
We make similar changes to check_computer_guess(). We remove the direct call to computer_turn() in the if block, and toggle the value of self.player_active when the computer makes an incorrect guess.
Does it work? Let's try running 15 consecutive games:
$ python go_fish.py --seed --auto-play --num-games 15 ... You won!!! Do you want to play again? (y/n)
All right, the recursion error we were seeing from nested calls to player_turn() and computer_turn() has gone away. We're still calling those functions a bunch of times, but they're finishing their work and returning control back to manage_game() every time they're called. They're no longer calling methods that call other methods in an endless cycle.
How many games?
If the issue is truly fixed, we should be able to play 100 games:
$ python go_fish.py --seed --auto-play --num-games 100 ... You won!!! Do you want to play again? (y/n)
That worked, which means the recursion issue has clearly been improved.
However, I found the limit to the number of games a lot lower than I expected. I don't see any reason we can't play through a thousand or a million games, other than maybe waiting a while for all the printed output to scroll by. But I found I couldn't play through more than about 225 games. Here's what happens when I try to play 1,000 games:
$ python go_fish.py --seed --auto-play --num-games 1_000 ... File "go_fish.py", line 165, in check_finished File "go_fish.py", line 39, in start_game File "go_fish.py", line 48, in manage_game File "go_fish.py", line 59, in player_turn ... RecursionError: maximum recursion depth exceeded
It's another really long traceback, but this is the cycle that keeps being repeated. This is another recursion error, this time because check_finished() keeps calling start_game(), which calls manage_game(), which ends up calling check_finished(), with execution eventually flowing back to start_game().
The final recursion fix
The fix for this kind of recursion issue is, again, to let a method finish its work and return control to the method that called it. In this case, we need to let check_finished() just determine whether another game should be played, without having to start that game. It needs to make that determination, and then be done. We'll use the self.game_active flag we initialized in manage_game(), and we'll put a loop in start_game() to control the overall series of games.
Here are the changes that need to be made to check_finished():
def check_finished(self): """Check if the game is over.""" # If both hands have at least one card, game is not over. if self.player_hand.cards and self.computer_hand.cards: return False # Game is complete. self.game_active = False self.completed_games += 1 # Announce the winner. ... # If auto-playing, keep playing through specified number of games. if cli_args.auto_play and self.completed_games < cli_args.num_games: return True # Play again? play_again = input("Do you want to play again? (y/n) ") if play_again.lower() in ("y", "yes"): return True else: print("Thanks for playing!") sys.exit()
The main changes are that check_finished() needs to return False if the game is not finished. It needs to return True if the game is finished, and we want to run another game. Finally, it should call sys.exit() if the game is finished and we don't want to start another game.
The methods that call check_finished() need to use these new return values:
def player_turn(self): """Manage the human player's turn.""" go_fish_utils.clear_terminal() self.show_state() if self.check_finished(): return requested_card = go_fish_utils.get_player_guess( self.player_hand) self.check_player_guess(requested_card) def computer_turn(self): """Manage the computer's turn.""" go_fish_utils.clear_terminal() self.show_state() if self.check_finished(): return requested_card = random.choice(self.computer_hand.cards).rank self.check_computer_guess(requested_card)
In player_turn() and computer_turn(), the method should stop and return control to manage_game() whenever the game is finished.
Finally, start_game() needs a loop to repeat games as long as the overall program is running:
def start_game(self): while True: # Shuffle, and deal two hands. self.deck = Deck() self.deck.shuffle() ... self.manage_game()
With these changes, I was able to auto-play through 5,000 games. The output is a little slow, but I'm pretty confident you could play through a million games if you suppressed the output, or waited for it all to scroll by. There might be memory management issues I haven't thought of, but that's beyond the scope of this series.
Conclusions
The main takeaway from this work, as it relates to debugging, is that sometimes you do have to analyze the program's overall logical flow from the information in a traceback in order to diagnose a bug. Examining the state of the game's variables across many games would probably not have helped here. All the values were correct; it's the overall management of the logical flow of the program that was problematic. Diagnosing this requires some understanding of how the Python interpreter manages execution of a program, and the idea that a function or method can "finish" its work by fully returning control back to whatever part of the program called the function.
If you're unclear about what causes recursion issues like this, most AI assistants are pretty good at clarifying the problem if you give them relevant parts of a traceback like the ones in this post, and relevant parts of the codebase.
Personally, I'm reminded to be careful about recursion. I'm reminded that methods and functions should finish their work and return control to the caller, rather than taking over control of the overall flow of the program. We should only deviate from this pattern if there's a specific reason to enter into a recursive flow. When I do choose recursion, I should be able to articulate when that recursive cycle should finish its work, and how it should return control back to the original caller. We should never just use a recursive call because it happens to work, and hope we don't run into a recursion depth issue at some point.
If I were taking this project further, I'd make some name changes at this point as well. I'd use names like manage_player_turn() and manage_computer_turn(). That would let me use a flag like player_turn to indicate it's the player's turn, rather than having to use the name player_active. I'd also change start_game() to a more generic start_play(), because start_game() is doing more than just starting a single game. It's starting the overall game play, across a series of games.
The next post will close out this series. We'll draw some larger conclusions about debugging overall, and I'll offer some suggestions for how you might take this project further if you've found it interesting.