Cleaning up unused imports

MP 166: It's so much easier than it used to be!

When I'm refactoring a project, I often end up with a bunch of imports that are no longer needed. Cleaning up unused imports is often a low-priority task, because in most of the projects I work on they don't cause any issues at all; they just add a little clutter to the top of some .py files.

I know there are linters that can take care of this, but for many exploratory projects I haven't wanted to bother adding another tool to the project. A number of recent changes have made it much easier to use these tools without necessarily adding them to your project.

Current project: gh-profiler

I'm currently working on a small tool called gh-profiler, which aims to help open source maintainers decide how much time to invest in PRs and issues from new contributors. If a PR seems to be AI-generated, and you're wondering if it's worth investing much time in reviewing the PR, you can run the profiler and get a quick summary of relevant information about the GitHub user who opened the PR:

$ uv run main.py ehmatthes
GitHub user: ehmatthes
  🟢 Account age: 5057 days
  🟢 ehmatthes has opened fewer than 10 PRs in the last 21 days.

I shouldn't raise any immediate red flags when I submit a PR. I have a well-established GitHub account, and I haven't opened an excessive number of PRs recently.
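Checks like these don't need to be complicated. Here's a minimal sketch of how two of them might work; it's not the actual gh-profiler code. The Flag class, the function names, the 365-day account-age threshold, and the wording of the yellow message are all assumptions for illustration. Only the 10-PR and 21-day figures come from the output above.

```python
from dataclasses import dataclass

# Assumption: the real tool's account-age threshold isn't shown anywhere.
MIN_ACCOUNT_AGE_DAYS = 365
# These two values match the output shown above.
MAX_RECENT_PRS = 10
WINDOW_DAYS = 21


@dataclass
class Flag:
    """Hypothetical container for one line of the summary."""
    level: str  # "green" or "yellow"
    message: str


def account_age_flag(age_days: int) -> Flag:
    """Older accounts get a green flag; newer ones get a yellow flag."""
    level = "green" if age_days >= MIN_ACCOUNT_AGE_DAYS else "yellow"
    return Flag(level, f"Account age: {age_days} days")


def recent_pr_flag(username: str, recent_prs: int) -> Flag:
    """Flag users who have opened many PRs in the recent window."""
    if recent_prs < MAX_RECENT_PRS:
        return Flag(
            "green",
            f"{username} has opened fewer than {MAX_RECENT_PRS} PRs "
            f"in the last {WINDOW_DAYS} days.",
        )
    # Hypothetical message; the real tool reports merged/closed counts instead.
    return Flag(
        "yellow",
        f"{username} has opened {recent_prs} PRs in the last {WINDOW_DAYS} days.",
    )
```

Keeping each check in its own small function means a new criterion is just one more function that returns a Flag.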

Here's the output for a user who opened a PR on one of my projects recently:

$ uv run main.py <redacted>
GitHub user: <redacted>
  🟢 Account age: 2057 days
  🟢 42 of 69 PRs have been merged in the last 21 days.
  🟡 20 of 69 PRs have been closed without merging in the last 21 days.

This user has a well-established account, but they've been opening a lot of PRs lately. Many of those PRs have been merged, which is a green flag. But they've also had a significant number of PRs closed without merging recently. I've redacted the username, because I want to be really careful about criticizing individual GitHub users.

When I saw this user's PR, I went to their profile page and clicked around to gather this kind of information manually. I ended up closing the PR because it was an AI-generated response to an issue I had opened minutes earlier. The PR was relevant, but it came with no context and didn't solve the issue. The issue required nuance and judgment, but a bot this user was running saw the help-wanted label and jumped on it. This is exactly the kind of quick PR that is more burdensome to project maintainers than helpful.

I've had many experiences like this recently, and have wanted to build a small tool that gives me this kind of information quickly. In the 1.0 version of this project you'll be able to run uvx gh-profiler <PR-number>, and see this kind of summary about the user who opened the PR. I might even look at adding it to a GitHub action, so the summary is posted as the first comment in every new PR and issue.

Initial refactoring of a new project

When I was writing the initial blank-screen version of this project, I just made some API calls through the gh CLI tool, and started processing the data that was returned. All the code was in a single file:
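To give a sense of what that kind of code can look like, here's a rough sketch of calling gh from Python and processing the result. This is not the project's actual code. The name run_cmd matches a helper that shows up later in this post, but its body here is a guess, and account_age_days() and get_account_age() are written purely for illustration. The sketch assumes the gh CLI is installed and authenticated, and relies on the created_at field that GitHub's REST API returns for a user.

```python
import json
import shlex
import subprocess
from datetime import datetime, timezone


def run_cmd(cmd: str) -> str:
    """Run a command string without a shell, and return its stdout."""
    result = subprocess.run(
        shlex.split(cmd), capture_output=True, text=True, check=True
    )
    return result.stdout


def account_age_days(user_json: str, now: datetime) -> int:
    """Compute account age in days from the JSON describing a GitHub user."""
    created_at = json.loads(user_json)["created_at"]  # e.g. "2012-03-01T12:00:00Z"
    created = datetime.strptime(created_at, "%Y-%m-%dT%H:%M:%SZ")
    created = created.replace(tzinfo=timezone.utc)
    return (now - created).days


def get_account_age(username: str) -> int:
    """Fetch a user through the gh CLI and return the account age in days."""
    # Needs network access and an authenticated gh CLI.
    user_json = run_cmd(f"gh api users/{shlex.quote(username)}")
    return account_age_days(user_json, datetime.now(timezone.utc))
```

Separating the subprocess call from the date math means the processing step can be tested with a plain JSON string, without touching the network.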

$ tree
├── gh_profiler.py
└── README.md

Once I saw that the project was going to be meaningful, I broke it up into a number of modules:

$ tree
├── main.py
├── profile_data.py
├── pyproject.toml
├── README.md
├── utils
│   ├── analysis_utils.py
│   ├── flags.py
│   ├── infra_utils.py
│   ├── profile_utils.py
│   └── summary_utils.py
└── uv.lock

This is a much better project structure! The main() function went from about 60 lines of code to this:

def main():
    # How old is the account?
    profile_utils.get_account_age()
    analysis_utils.process_account_age()

    # What does recent PR activity look like?
    profile_utils.get_pr_activity()
    analysis_utils.process_pr_activity()

    # Summarize findings.
    summary_utils.show_summary()
main.py

This is much more readable than the 60-line version. It's easy to add more analysis. If we need to adjust the current analysis we can go focus on individual functions that handle specific criteria about the user and their recent activity.

However, here's what my imports looked like for this file:

from datetime import datetime as dt
from datetime import timezone as tz
from datetime import timedelta
import subprocess
import sys
import shlex
import json

from profile_data import profile_data
from utils import profile_utils
from utils.infra_utils import run_cmd
from utils import analysis_utils
from utils import summary_utils
main.py

This block is a mix. Some of these imports are left over from when all the work was being done in this file; subprocess is one example. Others bring in the new utility modules, such as profile_data.

Cleaning up imports

Historically, I've cleaned up import blocks like this by removing the ones I think are no longer needed, and then running the project to see if it still works. But that's a bit tedious, and I often miss some imports that aren't needed.
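The detection itself is mechanical, which is why a tool does it more reliably than I do by hand. Here's a simplified sketch of the idea, using Python's ast module. The function name find_unused_imports() is mine, and this is not how any real linter is implemented; it ignores scopes, `__all__`, names that only appear inside strings, and plenty of other edge cases that real tools handle.

```python
import ast


def find_unused_imports(source: str) -> list[str]:
    """Return imported names that are never referenced in the source."""
    tree = ast.parse(source)

    # Collect every name that an import statement binds.
    imported = set()
    for node in ast.walk(tree):
        if isinstance(node, (ast.Import, ast.ImportFrom)):
            for alias in node.names:
                if alias.name == "*":
                    continue  # Star imports can't be checked this way.
                # "import a.b" binds "a"; "import x as y" binds "y".
                imported.add(alias.asname or alias.name.split(".")[0])

    # Collect every bare name used anywhere in the file. An attribute
    # access like sys.exit shows up as a Name node for "sys".
    used = {node.id for node in ast.walk(tree) if isinstance(node, ast.Name)}

    return sorted(imported - used)


source = """
import sys
import json
from utils import profile_utils

def main():
    profile_utils.get_account_age()
    sys.exit()
"""
print(find_unused_imports(source))
```

In this example json is the only import that's never referenced, so it's the only name reported.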

I know there are tools that automate this process, but when it's time to clean up imports I haven't always wanted to decide which tool to add to the project. These days, you don't have to choose! uv lets you easily run a tool as a one-off with uvx, which runs it in a temporary environment, before deciding whether to add it to the project:

$ uvx ruff check . --select F401 --output-format concise
main.py:21:34: F401 [*] `datetime.datetime` imported but unused
main.py:22:34: F401 [*] `datetime.timezone` imported but unused
main.py:23:22: F401 [*] `datetime.timedelta` imported but unused
main.py:24:8: F401 [*] `subprocess` imported but unused
main.py:26:8: F401 [*] `shlex` imported but unused
main.py:27:8: F401 [*] `json` imported but unused
main.py:31:31: F401 [*] `utils.infra_utils.run_cmd` imported but unused
Found 7 errors.
[*] 7 fixable with the `--fix` option.

This uses ruff to look only for unused import statements (lint rule F401, which comes from Pyflakes). I recognize all of these as leftovers from before the refactor, so I'd like to remove them. ruff can do that automatically with the --fix flag:

$ uvx ruff check . --select F401 --output-format concise --fix
Found 7 errors (7 fixed, 0 remaining).

And now the import block is cleaner, and I don't have to wonder if I missed anything:

import sys

from profile_data import profile_data
from utils import profile_utils
from utils import analysis_utils
from utils import summary_utils
main.py

Conclusions

I love how much easier it's gotten in recent years to use Python tools in whatever way suits your current workflow. Running uvx ruff worked for me today, because I didn't want to add ruff to the project quite yet.

That said, seeing this change is a reminder that I should start paying more attention to formatting and clean code, even in my projects that are still at the proof-of-concept stage. I'm noticing that I'm making a larger percentage of my projects public than I used to, and I want to put better-structured projects out there. It's no fun for other people to come into a messy project, where there are obvious things that should be cleaned up before much collaboration happens. I'll probably start using ruff from the beginning in all my projects now.

If you're on the fence about using any Python tools or projects, consider whether you can give those tools a quick one-off try like this. And if you think you might have a bunch of unused imports in a project, try that one-off uvx ruff command. Your project will thank you for it. :)