grep is your friend
MP 153: If it's not already, it should be. :)
Note: I will get back to the debugging series as soon as I can. Work has been busy lately, and it's much easier to write about things that are coming up in my regular work at the moment. That series should be finished by the end of the year!
In the 1970s, a small group of people did some foundational work that impacts the way we work with computers to this day. In particular, they wrote some tools that are still incredibly useful in today's world. They also came up with an interoperability concept that made those tools even more powerful than they could ever be on their own. In this post I'll focus on grep, and the concept of pipes.
Using grep
grep reads through some text line by line, compares each line to a pattern you provide, and prints the lines that match that pattern.
I work on a few specific projects most of the time, but I create a bunch of small projects that tend to accumulate over time. Most of these projects live in a projects/ directory on my system. That directory currently has 133 folders in it. The ls command (dir on Windows) is great, but by itself it's not super helpful in this directory:
    projects$ ls -a1
    active_learning
    add_border_packaging_setup
    ai_community_presentation
    ...
    wallpaper_project
When working in a terminal, I often want to see a subset of these directories. For example, all the django-simple-deploy plugins I work on start with the prefix dsd-. When I work on django-simple-deploy I often want to focus on one of these plugins, and it's helpful to see all the plugins I have a local copy of.
Reading through 133 lines of output isn't fun, but this is where grep comes in:
    projects$ ls -a1 | grep dsd-
    dsd-codered
    dsd-flyio
    dsd-flyio-nanodjango
    ...
    dsd-upsun
    dsd-vps
This is the same ls command, followed by a grep command. I gave grep the pattern "dsd-", which tells it to go through all the lines of output, and only print the ones that include the string "dsd-".
This output is much more useful, and it's often quicker than opening a Finder window and scrolling through the directories in a graphical interface.
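To try this without touching a real projects directory, here is a minimal sketch that builds a throwaway directory and runs the same pipeline against it. The folder names are a small made-up sample, and demo_dir and matches are names chosen only for this example.

```shell
# Create a scratch directory with a few hypothetical project folders.
demo_dir=$(mktemp -d)
mkdir "$demo_dir/active_learning" "$demo_dir/dsd-flyio" \
      "$demo_dir/dsd-upsun" "$demo_dir/wallpaper_project"

# List one entry per line, then keep only the lines containing "dsd-".
matches=$(ls -a1 "$demo_dir" | grep dsd-)
echo "$matches"

# Clean up the scratch directory.
rm -r "$demo_dir"
```

Only the two dsd- folders should be printed; everything else that ls lists is filtered out by grep.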
The | operator
The overall command ls -a1 | grep dsd- contains a really important character. The | character is called a pipe. That original group of programmers in the 1970s came up with the idea of allowing any command to operate on the output of any other command. They realized this would allow a small set of simple tools to be useful across a wide range of use cases. I think this was a brilliant insight, and I like to compare it to the idea that just 26 letters in the English alphabet allow us to read all the books that have ever been written in English.
In this example, we say we're piping the output of ls to grep. What makes this even more powerful is that we can string together as many pipes as we want.
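As a small illustration of chaining, the sketch below adds two more stages to the earlier pipeline, so it counts the matching directories instead of listing them. The scratch directory and its folder names are made up for the example.

```shell
# Scratch directory with hypothetical project folders.
demo_dir=$(mktemp -d)
mkdir "$demo_dir/active_learning" "$demo_dir/dsd-codered" \
      "$demo_dir/dsd-flyio" "$demo_dir/dsd-upsun"

# Four commands, three pipes: list, filter, count lines, strip padding.
# (Some versions of wc pad the number with spaces; tr removes them.)
count=$(ls -a1 "$demo_dir" | grep dsd- | wc -l | tr -d ' ')
echo "dsd- plugins found: $count"

# Clean up the scratch directory.
rm -r "$demo_dir"
```

Each command only knows about the text it receives from the command before it, which is what lets them be combined freely.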
Rebranding: More than replacement
One of the reasons I've been busy is that the hosting company Platform.sh recently rebranded to Upsun, and that has affected a couple of my projects. Python Crash Course includes a section about deploying to Platform.sh, and I also maintain a django-simple-deploy plugin called dsd-platformsh that supported deployment to Platform.sh. I need to port that to a new plugin, dsd-upsun.
It's been a fair bit of work to figure out what's changed, what's stayed the same, and what the current best practices are on the rebranded platform. Practically, it's also meant a lot of searching. For refactors like this, grep is much nicer than using only the Find and Replace tools in an IDE or editor. I use Find and Replace when working in the main code files where I know I have to make changes. But where should I start? How will I know when I'm done? How can I be sure I haven't missed any lingering references to Platform.sh?
A really nice way to do this is to run grep on the entire project directory:
    dsd-upsun$ grep -R platform.sh .
    ./.../test_plsh_utils.py:"""Unit tests for platform.sh utils."""
    ./.../projects_info_output_csv.txt:region,us-3.platform.sh
    ./.../projects_info_output_csv.txt:repository,"url: '<id>@git.us-3.platform.sh:<id>.git'
    ./.../projects_info_output_csv.txt:subscription_management_uri: ...
    Binary file ./.../test_plsh_utils.cpython-313-pytest-8.4.1.pyc matches
    Binary file ./.../test_plsh_utils.cpython-312-pytest-8.3.4.pyc matches
    ...
grep can be run against a directory. Here we're running it recursively (-R), to look for the pattern "platform.sh", in the current directory (.).
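One detail worth knowing about this pattern: grep treats it as a regular expression, so the . in platform.sh matches any single character, not only a literal dot. The sketch below uses a made-up file in a scratch directory to compare the default behavior with the -F flag, which treats the pattern as a fixed string.

```shell
# Scratch project with one hypothetical file containing three spellings.
demo_dir=$(mktemp -d)
printf '%s\n' 'region: us-3.platform.sh' \
              'import platform_sh' \
              'name = "platformsh"' > "$demo_dir/notes.txt"

# Default: "." is a regex wildcard, so "platform_sh" matches too.
regex_count=$(grep -R platform.sh "$demo_dir" | wc -l | tr -d ' ')

# -F: the pattern is a literal string, so only "platform.sh" matches.
fixed_count=$(grep -RF platform.sh "$demo_dir" | wc -l | tr -d ' ')

echo "regex: $regex_count, fixed: $fixed_count"

# Clean up the scratch directory.
rm -r "$demo_dir"
```

For a cleanup like this one the looser match is mostly harmless, since spellings such as platform_sh are also worth finding. Note that "platformsh" matches neither form, because there is no character between "platform" and "sh".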
Early on in the refactoring work, there were a whole bunch of references to Platform.sh. However, many of these were in binary files. I don't care about those; they'll refresh automatically next time I use the project.
grep feels like a sharp cutting tool, in a good way. The -v flag inverts the match: grep prints only the lines that don't contain the pattern. We can use that flag, along with another pipe, to filter this output:
    dsd-upsun$ grep -R platform.sh . | grep -v Binary
    ./.../test_plsh_utils.py:"""Unit tests for platform.sh utils."""
    ./.../projects_info_output_csv.txt:region,us-3.platform.sh
    ./.../projects_info_output_csv.txt:repository,"url: '<id>@git.us-3.platform.sh:<id>.git'
    ./.../projects_info_output_csv.txt:subscription_management_uri: ...
    ./.../services.yaml:# See https://docs.platform.sh/...
    ./.../pipenv.platform.app.yaml:# See https://docs.platform.sh/...
    ./.../pipenv.platform.app.yaml: # https://docs.platform.sh/...
This compound command uses grep to find all references to platform.sh in the current directory, and then excludes (-v) any line of output that includes the string "Binary". I'm whittling away the output I don't care about, and focusing on the outdated references to "platform.sh" that I actually need to deal with.
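The filtering step can be tried in isolation. In the sketch below, printf stands in for the output of the recursive search; the three lines are invented samples shaped like the output above, not real results.

```shell
# Simulated search output: two text matches and one binary-file notice.
search_output=$(printf '%s\n' \
    './tests/test_utils.py:"""Unit tests for platform.sh utils."""' \
    'Binary file ./tests/__pycache__/test_utils.pyc matches' \
    './templates/services.yaml:# See https://docs.platform.sh/')

# -v inverts the match: print only lines that do NOT contain "Binary".
filtered=$(printf '%s\n' "$search_output" | grep -v Binary)
printf '%s\n' "$filtered"
```

The filter is purely textual, so a genuine match whose line happened to contain "Binary" would be removed as well. Depending on the grep version, the binary-file notice may also be worded differently or written to standard error rather than standard output, in which case this filter has nothing to remove.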
The command that ended up being most helpful looked like this:
    dsd-upsun$ grep -R platform.sh . | grep -v Binary | grep -v build | grep -v venv
I looked for "platform.sh" in the current directory, then dropped every output line containing "Binary", "build", or "venv". In practice, that excludes the binary-file notices and everything in the build/ and .venv/ directories.
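A chain of -v filters can also be written as a single grep -v with several -e patterns, which drops any line matching at least one of them. The sketch below checks that the two forms agree on a few invented sample lines; it also shows that each filter looks at the whole line, not just the path.

```shell
# Invented sample lines shaped like recursive grep output.
sample=$(printf '%s\n' \
    './dsd_platformsh/deploy.py:# See https://docs.platform.sh/' \
    'Binary file ./dsd_platformsh/__pycache__/deploy.pyc matches' \
    './build/lib/deploy.py:# See https://docs.platform.sh/' \
    './.venv/lib/site.py:# platform.sh helper' \
    './README.md:Run the build, then deploy to platform.sh')

# Three filters chained with pipes, as in the command above.
chained=$(printf '%s\n' "$sample" | grep -v Binary | grep -v build | grep -v venv)

# The same filter as one command with several -e patterns.
combined=$(printf '%s\n' "$sample" | grep -v -e Binary -e build -e venv)

printf '%s\n' "$chained"
```

Only the deploy.py line under dsd_platformsh/ survives. The README.md line is dropped because its text contains "build", even though that file is not in the build/ directory, so it's worth a quick look at what a broad filter like this removes.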
There's one more -v example that's worth sharing. In an earlier refactoring session, I was trying to change the name sd_command to dsd_command. In an editor, it might be challenging to search for sd_command without sifting through a bunch of instances of dsd_command. But with grep, -v makes it easy:
    $ grep -R sd_command . | grep -v dsd_command
This finds all instances of sd_command, which will include dsd_command as well. But it then excludes all the instances of dsd_command. We're left with exactly what we want to focus on.
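Here is the same idea on a few invented lines of source code, with printf standing in for the recursive search. The variable names and sample lines are made up for the example.

```shell
# Invented source lines mixing the old and new names.
source_lines=$(printf '%s\n' \
    'sd_command = parse_args()' \
    'dsd_command = parse_args()' \
    'run(sd_command)' \
    'run(dsd_command)')

# The first grep keeps all four lines; the second drops the dsd_command ones.
old_name_only=$(printf '%s\n' "$source_lines" | grep sd_command | grep -v dsd_command)
printf '%s\n' "$old_name_only"
```

One caveat: a line that contains both names would be dropped by the second filter, so a line like that still needs a manual look.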
Per-directory searches
Most of my work was in two specific directories, dsd_platformsh/ and tests/. It was helpful at times to run grep against these specific directories, rather than against the entire project:
    dsd-upsun$ grep -R platform.sh dsd_platformsh/
    ...
    dsd-upsun$ grep -R platform.sh tests/
    ...
It was helpful to run these searches, and then go back to the overall project as each main directory was cleaned up.
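grep also accepts several paths in one command, so both directories can be checked in a single call. A minimal sketch, using a scratch project whose two files and their contents are invented for the example:

```shell
# Scratch project with two hypothetical directories, one file each.
proj=$(mktemp -d)
mkdir "$proj/dsd_platformsh" "$proj/tests"
echo 'HOST = "us-3.platform.sh"' > "$proj/dsd_platformsh/deploy.py"
echo '"""Tests for platform.sh utils."""' > "$proj/tests/test_utils.py"

# Search both directories in one call; cd in a subshell keeps paths short.
hits=$(cd "$proj" && grep -R platform.sh dsd_platformsh tests)
echo "$hits"

# Clean up the scratch project.
rm -r "$proj"
```

Each output line starts with the path of the file it came from, so it stays clear which directory still needs work.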
Filenames only
By default, when you run it against a directory, grep shows the path and the line of text that contains the pattern it's searching for. You can tell it to just show the filenames where the pattern appears using the -l flag:
    $ grep -Rl platform.sh dsd_platformsh
    dsd_platformsh/platform_deployer.py
    dsd_platformsh/deploy.py
    dsd_platformsh/__pycache__/platform_deployer.cpython-312.pyc
    dsd_platformsh/__pycache__/platform_deployer.cpython-313.pyc
    ...
    dsd_platformsh/templates/platform.app.yaml
    dsd_platformsh/deploy_messages.py
And again, we can exclude these __pycache__/ entries:
    $ grep -Rl platform.sh dsd_platformsh | grep -v pycache
    dsd_platformsh/platform_deployer.py
    dsd_platformsh/deploy.py
    dsd_platformsh/templates/services.yaml
    dsd_platformsh/templates/pipenv.platform.app.yaml
    dsd_platformsh/templates/poetry.platform.app.yaml
    dsd_platformsh/templates/platform.app.yaml
    dsd_platformsh/deploy_messages.py
This is fantastic, because it tells me exactly which files to go look at.
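A small sketch of the filenames-only search, run against a scratch package. The file names and contents are invented, and the .pyc file here is plain text standing in for a real compiled file.

```shell
# Scratch package with hypothetical source and cache files.
pkg=$(mktemp -d)
mkdir "$pkg/__pycache__"
echo 'url = "https://docs.platform.sh/"' > "$pkg/deploy.py"
echo 'print("hello")' > "$pkg/utils.py"
echo 'cached platform.sh reference' > "$pkg/__pycache__/deploy.cpython-313.pyc"

# -l prints each matching file once; grep -v then drops the cache entries.
files=$(cd "$pkg" && grep -Rl platform.sh . | grep -v pycache | sort)
echo "$files"

# Clean up the scratch package.
rm -r "$pkg"
```

utils.py never appears because it has no match, and the cache file is removed by the second grep, which leaves only deploy.py to open in an editor.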
Releasing with confidence
When I make the 1.0 release of dsd-upsun, I want to be confident there aren't any lingering references to Platform.sh. I use my editor's Find and Replace to do most of the actual replacement and refactoring work, but before releasing I really want to do a global search for any references I might have missed. I'm not only looking for the name Platform.sh; I'm looking for any related name: platform_sh, plsh_utils, platformsh, and a few others.
With grep, it's easy to do some final checks:
    dsd-upsun$ grep -R platform.sh . | grep -v Binary | grep -v build | grep -v venv
    .../deploy_messages.py: This will create a project in the us-3.platform.sh region.
    .../utils.py: create_cmd = f"upsun create ... --region us-3.platform.sh --yes"
These are legitimate references to Platform.sh. There are still some internal names that reference the earlier brand, and it's part of why this kind of refactoring work isn't as simple as just running a global Find and Replace.
Other checks pass more cleanly:
    dsd-upsun$ grep -R plsh . | grep -v Binary | grep -v build | grep -v venv
    dsd-upsun$ grep -R platformsh . | grep -v Binary | grep -v build | grep -v venv
    dsd-upsun$ grep -R platform_sh . | grep -v Binary | grep -v build | grep -v venv
Seeing no output, or only a handful of correct references, is quite satisfying. It also gives me much more confidence that the 1.0 release won't have some accidental references to these terms that I didn't see when working in individual files.
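If these checks get run often, they could be wrapped in a small shell function. This is only a sketch under a few assumptions: check_leftovers is a hypothetical name, the pattern list covers just the three spellings above, and the filters are the same ones used throughout this post.

```shell
# Hypothetical helper: report leftover old-brand names in a project.
# Returns 0 if nothing is found, 1 if any pattern still appears.
check_leftovers() {
    project_dir=$1
    leftovers=0
    for pattern in plsh platformsh platform_sh; do
        # Same search and filters as above; "|| true" keeps an empty
        # result from being treated as a failure.
        hits=$(cd "$project_dir" && grep -R "$pattern" . \
            | grep -v Binary | grep -v build | grep -v venv || true)
        if [ -n "$hits" ]; then
            echo "Found references to $pattern:"
            echo "$hits"
            leftovers=1
        fi
    done
    return "$leftovers"
}

# Try it on a scratch project with one stale (invented) reference.
proj=$(mktemp -d)
echo 'from plsh_utils import get_region' > "$proj/deploy.py"
if check_leftovers "$proj"; then
    result="clean"
else
    result="leftovers"
fi
echo "$result"
rm -r "$proj"
```

Because the function returns a nonzero status when anything is found, it could also be used as a gate in a release script.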
What's in a name?
You might be wondering where the name grep came from. There's a great Computerphile interview with Brian Kernighan where he tells the story of how grep was written "overnight" by Ken Thompson after someone said they needed help searching through large amounts of text.
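The approach Ken took was to make something that ran a global search using a regular expression, and then printed the matching lines. He named the tool grep based on what it does and how it does it: the name comes from the ed editor command g/re/p, which globally searches for a regular expression and prints the lines that match.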
Conclusions
Programmers in the early days were working under significant constraints. They had a small amount of memory, limited processing power, and not much prior art. They figured out how to build a small set of tools that could work together to accomplish so much more than any single tool could.
It's well worth your time to become familiar with some of these tools, and to look into the many options for how they can be used. You don't need to learn a bunch of tools, but figuring out the small handful that fit well with your workflows will give you more flexibility as a programmer, and more confidence in your work as well.