Using `any()`

MP 132: It's a simple built-in function, but using it isn't as straightforward as it might seem.

A task that's come up repeatedly in my programming work involves looking through a collection of values, and taking an action if a specific item appears anywhere in that collection. There's a way to do this in one line of code, but it almost always takes me at least one round of refactoring to get there.

When I had to do this most recently, I made a note to write it up and try to recognize the pattern next time it comes up. It feels like I should be able to use the one-liner approach without always going through a round of refactoring. It turns out using any() in the real world is a bit more complicated than it appears on the surface.

When in is enough

Python makes it quite straightforward to determine if a specific item appears in a collection. Consider the following list of fruit:

>>> fruits = ["pear", "apple", "orange"]

If you want to know whether "apple" is in the list, you can use the in keyword:

>>> "apple" in fruits
True

This works for simple cases where the item you're looking for is either in the collection or not in the collection. It doesn't work when you need to examine something about each item in the collection.
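As a small illustration of that limit, here's a sketch using the same fruits list. The variable names are just for illustration. Membership in a list compares against whole items, while membership in a string looks for a substring:

```python
fruits = ["pear", "apple", "orange"]

# Membership in a list compares the target against each whole item.
whole_item = "apple" in fruits   # True: one item equals "apple"
partial_item = "app" in fruits   # False: no item equals "app"

# Membership in a string checks for a substring instead.
substring = "app" in "apple"     # True
```

So in on a list can't answer "does any item contain this text?"; it only answers "is this exact item present?"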

When in is not enough

Now let's consider a collection of desserts:

desserts = ["tiramisu", "apple pie", "pecan pie", "chocolate cake"]

I love apple desserts, but I'm not particular about what kind of dessert it is. If there are apples in the dessert, I'll try it!

A simple in test doesn't work here, because we don't know exactly what item to look for in the collection. We need to examine each dessert in the list, and see if it involves apples:

eat_dessert = False

for dessert in desserts:
    if "apple" in dessert:
        eat_dessert = True
        break

We first set the default condition: I don't plan to eat dessert. Then we look through all the desserts, one at a time. If any of them has apple in it, I'll eat dessert. Once I know there's a dessert with apples, I don't need to look at the rest of the options; I already know I'm eating dessert.

I tend to write code like this initially because it models how I think in the real world. But it's not the best implementation to keep in a codebase.

Using any()

There's a way to implement the above check in one line of code:

eat_dessert = any(["apple" in dessert for dessert in desserts])

The any() function returns True if any item in the collection evaluates to True. Here, if apple appears in any of the dessert names, any() will return True. Otherwise, it will return False.
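A few quick checks can make that behavior concrete. This is a minimal sketch with made-up values, not anything from the dessert example. Note that "evaluates to True" means truthiness in general, so items don't have to be actual booleans, and an empty collection gives False:

```python
# Any truthy item is enough for any() to return True.
one_truthy = any([0, "", "apple pie"])   # True: "apple pie" is truthy

# If every item is falsy, any() returns False.
all_falsy = any([0, "", None])           # False

# With nothing to examine, any() returns False.
empty = any([])                          # False
```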

A closer examination of this code clarifies why it often takes me a round of refactoring to get here. It's not as simple as telling people to just use any(). In the example of looking through a list of desserts, you have to first make a list comprehension that defines the condition you're looking for.

It's helpful to look at that list on its own:

>>> ["apple" in dessert for dessert in desserts]
[False, True, False, False]

The comprehension looks at each item in desserts, and evaluates to True if apple is in the dessert name, and False if it's not. We end up with a list of boolean values.

That's the list that any() is acting on:

>>> any([False, True, False, False])
True

So, again, it's not as simple as "just use any()". You need to make an intermediate sequence of boolean values representing the condition you're interested in, and then call any() on that sequence. This is quite readable once it's written, but it doesn't model how most of us think in the real world.

any() accepts a generator

I passed a list comprehension to any() in the previous section, because it's a bit more intuitive to think about any() acting on a list of values. But you don't need the square brackets; you can pass the same expression directly to any(), where it acts as a generator expression:

eat_dessert = any("apple" in dessert for dessert in desserts)

This version is more efficient, because any() stops at the first item that meets the specified condition, and a generator expression only produces items as they're requested. If you pass a list comprehension, the entire list is built before any() is called.
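One way to observe the difference is to record which desserts actually get examined. The has_apple() helper and the checked list below are hypothetical names added just for this sketch:

```python
desserts = ["tiramisu", "apple pie", "pecan pie", "chocolate cake"]

checked = []

def has_apple(dessert):
    """Record each dessert that gets examined, then test it for apples."""
    checked.append(dessert)
    return "apple" in dessert

# Generator expression: any() stops pulling items after the first True.
checked.clear()
gen_result = any(has_apple(dessert) for dessert in desserts)
gen_checked = list(checked)

# List comprehension: the whole list is built before any() sees it.
checked.clear()
list_result = any([has_apple(dessert) for dessert in desserts])
list_checked = list(checked)

print(gen_checked)   # ['tiramisu', 'apple pie']
print(list_checked)  # ['tiramisu', 'apple pie', 'pecan pie', 'chocolate cake']
```

Both calls return True, but the generator version examines only two desserts. With a list this small the difference doesn't matter; it matters more when the collection is large or the check is expensive.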

Real-world context

The test suite for django-simple-deploy is somewhat complicated, because a full end-to-end test makes an actual deployment to a hosting platform. These tests are critical for ongoing development and maintenance of the project, but they should only be run when they're explicitly requested.

When building out the infrastructure to support a plugin ecosystem last fall, the test suite grew even more complicated. The test suite needed to collect all tests from both the core django-simple-deploy project, and any plugins that are installed.

A bare pytest call should run all unit and integration tests, but no e2e tests. When an e2e test is requested, it should only run that test. Here's an example call for an e2e test:

$ pytest tests/e2e_tests --plugin dsd_flyio -s

This command says to run the end-to-end test using the plugin dsd-flyio, and show the output that's generated during configuration and deployment of the sample test project.

This kind of custom test collection behavior can be implemented by examining pytest's config object. Here's the specific condition I was trying to implement:

If any argument includes the phrase "e2e_tests", don't collect tests from plugins.

Here's how I implemented that behavior originally, in the root conftest.py:

def pytest_configure(config):
    """Add plugin test paths to what's being collected."""

    # Don't modify test collection when running e2e tests.
    for arg in config.args:
        if "e2e_tests" in arg:
            return

    if config.option.skip_plugin_tests:
        return

    # Collect tests from any installed plugins.
    ...

config.args is a list of arguments that were passed at the command line. If we loop over all the arguments that were passed in the pytest call, and find the string e2e_tests in any of those arguments, we should exit the custom configuration function before collecting any tests from plugins.

This worked, and it was a straightforward way for me to think about what I was trying to do. But this is an example of code that's harder to read than it is to write. There are multiple levels of indentation inside the already-indented function body. You also have to think through the outer loop, and then the inner condition.

Here's what the refactored version looks like:

def pytest_configure(config):
    """Add plugin test paths to what's being collected."""

    # Don't modify test collection when running e2e tests.
    if any("e2e_tests" in arg for arg in config.args):
        return

    if config.option.skip_plugin_tests:
        return

    # Collect tests from any installed plugins.
    ...

This version is more readable because it has fewer levels of indentation, but also because the logic is clearer. The block starts with an if statement, which puts the decision being made ahead of the mechanics of the for loop. The two-line block here reads more naturally than the original version: If e2e_tests appears in any argument in config.args, return now.
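When refactoring like this, it can help to check that both forms make the same decision. The sketch below is not the project's actual test code. The two function names are hypothetical, and a SimpleNamespace stands in for pytest's real config object, with made-up values for args:

```python
from types import SimpleNamespace

def is_e2e_run_loop(config):
    """Loop-based check, mirroring the original implementation."""
    for arg in config.args:
        if "e2e_tests" in arg:
            return True
    return False

def is_e2e_run_any(config):
    """One-line check, mirroring the refactored implementation."""
    return any("e2e_tests" in arg for arg in config.args)

# Stand-in config objects; these args values are assumptions for illustration.
e2e_config = SimpleNamespace(args=["tests/e2e_tests"])
unit_config = SimpleNamespace(args=["tests/unit_tests"])
no_args_config = SimpleNamespace(args=[])

for config in (e2e_config, unit_config, no_args_config):
    # Both versions should always agree.
    assert is_e2e_run_loop(config) == is_e2e_run_any(config)
```

A check like this is cheap to write, and it makes it easier to trust that a one-line replacement hasn't changed behavior.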

Conclusions

It's tempting to think I should have known how to use any() with more confidence earlier in my career as a programmer. But a bit of reflection shows that there's more involved in using this function than just understanding what it does on a surface level. Sure, it returns True if any item in the sequence passed to it evaluates to True. But to actually use it, you need to generate a sequence of boolean values based on the data you're working with, and the conditions you're looking for.

It's worth being thoughtful about the things we assume are simple. They might be simple because we've been using them for a while, and we've lost sight of some of the complexity that goes into using them in real-world situations. Also, if you've thought less of your own skills because you don't use some "simple" features of the language, consider that "simplicity" is often very context-dependent, and experience-dependent as well.

All that said, Python offers a bunch of features that make common situations easier to address. You can make yourself aware of these features before you need them, so you'll know what to reach for. But actual fluency with these features will probably take some time, practice, and ultimately reflection. Use refactoring sessions and an intentionally designed test suite to explore other ways to implement verbose blocks in your real-world codebases.