Searching for quick answers: Stack Overflow, ChatGPT, or other sites?

MP #6: How cleanly and directly do different sites present the correct answer?

People tend to think that because I’m the author of a popular Python book, I must know Python syntax really well. It’s a flattering myth, but I don’t live up to it. Like anyone, I know the syntax well for the parts of Python that I use on a regular basis. But for things I only do once in a while, I often have to look up the exact syntax. For example, it took me a long time to remember the syntax for accessing the index of each item in a list when using a for loop. I knew it involved the enumerate() function, but I could never remember whether the index or the item comes first in the loop variables.

Last night I was looking something like this up, so I went to Google and did a routine search. As I sifted through a couple pages of results, I noted again how ugly so many pages have become due to ads, SEO attempts, machine-generated copycat resources, and human-generated posts that are often much longer than they need to be. You get blind to it over time, but I was looking for something very simple, short, and specific; when I saw the information I was looking for, it struck me again how much extraneous and distracting material there is on most web pages these days.1 Then I wondered how different it would feel to answer this question using ChatGPT. I’m not the biggest fan of “AI”, but I have to say it worked well for this purpose.2

After trying these different approaches, I thought it would be interesting to make a visual comparison of how information is laid out in each of these resources.

A sample program

To frame this discussion, let’s make a sample program. Say we’re building our own text editor, or IDE. We have a list with lines of code in it, and we want to display the lines of code. Here’s a simple loop over the list, which prints the lines of code:

lines = [
    '"""A simple Hello World program."""',
    "",
    'msg = "Hello Python world!"',
    'print(msg)'
]

for line in lines:
    print(line)

Here’s the output:

"""A simple Hello World program."""

msg = "Hello Python world!"
print(msg)

The output is correct, but what if we want the listing to have line numbers? Each line number is just one more than the index of its corresponding line. If we had access to the index of each line as we work through the loop, displaying line numbers would be straightforward.

I’m going to put myself back in that space of not remembering the exact syntax for accessing the index inside the loop, and see how hard it is to find that information.

This should be something that’s not too difficult to look up. I should recognize the correct code when I see it, I don’t need a long explanation, and there should be a lot of resources that have this information. I’ll look up the phrase python loop get index of each item.3

The results from this search are pretty typical: some Stack Overflow questions, some blog posts, and a bunch of those sites that seem to wrap a couple pieces of information in a whole bunch of SEO blather and ads, or clearly repackage content from higher-quality sites.

I’m going to pick a few of these pages, and look at the kinds of information we find on each. Then we’ll compare these pages to what we see in a ChatGPT session that answers the same question.

Coding the pages

For each of the three pages under consideration, I’m going to take a screenshot and then cover every element on the page with a colored block. I’ll use three kinds of shading:

  • Blue sections contain relevant information. Darker blue represents the specific information I was looking for, showing the syntax for how to use the enumerate() function when looping over a list.
  • Pink sections cover distracting and irrelevant information. On many pages these are ads, but SEO blather falls into this category as well.
  • Gray sections cover neutral page elements, such as navigation and other elements that don’t help you evaluate the quality of the information.

I’m only going to examine the part of each page that appears above the fold. We should be able to find very specific information like this without a whole lot of scrolling, and limiting the view this way keeps the comparison across sites consistent.

Looking at a Stack Overflow page

Here’s what Stack Overflow looks like when coded in this way:

A small rectangle near the middle is dark blue. About a third of the elements are light blue. About half are gray. There is one pink rectangle on the right.

On Stack Overflow, most of the visible information is relevant or neutral, so even though it’s a fairly busy page, it’s not a bad layout. The information I’m looking for is above the fold, and there’s a bunch of contextual information nearby.

There’s one prominent ad on the right, but even that’s not necessarily a bad thing. A single relevant ad is perfectly appropriate on an information source; I’ve found many quality products and services from well-curated ads in my life.

One of the many other sites that pop up

These days, almost every time I search on Google a bunch of pages show up that are similarly designed. They have relevant information, but it’s either scraped from higher-quality sites, or it’s written by humans but not very well. When it’s written poorly, that’s either because the writer is not proficient at technical writing, or because they’re intentionally writing for search engines instead of human readers.

I’ve blocked the URL and name from this image, because the particular source doesn’t matter. Even if you know how to avoid these kinds of sites, keep in mind that many people newer to programming don’t know how to filter out these sites right away. I’ve been searching for technical information for decades, and I still end up wading through sites like these on a regular basis.

Here’s what this kind of site looks like when coded:

There are gray rectangles at the top and left of the image. There are big pink rectangles on the right, and in the middle. There are a few smaller light blue rectangles near the top and near the bottom.

A couple smaller sections are contextually relevant, but the specific information I’m looking for doesn’t even appear above the fold. There are a bunch of ads on the right side, and an ad right in the middle of the page. The two horizontal pink rectangles in the middle aren’t ads; they’re poorly written text that’s either machine-generated or SEO blather. To someone just learning Python, it might read like an attempt at explaining how to write loops in Python. But it mixes a number of different ideas, and it’s not presented in a way that leaves the reader clear about which approach is best for their situation.

There are neutral elements on the left and top of the page, but it’s generous to call these neutral. They’re all related to Python and site navigation, but overall they’re a large number of poorly organized links. It looks like a lot of information, but it reads more like an SEO tactic than a helpful navigational structure for people who are trying to learn.

How does ChatGPT compare?

Here’s what a ChatGPT session looks like, with the same coding:

There are two gray rectangles on the left, and two at the bottom. There is one prominent dark blue rectangle near the top. There are a couple smaller light blue rectangles near the top, and one large light blue rectangle in the middle of the page.

This is really interesting to see. The dark blue box, representing the specific information I was looking for, is right near the top of the page. It’s got a small amount of contextually relevant information above, and a large block of relevant information below. The neutral elements are nicely spaced on the left and bottom, and there are few enough of them that they don’t distract. It’s a clean page, which highlights the most important information. There are no ads, and no irrelevant information on this page.

Keep in mind that ChatGPT doesn’t always produce the same output for a given prompt, so your coding of a session for a similar question may look different than this.

What about blogs?

I’m not including blogs in this discussion, because blogs often serve a different purpose. Blogs tend to tell a story, or teach a larger concept, or explain a specific approach in more depth. I’m searching for a specific piece of syntax, and expect the answer to either come from a documentation source, or a question and answer site.

I’d expect a good blog to have a dark blue box, a large number of light blue boxes, and some gray boxes when coded in this way. I wouldn’t necessarily expect the dark blue box to appear above the fold on a blog, because blog posts often include some discussion before diving directly into syntax.

Finishing the IDE example

For people who aren’t familiar with enumerate(), here’s the original IDE example using the index value:

lines = [
    '"""A simple Hello World program."""',
    "",
    'msg = "Hello Python world!"',
    'print(msg)'
]

for index, line in enumerate(lines):
    line_num = index + 1
    print(f"{line_num}\t{line}")

The enumerate() function takes in a list and returns an iterator that yields tuples, each containing an index and the corresponding item from the list.4 Instead of using a single loop variable, you use two variables. The first represents the index of each item, and the second refers to the value of each item in the list.

Here’s the output, a code listing with line numbers:

1   """A simple Hello World program."""
2   
3   msg = "Hello Python world!"
4   print(msg)
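
If the index + 1 step feels like extra bookkeeping, enumerate() also accepts an optional start argument, so the count can begin at 1 directly. Here’s a minimal sketch of the same listing written that way; the numbered_lines name is just for illustration, and the strings are collected in a list only so the result is easy to inspect:

```python
lines = [
    '"""A simple Hello World program."""',
    "",
    'msg = "Hello Python world!"',
    'print(msg)',
]

# start=1 makes enumerate() count from 1 instead of 0,
# so there's no need for a separate index + 1 step.
numbered_lines = [
    f"{line_num}\t{line}"
    for line_num, line in enumerate(lines, start=1)
]

for numbered_line in numbered_lines:
    print(numbered_line)
```

This prints the same numbered listing as the loop that computes index + 1 by hand.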

Discussion

There are a bunch of people proclaiming that ChatGPT, and its peers and descendants, are going to quickly make all kinds of other learning resources irrelevant. A surface reading of this discussion might reinforce that line of thinking. But there are some really important things to consider before jumping to that conclusion.

First, this was a simple search for something that has a clear answer. So, ChatGPT’s initial response was correct and uncluttered. The light blue box below the correct information contained the output and one alternate approach. However, that second example was the classic C-style approach, where you define your own index variable and use it to loop over the list. In the context of the IDE example above, that would look like this:

for i in range(len(lines)):
    line_num = i + 1
    print(f"{line_num}\t{lines[i]}")

This is not idiomatic Python at all; it’s much less readable than the solution using enumerate(). ChatGPT’s initial response has no discussion of the appropriateness of either solution. It basically presents both of these solutions as equivalent. I’m curious if ChatGPT would express an opinion on this if asked, but I’ll save that discussion for a separate post.
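
To make the comparison concrete, here’s a small sketch that builds the listing both ways and checks that the results match. The two helper functions are hypothetical names used only for this illustration:

```python
def number_with_range(lines):
    """C-style: manage the index yourself and look up each item."""
    numbered = []
    for i in range(len(lines)):
        numbered.append(f"{i + 1}\t{lines[i]}")
    return numbered


def number_with_enumerate(lines):
    """Idiomatic: let enumerate() hand over the index and the item."""
    numbered = []
    for index, line in enumerate(lines):
        numbered.append(f"{index + 1}\t{line}")
    return numbered


lines = ['msg = "Hello Python world!"', "print(msg)"]

# For a list, both approaches build exactly the same listing.
same_output = number_with_range(lines) == number_with_enumerate(lines)
print(same_output)
```

The results are identical for a list, which is why a response that shows both without comment can make them look interchangeable. The difference is in readability: the enumerate() version never has to reach back into the list with lines[i].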

Stack Overflow pages are busy, and that’s especially true on questions that have been asked many times over the years. Those questions tend to get a lot of answers, and a lot of comments. That’s not necessarily bad, though. Most of the light blue boxes on the Stack Overflow page are discussions of the other ways you can approach this problem, and when it’s appropriate to use the different solutions. This is really valuable information that’s entirely missing from ChatGPT’s initial response. Even if that information comes up in responses to followup questions, if you were newer to Python and using ChatGPT for a question like this, you’d need to know to ask clarifying questions. At this point ChatGPT can be thought of as an overly confident person responding to questions on Stack Overflow, always willing to walk their answers back if challenged.

We would all be better off if the SEO-optimized site, and sites like it, dropped off the internet entirely. They’re full of distracting information, and calling any of the material relevant is being generous. Much of the text on these kinds of pages is similar to the overconfident person answering questions on Stack Overflow, or ChatGPT spitting something out because it’s seen these words in its training data.

This brings up one more point about the color coding of the ChatGPT page. It comes across as very clean and non-distracting in this brief analysis, about a simple question with a relatively clear, correct answer. The coding of ChatGPT pages would be different for questions that have less clear answers and more possible solutions. But it will also change as ChatGPT becomes commoditized. Some people and companies will add more neutral elements around an “AI” chat assistant, and others will throw ads all over the screen. Ads could be inserted between paragraphs. And, much more disturbingly, ads could be inserted into the actual text of the responses. That would certainly be interesting to code in this way.

Conclusions

The one clear conclusion is that we still need to visit a variety of sources when we have questions, and validate much of what we find. Whether we prefer sites like Stack Overflow or known, trusted blogs, or ChatGPT and its peers, we have to evaluate whether we’re seeing correct information, or incorrect information stated confidently.

Some of the benefit of the initial offering of ChatGPT is simply the removal of a whole lot of clutter compared to what we’re used to seeing. If the cleanliness of that presentation influences other sites to reduce SEO blather and overuse of ads, that would be a nice side effect of having to deal with overconfident “AI” assistants.

ChatGPT and its peers are not going away. They’re a tool to be aware of, both for their strengths and their limitations. It will be interesting to watch how this tooling evolves, and how it impacts our workflows.

Have ChatGPT or similar virtual assistants changed the way you work as a programmer? If so, please share how your workflow is evolving.


Resources

You can find the code files from this post in the mostly_python GitHub repository.


  1. I was trying to parse the output from an online editor and generate a formatted email. I knew you could round individual corners of a box using CSS, but I didn’t know if the correct syntax was box-radius-top-left, box-radius-top, box-top-radius, or something else. My search was css round top corners only.

  2. I can’t help but put “AI” in quotes every time I use the term. Calling these tools artificially intelligent is more hype than an accurate description of their capabilities. Some of what comes out of them is interesting and impressive, but I’m not at all comfortable calling it “AI” without feeling like I’m contributing to a giant marketing campaign.

  3. I was a high school math and science teacher in the 2000s, and we taught all of our students how to develop effective search phrases back then. Students wanted to enter full questions, such as How do I get the index value for each item when using a for loop with a Python list? We had to tell them that kind of phrasing would just confuse the search engine. It’s kind of funny to look back and see how effective search engines have become, regardless of how much websites have degraded over this time.

  4. enumerate() can take in any sequence or iterable object. See the official documentation for a concise overview of what it can accept, and what it returns.
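
As a quick illustration of that last footnote, here’s a sketch that passes a generator, rather than a list, to enumerate(). The read_lines() helper is a made-up stand-in for any source that yields lines one at a time:

```python
def read_lines():
    """Yield lines one at a time, the way a file object would."""
    yield 'msg = "Hello Python world!"'
    yield "print(msg)"


# A generator has no len() and can't be indexed, but enumerate()
# only needs to iterate over it.
numbered = list(enumerate(read_lines()))
print(numbered)
```

The range(len(...)) approach wouldn’t work here, since a generator doesn’t support len() or indexing.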