Revising the output of a coding assistant

Although I am sure using coding assistants today is bad for you, I have been trying them out. I think it's because even our hero Kent Beck is trying them out, so I have to be open to a change of opinion.

Here's something that happened, and is typical.

I told it to create a command to create a user. I told it the command must be idempotent.

It output this:

    # This command is idempotent in the sense that,
    # if the user exists, it will fail gracefully:
    return server.shell(
        name=f"Add user {username}",
        commands=[
            f"{register_cmd} || echo 'User might already exist or command failed'"
        ],
    )

The above uses pyinfra, an awesome Python library that obsoletes infrastructure-as-code solutions such as Ansible.

Margaret Boden divides creativity into three general types: combinatorial, exploratory and transformational. I notice that the output here shows combinatorial creativity coming from the AI. It combined its repertoire of shell usage with some Python in an attempt to satisfy my idempotency requirement. The solution is wrong, but I think it is mildly creative.

Notice the comment: "it will fail gracefully". That is only a half-truth. The whole truth is that the shell's || operator converts every kind of failure into a success:

  • The user already exists? 'User might already exist or command failed'
  • The command is not present in the target machine? 'User might already exist or command failed'
  • Could not connect to the database? 'User might already exist or command failed'
  • Out of memory? 'User might already exist or command failed'
  • (...)
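This failure-masking is easy to demonstrate with plain shell, no pyinfra required. A minimal sketch (the fallback message is the one from the generated code; the failing commands are just stand-ins):

```python
import subprocess

# Why `cmd || echo ...` defeats error detection: the compound
# command exits 0 no matter what went wrong on the left-hand side.
def masked_exit_code(cmd: str) -> int:
    """Run `cmd || echo <fallback>` and return the overall exit code."""
    line = f"{cmd} || echo 'User might already exist or command failed'"
    return subprocess.run(line, shell=True, capture_output=True).returncode

# The command does not exist at all -> still reports success.
assert masked_exit_code("no-such-command-hopefully") == 0
# The command exists but fails -> still reports success.
assert masked_exit_code("false") == 0
# Only the bare command, without ||, surfaces the failure:
assert subprocess.run("false", shell=True, capture_output=True).returncode == 1
```

Whatever the exit status of the left-hand command, the `echo` on the right succeeds, so the whole line succeeds, and every caller upstream believes the user was created.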

I used a model that doesn't search the web, and the training data probably didn't have much about pyinfra. Certainly unaware of how the command works, it tried to ensure idempotency through a trick.

So I had to look up the documentation and commit a correction:

    commands=[f"{register_cmd} --exists-ok"],

The above passes a flag the command already supports, which makes it genuinely idempotent.
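As an aside — not the fix for my app-level command, which needed its own flag — pyinfra itself ships built-in operations that are idempotent by design, which the assistant also appeared not to know. Creating a plain OS user, for instance, would idiomatically use the built-in server.user operation instead of server.shell. A sketch of a deploy file, assuming the `username` variable from the example above (not runnable without a pyinfra inventory and host):

```python
# pyinfra deploy-file sketch: built-in operations are declarative,
# so running this twice leaves the system unchanged the second time.
from pyinfra.operations import server

server.user(
    name=f"Add user {username}",
    user=username,
    present=True,  # ensure the user exists; no || trickery needed
)
```

The declarative style pushes the "does it already exist?" question down into the tool, which is exactly where it belongs.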

So what a human really wants from a pair programming partner here is:

  • for the driver to ask the helper (me), "can you please look up the arguments to this command?",
  • not for it to vomit an inadequate solution without so much as a warning.

All current LLMs (2026-03) have this bad behavior – they try to finish the project right now instead of interviewing me – even if I prompt them not to let any assumption go unchecked.

There's a second problem: by default, pyinfra shows neither the output of the command nor the echo message; it hides both. The LLM here really knows nothing about pyinfra. For brevity, I omit the solution to this.

Now examine all my steps:

  0. Create my augmented code setup,
  1. Write a prompt,
  2. Pay for an LLM that pollutes and is currently subsidized and will soon get 10-15x more expensive,
  3. Review the code, find and understand the problem,
  4. Look up the documentation,
  5. Write code,
  6. Commit it.

Steps 0, 1 and 2 are the ones necessary to use an LLM.

Steps 4, 5 and 6 are the steps it should have done for me.

Step 3 would be unnecessary in an ideal world, but we'll probably never get there. Someone has to be responsible and sign off on the code, because LLMs cannot be trusted with any decisions.

If I had simply done steps 4, 5 and 6, in this case,

  • I would have finished much sooner.
  • I would have avoided 2 bugs in 6 or fewer lines of code.
  • I would have read a bit of the documentation, improving my memory of it and my ability to find it in the future.
  • I would have exercised my coding chops, which is as essential to building up my architecture ability and my software engineering ability as practicing the piano is to improving my musical skills and interpretation skills.

Writing code is a creative and pleasant activity that teaches you, makes you grow. Reviewing code is a tiresome, difficult, arid activity that teaches you nothing, but also depends on you having written a lot of code in the past. Nobody can be taught to review code, they can only be taught how to code.

If I stop writing code and become only a reviewer, this will certainly hurt my skills. But that is exactly what working with a coding assistant is.

Code reviews also do not work, beyond a very low quantity of lines of code. I have explained this in the past.

I am seeing a lot of programmers say that working in this way is exhausting. I think my example shows how.

Now, a critic might point out that I didn't use one of the most expensive models, the ones that can search the web. They might have answered correctly. But if you are being honest, you know that is not the point here. This small example is good enough as a placeholder for the kinds of mistakes that LLMs commit, such as working around problems in the wrong layer of abstraction (as it kinda did here).

In the awesome lecture "The dangers of probably-working software", the main point is "if you don't know how it works, you don't know it works". Had I been lazy and not paid attention to the output, the problem would still be there.

That is a GitHub employee – you know, the guys selling you Copilot – brilliantly explaining that, if you don't review everything Copilot writes for you – if you don't understand the code in your project to the point that it is yours – you cannot trust that code.

An organization does not need code written, it needs code understood.

Much beyond what is demonstrated in this example, LLMs make implicit design decisions in their code, which are usually bad, and never discussed or made explicit to you in any way.

So using a coding assistant becomes a futile exercise in trying to control (specify in the prompt) every single design decision, or sometimes just running the prompt again until something good comes out. In any case, you become a control freak.

But senior programmers have an intuition about code, they get the design right sooner. This intuition is more than a little different from a conscious enumeration of all the design decisions involved.

Even if you see the AI's implicit design decisions hidden in the code, they are a great opportunity for laziness to make our lives more difficult. In this specific project, the AI came up with a directory structure for the deployment, and I said "fine". I only realized how horrible it was, and how difficult to navigate and understand it was, in the middle of the project, when a refactoring became necessary to fix the issue. This refactoring involved a lot of files, it was unpleasant and a bit risky. I really wish I had spent some time designing the directory structure myself.

Laziness is a very difficult thing for humans to suppress, and no matter how disciplined I am as a programmer, I know my laziness will eventually hurt things again.

In short, the more you hand the reins to the AI, the worse it is for your project.

Ah! One more thing. If you are pairing with a robot, you are probably not pairing with a human.

In conclusion, both from a philosophical/educational stance and from practical usage, I continue to believe using LLMs to produce code is hurtful to your project, to your real speed, to your wallet, to your organization, to our energy consumption, to the environment, to the economy, to your skills, to your memory, to your patience, to your team and social skills, to your loneliness and sense of alienation, and to software engineering as a whole.

This technology enables developers to quiet-quit, irresponsibly producing tons of instantaneous legacy code that is impossible to maintain, while clueless managers push for more and more of the same, thinking it's the New True Way.

It remains theoretically possible to code with an assistant in such a way that you gain more than you lose, but in practice, it's so hard to see how, and I think it requires such a senior and disciplined programmer to figure that out, that it's clearly not worth it.

But I am just an average schmuck. I invite you to hear a successful programmer, entrepreneur and philosopher – a small genius – saying similar things: The dangerous illusion of AI coding.