This post was written at 2am by a sleep-deprived human

Lipstick on a pig

As humans, we have evolved to be very good at orienting ourselves in nature. A landscape with mountains, rivers, and valleys feels natural, and we have a good memory for geographical references. And the same thing also happens with code - or so says Robert C. Martin. Therefore, our natural instinct is to remember the curvature of the code indentation, the syntax coloring in the rivers of characters, and the beautiful valleys between the scope blocks. This - according to Clean Coders - is not how we should be navigating code. Rather than relying on our natural hunter-gatherer instincts, we should give our functions and variables meaningful names and keep functions short - so short they do one thing only. This way, I don’t have to memorize the mountains and valleys of the code, and anyone joining the team can find their way through it simply by reading it. It’s like prose, not cartography.

When I got my first job as a Software Engineer over 10 years ago, I was given a copy of Clean Code and The Clean Coder by Robert C. Martin. I’ve read the books, watched the videos, attended trainings, and done TDD katas in my spare time. For the next couple of years I went around the corridors boasting about my professionalism for always writing tests and keeping functions to 10 lines at most. After a few years in the industry, reading code from many different codebases and programming languages, my view of the world became less black-and-white. You start reading different authors, get out of the bubble you’re in, and all of a sudden the absolutism starts falling apart. It’s like when you are told as a junior “Do not repeat yourself!” but after some years you realize that repeating yourself is sometimes fine - it’s certainly better than the wrong abstraction! Some of the best code I’ve read had a lot of comments and sometimes a few hundred lines of code in a single function. The number of lines - it turns out - wasn’t the problem. Mixing different concepts and abstraction layers was the real culprit!

Over the last couple of years I’ve developed a sense of taste for what good code looks like. You see many different shapes of functions and structs. Within a few minutes, I can tell how I feel about a piece of code. I don’t know if it’s good or bad, or whether it solves the problem. But I can definitely tell whether it matches my quality pattern. That doesn’t mean the code that wrinkles my nose will prove to be bad, but the probability - after years of pattern recognition - is higher. I can’t help but think in mountains, rivers, and valleys. I know what a nice valley looks like compared to a swamp. I’ve learned, after years of trekking, what a good trail is. And if I am settling somewhere, I hope I can identify fertile land when I see it.

It’s entirely subject to taste - but looking at the outline of these two pieces of code - I know which one I am going into with a higher expectation.

It’s 20 degrees Celsius at 9am and I’m biking to the office. It’s only an 8-minute ride. My company just opened a new office close to where I live. I haven’t worked in an office in a while. I have a pretty good setup at home - but after a while it’s nice to change the scenery. I take the elevator up. Pass the reception. Set up at my desk. Grab a cup of coffee. I’m ready to start my day. A couple of new messages in my team’s Slack channel. There is a pull request I need to review. It’s a pretty important change I was excited about. It’s not a large change, but it was thoroughly discussed beforehand. We’ve had an RFC, a few discussions, gathered feedback, tested a few hypotheses, and finally landed on a good path forward.

I open the pull request - it’s just shy of a thousand lines of code. Not all of it is actual code. There are some config files, some generated things, and a few test data files. It’s an end-to-end feature. It starts at the server and makes changes all the way down to our persistence layer. Nothing fundamental should change - but it’s still a highly impactful contribution. Let’s dig in!

We are adding a new endpoint to the API. Sure, that’s exactly what I expected.

Scroll scroll scroll

Ah, here is the core of it. Skimming through it looks good. I see there are a few comments explaining the change. That looks nice. Seems like the code is pretty polished already! That’s a lot of em-dashes though. But that’s fine. Some people like to use them — I do! Haven’t read the comments thoroughly yet, this is just a first pass. But skimming through these they seem to explain what is happening here.

Scroll scroll

There is a lot of logic here. I wonder why they implemented so much stuff. There is surely a reason for it. Seems like we underestimated how much code we would need to get this feature in. I see there are docstrings on all the public members of this module – nice! The functions aren’t too long either, and the few structs, generics, and interfaces I’ve seen look correct.

Scroll scroll

I see some tests. Yea, that looks sensible. There are a lot of tests, though. And each test is doing mostly the same thing. Surely this could’ve been a helper function? I’ll throw in a comment. But I should look at the tests later - let’s focus on the structure first.

Scroll …. Scroll

Wait… What?

fn test_can_parse_special_characters_in_html_tags() {
    // ...
}

Can parse special characters in HTML tags? That doesn’t seem like something we would ever need to test. Sure, we are implementing a feature that is somewhat adjacent to HTML server-side rendering, but surely we don’t need to validate any parsing? Maybe I should leave a comment that this test seems excessive. Or is it? Could it be that there is a parser implemented here? No, that makes zero sense. Unless…

Scroll scroll scroll scroll

struct PartialHTMLParser { /* ... */ }

A… partial HTML parser? Ok, now I’m really confused. This change was supposed to focus on adding a feature that allows users to generate HTML reports. Sure, we have to handle custom HTML templates - but I doubt we would ever need to roll our own parser unless we are doing something really fishy. I don’t remember that in the RFCs or in the design. There must be a reason. Also, if they’ve put in all this effort to roll their own parser, surely it must be needed, no?

The code does look neat. It looks like a parser. I’ve written a few parsers before - mostly in toy projects. They all look similar at the core. It reminds me a bit of the example code from the book Writing an Interpreter in Go, which I read a few years ago. I never really finished the book, but the first few chapters were great. It’s still sitting on my bookshelf; one day I’ll go back to it and implement the whole thing in Clojure.

Alright, let’s see where this parser is used.

Find in page “PartialHTMLParser”

Huh, it’s used in the nice code I skimmed before. That code has quite a few comments - none seem to explain why the HTML parser is needed. They acknowledge its existence and say how it parses, but not why. The code looks nice though. It is fully tested as well and the functions are well divided. I can see the mountains and valleys clearly, and the function names fit nicely with the code inside.

It seems like the parser is used to validate the templates so we can produce better errors. Turns out our template rendering library also produces error messages, just slightly different ones. The library also catches the same errors that our parser does. After reading the docs, I think there might actually be a bug in our parser.

Scroll scroll scroll

Back to the parser. Ok, this will take me some time. My coffee mug is now empty. Should probably get some water rather than a second coffee. I still need to work on my task. But I can’t in good conscience approve this without reading through this parser code. What if there is a major bug in it? Ok, excluding tests it’s around 200 lines of code. Let’s muscle through this, shall we?

After a few minutes and a few scribbles in my notebook, I think I understand what this is doing. At first glance it looks right, but the code is just confusing. There are so many questionable things, but nothing is immediately wrong. The edge case I was sure it would fail has a test covering it - and it does pass! It doesn’t repeat any code, but it handles things slightly differently in every function. Some functions take a context argument, some create a new context, some just ignore the context entirely! There are some arbitrary validations, but they are sparse and sometimes inconsistent.

The more you look at it, the weirder it gets. After adjusting the monitor, the chair, and rubbing my eyes I finally see it.

It’s lipstick on a pig.

I’ve been swinging both ways on the LLM pendulum, and whilst I thought these things could actually be useful, I’ve also swung back to hating everything about them. As with every change to your comfort zone, there is a period of resistance. I just have to fight my bias and give them a go. If everyone is using them, they must be good. Right?

In the last six months I’ve assumed the persona of the LLM hater on my team - and as much as we all laugh, call them clankers, and they know I am not a robot-nazi - I agree with a lot of the criticism LLMs get. I don’t forbid my team from using them - that would be a speedrun to getting fired in 2026. But I often go on rants about something the clanker did massively wrong.

And by wrong, I don’t mean non-functional. The proposed code, most of the time, works. And it looks correct. Just working was certainly enough before LLMs. But it is only enough if we know why and how it works.

The crux of this issue, in my opinion, is something I’ve been calling knowledge debt. It is similar to technical debt and (shockingly) real debt. Like a loan from the bank, technical debt also has to be paid off, often at draconian interest rates. The longer it takes you to pay off your technical debt, the more effort it will cost you. Maybe the hacky public API you added is now being depended on by another team. That cron job that runs every hour to reconcile data is conflicting with another feature that was introduced last week. That team had to make some suboptimal design decisions on the new feature because of that forsaken cron job. Codebases, like a startup run out of a garage, can go bankrupt if you don’t pay your debts.

The same is now true for knowledge debt. There’s no shortage of (very reliable) accounts on Reddit of a post-LLM world where vibe-coded projects are just thrown out rather than updated, because it’s easier to start from scratch than to unslop the slop. The danger of knowledge debt is accepting code when no one on the team really knows why it’s needed. It has to be maintained now. Do we really need an HTML parser? Maybe. But if no one can answer why we need it, then either it goes away or we go back to the drawing board.

I keep scrolling through the thing and I just can’t wrap my head around the parser. It’s almost 10am and we have our daily standup. Maybe I should just ask the team what they think. I also have other things I need to review and some code I need to write for my own work. I’ve been using Claude Code to think for me while I review the PR. Let’s see what it says.

As I scroll through its thoughtful plan - it looks correct. The keywords are right. It certainly is saying what I want to hear. Does it know that’s what I wanted to hear? I did give it a pretty clear instruction of what I wanted and I can see the same words are here.

Anyway, let’s go back to the Pull Request. I see they’ve also enabled an automated code review tool. There are a lot of comments from the LLM review. That’s good… I hope? They are all resolved now, so it seems like they worked through the changes with the LLM before submitting for review. Let’s see what it wrote.

Scroll scroll

Ah, the infamous parser. There are tons of comments on the parser. Good! Maybe the LLM caught the same things I did?

Expand comments

Ok… It’s being overly pedantic about the parser logic. It clearly over-indexed on this potential remote code execution attack vector. It says it’s severe. A security issue, it says. Sounds… odd?

Let’s see where this code path is actually used.

Scroll scroll scroll

So the HTML templates are loaded from configs. These configs are only provided by… our own team, I think? Yea, that’s the case. So this issue is about preventing us from accidentally injecting a remote code execution exploit into… ourselves? Ok… sure. More security is good, right?

This all seems to be founded on the fact that you can have script tags in the template. Let’s see what the library says about it.

Type in the search bar. Scroll scroll. Click. Scroll scroll scroll

The library we are using for rendering the HTML templates apparently can be configured to strip out any JavaScript from the final output. We will also render these templates into PDFs eventually. Is this really a severe issue?

It all seems a bit excessive: the parser, the comments, the review. Are we focusing on the right things here?

I see there are a few code comments on the parsing function referencing the vulnerability. These comments look nice. They form a nice symmetric light gray block on top of the colorful 15-line Rust function. It looks tidy. It’s nice. I recognize this pattern. I’ve seen this valley before. The content of the comments is a bit concerning, though. It does mention the attack vector - but it references a potential attack via the inputs to the template rather than via the template itself. On closer inspection, the code’s validation for potential injections covers only the template. It seems like the comments - albeit beautifully written - are wrong.

The comments also mention CSTs. In our case, CST stands for something entirely specific to our domain. We do take CSTs as a parameter in this function, so it makes sense to mention them. However, the comment also expands CST as Concrete Syntax Tree, which is definitely not our CST. The function is related to parsing, but it has nothing to do with syntax trees. I’m just going to ignore this one for now and continue looking at the parsing logic. But this whole change has now left me with a bad taste in my mouth.

It’s almost like these are just words. Seemingly arranged in a way that they look correct but aren’t.

I’ve written in the past about solving the same problem multiple times. It’s a good practice I picked up at a previous employer. I enjoy experimenting with different approaches, and throwing the first solution away in favour of a second, more experienced attempt at solving the same thing. Even at the expense of time, I’ve felt that this practice has - more often than not - yielded good results.

There is a scale to the effort we put into things. I enjoy taking the time to make my contributions polished: thinking through the properties of the change, how it affects the code around it, the obvious optimization improvements, and eventually how neatly organized my blocks of code are. In that order. Often when I’m solving the same problem for the second time, I’m only iterating on the first and second aspects of this list. It’s just brush strokes by then.

Sometimes when I open a pull request, the code is still in a rudimentary state. I still have some TODO comments in the code. Some functions go out without comments or docstrings. I’ve left in the hacky debug functions and my tests definitely need some love. At this point I am mostly looking for validation of the major direction of the change.

I can’t help but do the same thing when reviewing code. I rarely jump into optimizations or even the tests on the first pass. First, I focus on what is being proposed, whether it fits the system’s properties, and whether the abstraction is right. A second pass would probably yield comments about the choice of names, tests, code style, and the lack (or excess) of comments.

It’s a natural progression: going from an idea to a draft, iterating on the fundamentals, laying the foundation, and only then polishing it.

What I’ve been noticing with heavily LLM-generated code is that the first draft already looks extremely polished. The number of comments, the code style, and the tests look like something that has already been through a few iterations. It’s incredibly polished, and you almost feel like a dick for adding so many comments to it. But you can’t help yourself, because on closer inspection you notice they’ve made a perfectly flat square wheel.

But hey, it looks amazing!

It’s 10am. Time for the daily standup. I take my laptop to a soundproof booth. I go to the office often now, but my team is still remote. Still, it’s an 8-minute bike ride, so it’s fine. We join the sync. The team is pretty self-sufficient. They don’t need me to moderate the meeting. We make some small talk and joke around for the first 5 minutes, then we kick off.

Someone shares the task board and each person goes through their tickets - giving a status update and often a few complaints. “Infra sucks.” I can see a few nods from people who were either paying attention or, sensing that it sounded like an affirmation, had their muscle memory trigger the nodding function while they were typing something in Slack or reading Hacker News.

People are implementing things. Everyone is pretty busy. Each person has at least 3-4 tickets assigned. I haven’t assigned anything to anyone; they pick up their own work packages based on the overall project/epic they are working on.

I ask a couple of questions. Somebody else makes a suggestion. A few other questions.

We start talking about the API feature. It’s marked as “In review”. We hope to merge it today! It’s a much needed feature. We’ve been asked a few times when that would be available.

I have some questions about that parser. Do we really need it? I’ve left some comments in the PR, but I’m struggling to understand why we can’t just rely on the templating library.

Yea, we don’t really need it. But it’s nice to add an additional validation. Plus it’s just a few hundred lines of code. We get much better error validation this way.

Collective nods.

We talk a bit more about it. Some back and forth. They agree that it’s probably better to remove it. No attachment, really. That’s quite professional. I try not to be attached to the code I write. If we need to remove it, we remove it. But if I had hand-rolled a parser in a PR, I think at least some sunk cost fallacy feelings would appear. I would probably make my case for why it should stay. I try not to be attached. But it’s a craft, of sorts. We build our own mountains, rivers, and valleys and we grow attached to them. “I’ve spent time on this,” we say to ourselves. “It’s my baby!”

I go back to my desk. Claude seems to be done thinking. Let’s see what he thinks about my plan for the new feature.

Scroll scroll

A lot of text. Not reading all of that. This is all generated anyway, so the value of reading it must be smaller, right? The keywords are right. Claude came up with a 17-step plan to implement this. Sounds like a lot. But hey - it’s going to do all that work for me! Let’s skim through this.

Scroll scroll scroll

I see my mountains (functions). There are some nice rivers (expressions). I know this pattern. It looks familiar. It must be right.

Scroll scroll

Step 15 seems incorrect. I don’t understand why we need to implement this very complicated validation. Do we really need this? It sounds excessive. It has two paragraphs explaining it, but I’m not sure I understand why. Let me ask Claude to remove it from the plan.

I ask.

I was absolutely right, apparently - it was not needed! What a great feeling!

After a few clicks, prompts, and scrolls, the plan looks solid. Yea that seems right. 16 steps. All neatly arranged.

Implement the plan!

Colorful letters scroll past in the terminal faster than anyone could read them. “Accept all changes,” I said. More letters flash across the screen. It seems like an endless cycle. They are all familiar, though. Their shape, color, and flow are comfortable and natural. I can tell that what is coming out of it looks correct. More and more letters go across my screen. New files are created. A few files are being changed.

I lock my computer. Time for another coffee.

It’s a good day. It’s getting a bit warmer now: 25 degrees. There is a big balcony at the new office. It offers a nice view of the city. The coffee is okay. I’ve talked with some other folks at the office about getting an espresso machine. Summer is almost here. It’s going to get pretty hot. Maybe biking to the office won’t be a good idea when it’s 30 degrees outside. Well, we’ll see.

I head back to my desk.

Claude is done. All 16 steps. Finished. I have my feature! And I still have some coffee left in my mug.

I open the delta. Let’s take a look!

Scroll scroll scroll

Yea that all makes sense. I can make sense of the mountains, rivers, and valleys.

Scroll scroll

Wait a minute. There is one particular mountain that looks odd. I zoom in. Deep in the trees there is something red and pink. That doesn’t seem right, does it? These are trees, it should be mostly green and brown. I zoom in a little more. And there it is. Hiding between the trees. It’s a pig. It has lipstick on.


If you hated this post, and can't keep it to yourself, consider sending me an e-mail at fred.rbittencourt@gmail.com. I'm more responsive to positive comments though.