That's Numberwang
When numbers become meaningless

Have you ever had to debug an issue without sufficient logs or visibility? Or tried to get a decent case study out of a customer? I've done both and trust me, they can be extremely difficult.
Turns out, these are actually the same problem. You need a before and an after. For debugging, the logs let you work out the before and after. For case studies, it's about "what numbers were bad before and how much have they changed since".
It turns out, for case studies, the reason you can't get a before/after is the same reason the software got bought in the first place. If the before was measured well enough to benchmark, it probably wasn't broken enough to fix.
Same as the logs. Logs get added based on what is observed and what is desired to be observed. They assume certain things about the real world the software is part of. Logs are measures - of an event, of something else. If you're lucky, you had a relevant type of log for the scenario where something unexpected happens. Bugs often happen in spaces where mental models didn't match reality... so your log doesn't cover it because we assumed a too different set of assumptions prior to the bug being discovered or understood.
Both of these are driven by resource constraints - given a limited amount of time, what things do we measure?
The "number meme" trap

If you pop in phrases like "acceptable human entry error rate" into a search engine, you'll eventually hit something that mentions 1%. And this number is everywhere. The problem is that there is no specific study or meta-study that quotes it - someone at some point simplified a couple of studies into this number outside a paper, it spreads across meetings, articles, and scraped by LLMs.
I've certainly had this with customers - "we want 100% uptime", "we want 99.9% accuracy", and also internally you hear "we want 65 story points a sprint" "A senior developer can do 10 story points a sprint, a junior can do 2 story points" and so on.
The problem is... the number has become a meme rather than a measure. Simple numbers memify very easily - like that 1% error rate. Precise numbers sound "better" because they seem less like the "meme" number but they can be just as bad. As software people, lots of us love numbers. They let us explore a problem, define it down to a very comforting place in our heads. And we also love simple to say numbers that explain something we want to understand or influence another person with.
Lots of sellable business frameworks add number memes in - KPIs, Lean Six Sigma, Velocity, and so on. Most of them are not bad when used as originally intended, but they became a meme via marketing and their inherent appeal of making the world seem less scary, unpredictable and uncontrollable.
I've been in situations where software changes come in via the motive of cost cutting. In the public sector, depending on your individual politics, that can be "improving efficiency" or "removing jobs from people" (I'm not going to dig into this bit, I covered that in another post). In the private sector, pretty much everything comes under make more money or reduce costs eventually. In a practical sense for software things, this normally means fewer hours spent by people doing something compared to before.
Accountancy becomes the last resort

And in an environment of number memes, it's very easy to measure the wrong thing, and for that to get detached from reality quickly. Your metric as a reality compression model stops working. So in that resource constrained environment, you measure the wrong thing, the reality quietly gets worse, and then an outside intervention comes in as it finally shouts enough for someone to notice it on a balance sheet and oh look some vendor or consultancy capitalised on that and you're now implementing new software or a new framework.
So instead of:
Measure before, choose to do new thing because measure shows bad, Measure after
you get
Measure after, try to reverse engineer what before might have been.
Neither of those turn into logs you can use to debug an issue or a nice case study.
The critical part here is the "balance sheet" part I mentioned. In the absence of being able to measure something helpful, we fall back to measuring money.
This was why someone asked on LinkedIn based on my listening post, "how do I put listening on a balance sheet" - they didn't have a mental model of the return on investment of listening clear enough to let them do that.
That, along with measures becoming broken when they become targets, is why story points eventually turn into developer hours, no matter how hard we try, or trying to measure efficiency in public services turns into "reducing cost spent doing the same thing".
Please do still measure stuff - but be realistic

So there's a famous quote about measuring. During the Vietnam war, where a bunch of military decisions were being made based on data from the front line eg body counts. Robert McNamara was the US Secretary of Defence in this period. Lots of analysis has occurred afterwards as to what happened in terms of the military strategy and how it led to the US losing the war.
It turns out McNamara worked in business prior to politics and this observation was made by a marketer, Daniel Yankelovich based on his actions in the private sector - where ignoring the ability to model the desire for smaller cars based on their research meant Ford lost the market for smaller cars in the US to imports.
Here is what happens when the McNamara discipline is applied too literally: The first step is to measure whatever can be easily measured. This is okay as far as it goes. The second step is to disregard that which can't be easily measured or give it an arbitrary quantitative value. This is artificial and misleading. The third step is to presume that what can't be measured easily really isn't very important. This is blindness. The fourth step is to say that what can't be easily measured really doesn't exist. This is suicide.
— Daniel Yankelovich, The New Odds
It's even got a wikipedia article as a well known logical fallacy.
I tend to see two main sets of reactions to this statement:
- "oh, but if we just do xyz, our measure is different" and its variants like "we're different to this situation, this doesn't apply to us"
- throw out the whole thing, let's not measure at all (some people think #noestimates is this, but it isn't)
Our act of measuring is a form of controlling complexity so we can well, get stuff done. So the danger is to ignore the complexity completely. But that's not the same as never using numbers as a tool to understand things.
So you do something else. You accept that the measures are serving a purpose, and you are explicit about what they can and can't do, and try to frame what the measure is helping you with. You track when measures start "morphing" into money proxies, as that's normally an indication that they're no longer modelling reality well enough. And you add context to your measures and numbers to reduce the chance of them being accidentally or intentionally repurposed as memes. You avoid measures becoming targets - this is breaks their ability to be used as a measure.
And finally, to cover the bits of complexity outside the numbers, you name the "unmeasured" aspects that you are aware of, and leave them in the discussion. Package them with your measures - so instead of a "chicken nugget" of number meme you've instead provided a full meal.
And once you start thinking in those terms, the question stops being whether software is “accurate”, and becomes something more uncomfortable: what level of wrong are we actually building systems to tolerate?