For example, the way scientific experiments work, your p-value either passes the (arbitrary) threshold, or it doesn’t, so you either reject the null, or fail to reject the null, a binary outcome.
Ritualistic hypothesis testing with significance thresholds is mostly used in the social sciences, psychology and medicine and not so much in the hard sciences (although arbitrary thresholds like 5 sigma are used in physics to claim the discovery of new elementary particles they rarely show up in physics papers). Since it requires deliberate effort to get into the mindset of the null ritual I don’t think that technical and scientific-minded people just start thinking like this.
I think that the simple explanation that the effect of improving code quality is harder to measure and communicate to management is sufficient to explain your observations. To get evidence one way or another, we could also look at what people do when the incentives are changed. I think that few people are more likely to make small performance improvements than improve code quality in personal projects.
Ritualistic hypothesis testing with significance thresholds is mostly used in the social sciences, psychology and medicine and not so much in the hard sciences (although arbitrary thresholds like 5 sigma are used in physics to claim the discovery of new elementary particles they rarely show up in physics papers). Since it requires deliberate effort to get into the mindset of the null ritual I don’t think that technical and scientific-minded people just start thinking like this.
I think that the simple explanation that the effect of improving code quality is harder to measure and communicate to management is sufficient to explain your observations. To get evidence one way or another, we could also look at what people do when the incentives are changed. I think that few people are more likely to make small performance improvements than improve code quality in personal projects.