Potentially challenging example: let’s say there’s a server that’s bottlenecked on some poorly optimized algorithm, and you optimize that algorithm to be less redundant, freeing resources that immediately get used for a wide range of unknown but presumably good tasks.
Superficially, this seems like an optimization that increased the description length. I believe the way this is solved in the OP is that the distributions are constructed so as to assign an extra-long description length to undesirable states, even if these undesirable states are naturally simpler and more homogeneous than the desirable ones.
I am quite suspicious that this risks ending up with improper probability distributions. Maybe that’s OK.
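To make the worry concrete, here’s a rough sketch (Python, with made-up states, lengths, and penalty) of what I mean by “improper”, assuming description lengths are read back as probabilities via p(x) = 2^(-L(x)): if you hand undesirable states extra-long descriptions without reassigning the freed-up probability mass anywhere, the implied distribution ends up sub-normalized.

```python
# Hypothetical states with made-up code lengths (in bits). The undesirable
# states are "naturally" simple, the desirable ones a bit more complex.
natural_lengths = {
    "idle_server (undesirable)": 2.0,
    "crashed_server (undesirable)": 2.0,
    "serving_tasks (desirable)": 3.0,
    "serving_more_tasks (desirable)": 3.0,
}

def total_mass(lengths):
    """Probability mass implied by the code lengths: sum over x of 2^-L(x)."""
    return sum(2 ** -l for l in lengths.values())

print(total_mass(natural_lengths))  # 0.75

# Now give the undesirable states an extra-long description, as the OP's
# construction seems to require, without shortening anything else to compensate.
penalty = 10.0
penalized_lengths = {
    state: length + (penalty if "undesirable" in state else 0.0)
    for state, length in natural_lengths.items()
}

print(total_mass(penalized_lengths))  # ~0.2505 -- mass was removed but never
                                      # reassigned, so the implied distribution
                                      # is deficient ("improper").
```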
(Not sure whether by “runtime” you mean “time spent running” or “memory/program state during execution” (or something else). I was imagining memory/program state in mine, though that is arguably a simplification, since the ultimate goal is probably something to do with the business.)