There is one catch: in principle, there could be multiple codes/descriptions which decode to the same message. The obvious thing to do is to add up the implied probabilities of each description which produces the same message. That indeed works great. However, it turns out that just taking the minimum description length—i.e. the length of the shortest code/description which produces the message—is a good-enough approximation of that sum for one particularly powerful class of codes: universal Turing machines.
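The bookkeeping here can be sketched with a toy decoder. The description-to-message map below is a hypothetical, non-universal stand-in for a Turing machine, chosen just to show the two quantities being compared: the full sum of implied probabilities versus the minimum-description-length approximation. (For actual prefix universal Turing machines, the coding theorem says the negative log of these two quantities differs by at most an additive constant.)

```python
from math import log2

# Toy, hard-coded prefix-free decoder: bitstring descriptions -> messages.
# Hypothetical example data; a real universal machine would have
# infinitely many descriptions per message.
decoder = {
    "0":     "A",
    "10":    "B",
    "110":   "A",   # a second, longer description of "A"
    "11110": "A",   # a third, even longer one
}

def total_prob(msg):
    """Sum the implied probability 2**-len(d) over every description d of msg."""
    return sum(2 ** -len(d) for d, m in decoder.items() if m == msg)

def mdl_prob(msg):
    """Approximate using only the shortest description: 2**-(min length)."""
    shortest = min(len(d) for d, m in decoder.items() if m == msg)
    return 2 ** -shortest

total = total_prob("A")   # 2**-1 + 2**-3 + 2**-5 = 0.65625
approx = mdl_prob("A")    # 2**-1 = 0.5
gap = log2(total) - log2(approx)  # the (here small) log-probability gap
```

In this toy case the shortest description already carries most of the probability mass, which is the intuition behind using minimum description length as the approximation.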
Is this the "K-complexity is silly; use cross-entropy instead" thing?
Yes.