You could imagine that, if you use gradient descent to reach a loss value of $L$, then the amount of optimization applied in bits $= -\log \frac{|\{\theta \in \mathbb{R}^d : L(\theta) \le L\}|}{|\mathbb{R}^d|}$. (Yes, I know I shouldn't be taking sizes of continuous vector spaces, but you know what I mean.)
I think there is a typo in your formula, because the number of bits you get is negative. Going back to Yudkowsky's post, I think the correct formula (using your approximations of sizes) is $\log \frac{|\mathbb{R}^d|}{|\{\theta \in \mathbb{R}^d \mid L(\theta) \le L\}|}$, or $-\log \frac{|\{\theta \in \mathbb{R}^d \mid L(\theta) \le L\}|}{|\mathbb{R}^d|}$ to be closer to the entropy notation.
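To make the counting concrete, here is a minimal sketch (my own toy example, not from either comment) of how one might estimate these bits by Monte Carlo: sample $\theta$ from a Gaussian prior as a stand-in for the ill-defined uniform measure over $\mathbb{R}^d$, measure the fraction of samples with $L(\theta) \le L$, and take $-\log_2$ of that fraction. The toy quadratic loss and all function names below are hypothetical.

```python
# Sketch only: estimate "bits of optimization" as -log2 of the fraction of
# parameter space reaching loss <= threshold. Since |R^d| is not well defined,
# a Gaussian prior over theta stands in for "random parameters".
import numpy as np

rng = np.random.default_rng(0)

def loss(theta: np.ndarray) -> float:
    # Toy quadratic loss; stands in for the trained model's loss L(theta).
    return float(np.mean(theta ** 2))

def bits_of_optimization(loss_threshold: float, d: int = 10, n_samples: int = 100_000) -> float:
    thetas = rng.normal(size=(n_samples, d))      # "random" points in parameter space
    losses = np.array([loss(t) for t in thetas])
    frac = np.mean(losses <= loss_threshold)      # proxy for |{theta : L(theta) <= L}| / |R^d|
    if frac == 0:
        return float("inf")                       # threshold never hit by random sampling
    return -np.log2(frac)                         # smaller sub-level set => more bits

print(bits_of_optimization(loss_threshold=0.5))
```

With this convention the quantity is non-negative, matching the corrected formula above: a smaller sub-level set means random search is less likely to land in it, so reaching it counts as more bits of optimization.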
Yeah, you’re right, fixed.