evhub comments on Answering questions honestly instead of predicting human answers: lots of problems and some solutions

evhub 4 Aug 2021 22:30 UTC

Can $θ_{1}$ write an arbitrary program for $f_{?}$?

Yes—at least that’s the assumption I’m working under.

It seems like this should be lower complexity than the intended result, since True has much lower complexity than H_understands?

I agree that the $θ_{1}$ you’ve described has lower complexity than the intended $θ_{1}$, but the $θ_{2}$ in this case has higher complexity, since $θ_{2}$ is no longer getting any of its complexity for free from conditioning on the $f_{?}$ condition. And in fact what you’ve just described is precisely the unintended model that I’m trying to compete against (what I call $M^{-}$), with the hope being that the savings that $M^{+}$ gives you in $θ_{2}$ are sufficient to compensate for the loss in having to specify $f^{+}$ and H_understands in $θ_{1}$.

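If we calculate the complexity of your proposal, we get

$$\begin{aligned}
\text{complexity}(M^{-}) &= \text{complexity}(θ_{1}^{-}) + \text{complexity}(θ_{2}^{-} \mid M^{-}|_{f_{?}}) \\
&= \text{complexity}(W - H) + \text{complexity}(f^{-}) + \text{complexity}(H \mid \text{True}) \\
&= \text{complexity}(W - H) + \text{complexity}(f^{-}) + \text{complexity}(H) \\
&\approx \text{complexity}(W)
\end{aligned}$$

whereas, if we calculate the complexity of the intended $M^{+}$, we get

$$\begin{aligned}
\text{complexity}(M^{+}) &= \text{complexity}(θ_{1}^{+}) + \text{complexity}(θ_{2}^{+} \mid M^{+}|_{f_{?}}) \\
&= \text{complexity}(W - H) + \text{complexity}(f^{-}) + \text{complexity}(f^{+}) + \text{complexity}(\text{H\_understands}) + \text{complexity}(H \mid \text{H\_understands} \to f^{+} = f^{-}) \\
&\approx \text{complexity}(W - H) + \text{complexity}(f^{+}) + \text{complexity}(\text{H\_understands}) + \text{complexity}(H) - \min_{θ_{2}} \{ \text{complexity}(θ_{2}) \mid \text{H\_understands}_{H = θ_{2}} \to f^{+}_{H = θ_{2}} = f^{-}_{H = θ_{2}} \} \\
&\approx \text{complexity}(W) + \text{complexity}(f^{+}) + \text{complexity}(\text{H\_understands}) - \min_{θ_{2}} \{ \text{complexity}(θ_{2}) \mid \text{H\_understands}_{H = θ_{2}} \to f^{+}_{H = θ_{2}} = f^{-}_{H = θ_{2}} \}
\end{aligned}$$

such that you can see that the question of which one wins is precisely dependent on whether the savings from conditioning on $\text{H\_understands} \to f^{+} = f^{-}$ offsets the cost of having to specify $f^{+}$ and $\text{H\_understands}$.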
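
To make the comparison explicit, the two approximations above can be subtracted from one another. As a rough sketch, write $S$ for the savings term $\min_{θ_{2}} \{ \text{complexity}(θ_{2}) \mid \text{H\_understands}_{H = θ_{2}} \to f^{+}_{H = θ_{2}} = f^{-}_{H = θ_{2}} \}$; the symbol $S$ is shorthand introduced here for illustration and is not notation from the post. Under the same approximations as above:

$$\begin{aligned}
\text{complexity}(M^{+}) - \text{complexity}(M^{-}) &\approx \text{complexity}(f^{+}) + \text{complexity}(\text{H\_understands}) - S
\end{aligned}$$

So $M^{+}$ comes out as the simpler model roughly when $S > \text{complexity}(f^{+}) + \text{complexity}(\text{H\_understands})$, and $M^{-}$ comes out simpler when the inequality goes the other way. Whether the inequality actually holds is not settled by this calculation.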
- Rohin Shah 5 Aug 2021 8:33 UTC
  such that you can see that the question of which one wins is precisely dependent on whether the savings from conditioning on $\text{H\_understands} \to f^{+} = f^{-}$ offsets the cost of having to specify $f^{+}$ and $\text{H\_understands}$.
  Yeah, that makes sense. I guess I don’t really see the intuition about why this should be true, but fair enough to leave that as an open question.