Seems like if the different heads do not share weights then “the parameters in f1” is perfectly well-defined?
It seemed to me like you were using it in a way such that f1 shared no weights with f2, which I think was because you were confused by the quantification, like you said previously. I think we’re on the same page now.
Okay, so iiuc you’re relying on an assumption (fact? desire?) that the world model will never produce deduced statements that distinguish between f+ and f−?
Sorry, I was unclear about this in my last response. f+ and f− will only agree in cases where the human understands what’s happening. In the dataset version, we get that by collecting a dataset where we think the human always gets it right, whereas in the dataset-less version, we get that by including the H_understands check, which ensures that we don’t have to satisfy the condition when the human would get the question wrong.
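To make the shape of that check concrete, here is a toy sketch (the names h_understands, f_plus, and f_minus are illustrative stand-ins, not the actual machinery of the proposal):

```python
def count_violations(examples, f_plus, f_minus, h_understands):
    """Count cases where f+ and f- are required to agree but don't.

    examples: iterable of (x, q) pairs
    f_plus, f_minus: callables (x, q) -> answer
    h_understands: callable (x, q) -> bool, True when the human would
        answer (x, q) correctly
    """
    violations = 0
    for x, q in examples:
        # The condition is H_understands -> (f+ = f-), so it is vacuously
        # satisfied whenever the human would get the question wrong.
        if h_understands(x, q) and f_plus(x, q) != f_minus(x, q):
            violations += 1
    return violations
```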
It seemed to me like you were using it in a way such that f1 shared no weights with f2
I mean, I would still have said this because I interpret a “head” f1 as “the part after the shared layers”, but I’m also happy to instead treat f1 as the entire function X×Q→A for which the first head forms part of the implementation.
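Concretely, the picture I have in mind is roughly the following (just an illustrative sketch; the names shared, head1, and head2 are mine, not from the post):

```python
# Toy two-headed model: a shared trunk followed by two separate heads.
def shared(x, q):
    """Shared layers, e.g. the world model reused by both heads."""
    ...

def head1(z):
    """First head: just the parameters after the shared layers."""
    ...

def head2(z):
    """Second head: likewise, with its own (unshared) parameters."""
    ...

# On the narrow reading, "f1" is head1 alone; on the broader reading, f1 is
# the whole function X x Q -> A whose implementation the first head is part of:
def f1(x, q):
    return head1(shared(x, q))

def f2(x, q):
    return head2(shared(x, q))
```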
Yes—at least that’s the assumption I’m working under.
I think I might be missing a change you made to the algorithm. Can θ1 write an arbitrary program for f?? In that case, what prevents you from getting θ1 = (W−H, f−, f? = True) with θ2 = H? It seems like this should be lower complexity than the intended result, since True has much lower complexity than H_understands?
It seems like this should be lower complexity than the intended result, since True has much lower complexity than H_understands?
I agree that the θ1 you’ve described has lower complexity than the intended θ1—but the θ2 in this case has higher complexity, since θ2 is no longer getting any of its complexity for free from conditioning on the f? condition. And in fact what you’ve just described is precisely the unintended model—what I call M−—that I’m trying to compete against, with the hope being that the savings that M+ gives you in θ2 are sufficient to compensate for the loss in having to specify f+ and H_understands in θ1.
If we calculate the complexity of your proposal, we get
complexity(M−)
 = complexity(θ−1) + complexity(θ−2 | M−|f?)
 = complexity(W−H) + complexity(f−) + complexity(H | True)
 = complexity(W−H) + complexity(f−) + complexity(H)
 ≈ complexity(W)
whereas, if we calculate the complexity of the intended M+, we get
complexity(M+)
 = complexity(θ+1) + complexity(θ+2 | M+|f?)
 = complexity(W−H) + complexity(f−) + complexity(f+) + complexity(H_understands) + complexity(H | H_understands → f+ = f−)
 ≈ complexity(W−H) + complexity(f+) + complexity(H_understands) + complexity(H) − min_{θ2} {complexity(θ2) | H_understands_{H=θ2} → f+_{H=θ2} = f−_{H=θ2}}
 ≈ complexity(W) + complexity(f+) + complexity(H_understands) − min_{θ2} {complexity(θ2) | H_understands_{H=θ2} → f+_{H=θ2} = f−_{H=θ2}}
such that you can see that the question of which one wins is precisely dependent on whether the savings from conditioning on H_understands→f+=f− offsets the cost of having to specify f+ and H_understands.
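Or, collapsing the two estimates above (this is just the difference of the two calculations, up to the same approximations): M+ wins precisely when
min_{θ2} {complexity(θ2) | H_understands_{H=θ2} → f+_{H=θ2} = f−_{H=θ2}} ≥ complexity(f+) + complexity(H_understands).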
such that you can see that the question of which one wins is precisely dependent on whether the savings from conditioning on H_understands→f+=f− offsets the cost of having to specify f+ and H_understands.
Yeah, that makes sense. I guess I don’t really see the intuition about why this should be true, but fair enough to leave that as an open question.