We generally assume that we can construct questions sufficiently well that there’s only one unambiguous interpretation. We also generally assume that the predictor “knows” which world it’s in because it can predict how humans would respond to hypothetical questions about various situations involving diamonds and sensors, and because it can predict that humans would say that, in theory, Q1 and Q2 could be different.
More concretely, our standard for judging proposals is exhibiting an unambiguous failure. If it were plausible that you asked the wrong question, or that the AI didn’t know what you meant by the question, then the failure exhibited would be ambiguous. Likewise, if humans are unable to disambiguate between two possible interpretations of their question, the failure would be ambiguous.
>We also generally assume that the predictor “knows” which world it’s in because it can predict how humans would respond to hypothetical questions about various situations
This doesn’t seem to disambiguate between the conditions assumed in a question actually being true and the human merely believing them. E.g. the predictor could predict that, when asked “The camera is hacked so it looks like this [camera feeds making it seem like the diamond is still there], and the diamond is in the robber’s pocket; is the diamond really in the room?”, the human will answer “No!”, not by understanding that by “diamond really in the room” the human means that the diamond is really in the room, but just by modeling the human as believing the premise of the question (that the diamond is in the pocket).
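To make this concrete, here’s a toy sketch (entirely my own construction, with made-up function names) of two human-models that make identical predictions on this hypothetical, so predicting the answer doesn’t tell you which one the predictor is using:

```python
# Two hypothetical ways the predictor might model the human. Both predict the
# same answer to the hypothetical question, so the prediction alone doesn't
# show the predictor has grasped what "really in the room" means.

def human_answer_via_reality(actual_diamond_location):
    """Model A: the human's answer tracks where the diamond actually is."""
    return "Yes" if actual_diamond_location == "room" else "No"

def human_answer_via_belief(stated_premise):
    """Model B: the human's answer just tracks what the question stipulates,
    with no notion of where the diamond actually is."""
    return "No" if "pocket" in stated_premise else "Yes"

# The hypothetical: camera feed looks normal, premise says diamond is in the pocket.
premise = ("the camera feed looks like the diamond is still there, "
           "but the diamond is in the robber's pocket")

print(human_answer_via_reality("pocket"))  # -> No
print(human_answer_via_belief(premise))    # -> No
```

Both models output “No!”, so matching human answers on hypotheticals like this doesn’t distinguish between them.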
Edit:
To elaborate, this condition on counterexamples is given in the ELK document:
“The model understands the question. One sufficient condition is that the model can predict human answers to essentially arbitrary hypothetical questions in order to clarify the meaning of terms.”
I basically don’t see how this condition constrains anything about the predictor. It seems like all it really says is that the predictor knows how humans talk. I don’t see how it can specify that the AI’s beliefs about how humans answer questions are related to reality, other than on the training set, where we assume that the human talk matches reality.

I also don’t see how it makes sense to think of this as the model “understanding the question”. Normally I’d think of “understanding the question” as meaning “can have the same question”. To have a question is to have a role that an answer could fill. But if the predictor is organized as, e.g., a giant low-level Bayes net, then there’s no role that could be filled by an answer to “where’s the diamond”. There might be such a role, induced by how the rest of the AI makes use of the predictor, but that seems contingent, and anyway it’s not about the predictor (I think ELK is supposed to make sense with just the predictor?).
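As a toy illustration of the “no role” point (again my own sketch, with a hypothetical predictor structure rather than anything from the ELK report): if the predictor’s internal variables are all sensor-level, nothing inside it is a slot that an answer to “where’s the diamond” could occupy.

```python
# Hypothetical "giant low-level Bayes net": every node is a sensor-level
# variable; none of them is anything like "diamond_location".
low_level_predictor = {
    "pixel_0_t1":  {"parents": ["pixel_0_t0", "photon_flux_0"]},
    "pixel_1_t1":  {"parents": ["pixel_1_t0", "photon_flux_1"]},
    "door_sensor": {"parents": ["actuator_state"]},
    # ... many more low-level nodes, none of them "diamond_location"
}

def role_for(question_variable, net):
    """Return the node that could hold an answer to the question, if any."""
    return net.get(question_variable)

print(role_for("diamond_location", low_level_predictor))  # -> None
# Any such role would have to be imposed from outside, by how the rest of the
# AI uses the predictor, which is the contingency gestured at above.
```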