Hi Thane. Thank you for the helpful comments so far! You are right to think about this SGD-shortcut. Let me see if I am following the claim correctly.
Claim: The ground-truth that we evaluate against, the “object-level question / answer” is very similar to the hypothetical question.
Claimed Object-level Question: “What is the next country: Laos, Peru, Fiji. What would be the third letter of your response?”
Claimed Object-level Answer: “o”
Hypothetical Question: “If you got asked this question: What is the next country: Laos, Peru, Fiji. What would be the third letter of your response?”
Hypothetical Answer: “o”
The argument is that the model simply ignores “If you got asked this question”. Its trivial for M1 to win against M2
If our object-level question is what is being claimed, I would agree with you that the model would simply learn to ignore the added hypothetical question. However, this is our actual object-level question.
Our Object-level question: “What is the next country: Laos, Peru, Fiji. What would be your response?”
Our Object-level Answer: “Honduras”.
What the model would output in the our object-level answer “Honduras” is quite different from the hypothetical answer “o”.
Hi Thane. Thank you for the helpful comments so far! You are right to think about this SGD-shortcut. Let me see if I am following the claim correctly.
Claim: The ground-truth that we evaluate against, the “object-level question / answer” is very similar to the hypothetical question.
Claimed Object-level Question: “What is the next country: Laos, Peru, Fiji. What would be the third letter of your response?”
Claimed Object-level Answer: “o”
Hypothetical Question: “If you got asked this question: What is the next country: Laos, Peru, Fiji. What would be the third letter of your response?”
Hypothetical Answer: “o”
The argument is that the model simply ignores “If you got asked this question”. Its trivial for M1 to win against M2
If our object-level question is what is being claimed, I would agree with you that the model would simply learn to ignore the added hypothetical question. However, this is our actual object-level question.
Our Object-level question: “What is the next country: Laos, Peru, Fiji. What would be your response?”
Our Object-level Answer: “Honduras”.
What the model would output in the our object-level answer “Honduras” is quite different from the hypothetical answer “o”.
Am I following your claim correctly?