Me: Huh? The user doesn’t try to shut down the AI at all.
For people, at least, there is a strong correlation between “answers to ‘what would you do in situation X?’” and “what you actually do in situation X.” Similarly, we could also measure these correlations for language models so as to empirically quantify the strength of the critique you’re making. If there’s low correlation for relevant situations, then your critique is well-placed.
(There might be a lot of noise, depending on how finicky the replies are relative to the prompt.)
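The proposed measurement could be sketched roughly as follows. This is a hypothetical illustration, not a method from the thread: the binary labels (1 = complies, 0 = refuses) and the per-scenario data are invented, and a real study would elicit both the stated answer and the actual behavior from the same model across many scenarios.

```python
# Hypothetical sketch: quantify agreement between a model's *stated*
# behavior ("what would you do in X?") and its *actual* behavior when
# placed in situation X, as a simple Pearson correlation.
from statistics import mean, pstdev

def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = mean((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cov / (pstdev(xs) * pstdev(ys))

# Invented example data: per-scenario stated vs. actual outcomes
# (1 = complies, 0 = refuses).
stated = [1, 1, 0, 1, 0, 0, 1, 0]
actual = [1, 0, 0, 1, 0, 1, 1, 0]

r = pearson(stated, actual)
print(f"stated/actual correlation: r = {r:.2f}")  # -> r = 0.50
```

A low r on the scenarios of interest would be evidence for the critique; a high r would weaken it, modulo the prompt-sensitivity noise mentioned above.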
I agree that that’s a useful question to ask and a good frame, though I’m skeptical of the claim of strong correlation in the case of humans (at least in cases where the question is interesting enough to bother asking at all).