No, but what are the approaches to avoiding deceptive alignment that don’t go through competitiveness?
We could talk for a while about this. But I’m not sure how much hangs on this point even if I’m right, since you offered this as an extra reason to care about competitiveness, and there’s still the obvious reason to value it. And idea space is big, so you would have your work cut out for you to turn this from an epistemic landscape where two people can reasonably have different intuitions into one that casts serious doubt on my side.
But here’s one idea: have the AI show the operator messages that cause them to do better on randomly selected prediction tasks. The operator’s prediction depends on the message, obviously, but the ground truth is the counterfactual ground truth, what it would have been had the message never been shown, so the AI’s message can’t affect the ground truth.
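To make that scoring rule concrete, here is a minimal toy sketch of the counterfactual-grounding idea, assuming a simulated setting where the no-message ground truth can be computed directly; the names (PredictionTask, score_episode, the operator and advisor functions) are hypothetical illustrations, not anything specified in this conversation.

```python
# Toy sketch: reward an advisory AI only for improving the operator's
# predictions against a ground truth the AI's message cannot influence.
# All names here are illustrative assumptions for this sketch.

class PredictionTask:
    """A task whose ground truth is fixed in the no-message counterfactual."""
    def __init__(self, question, counterfactual_truth):
        self.question = question
        # Ground truth as it would be *without* any message from the AI,
        # so the message cannot move the target it is scored against.
        self.counterfactual_truth = counterfactual_truth

def score_episode(task, operator_predict, advisor_message):
    """Reward = improvement in operator accuracy relative to the
    counterfactual (message-free) ground truth."""
    baseline = operator_predict(task.question, message=None)
    advised = operator_predict(task.question, message=advisor_message(task.question))
    err = lambda p: abs(p - task.counterfactual_truth)
    # Positive reward iff the message helped the operator predict better.
    return err(baseline) - err(advised)

# Toy usage: the operator guesses a probability, the advisor nudges it.
task = PredictionTask("chance of rain tomorrow", counterfactual_truth=0.7)
operator = lambda q, message=None: 0.5 if message is None else message
advisor = lambda q: 0.65  # an honest-ish hint
print(score_episode(task, operator, advisor))  # > 0 means the hint helped
```

The point of the design, as described above, is that the advisor is only rewarded through the operator’s accuracy on a target fixed in the counterfactual world, so messages are only useful to the AI insofar as they are actually informative.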
And then more broadly, impact measures, conservatism, or using utility information about counterfactuals to complicate wireheading all seem at least somewhat viable to me, and with those you could have an agent that does more than show us text that’s only useful if it’s true. In my view, that approach is far harder to make safe, but if I held the position that we needed parity in competitiveness with unsafe competitors in order to use a chatbot to save the world, then I’d start to find these other approaches more appealing.