Hm. This doesn’t seem right to me. My approach for trying to form an intuition here includes returning to the example (in a parent comment):
For example, the room could be divided in half, with Operator 1 interacting with BoMAI, and with Operator 2 observing Operator 1...
but I don’t imagine this satisfies you. Another piece of the intuition is that mind-hacking with the aim of reward within the episode, or even with the possible instrumental aim of operator-devotion, still doesn’t seem very existentially risky to me, given the lack of optimization pressure to that effect. (I know the latter comment sort of belongs in other branches of our conversation, so we should continue to discuss it elsewhere.)
Maybe other people can weigh in on this, and we can come back to it.