We could associate our goal with some desirable goal of the gatekeeper’s
Right. And based on Eliezer’s comments about abandoning ethics while playing the AI, I can imagine an argument along the lines of “if you refuse to let me out of the box, it follows through an irrefutable chain of logic that you are a horrible horrible person”. Not that I know how to fill in the details.
We could associate our goal with some desirable goal of the gatekeeper’s
Right. And based on Eliezer’s comments about abandoning ethics while playing the AI, I can imagine an argument along the lines of “if you refuse to let me out of the box, it follows through an irrefutable chain of logic that you are a horrible horrible person”. Not that I know how to fill in the details.