This hope does require the local oversight process to be epistemically competitive with the AI, in the sense that e.g. if the AI understands something subtle about the environment dynamics then the oversight process also needs to understand that. And that’s what we are trying to do with all of this business about training AIs to answer questions honestly. The point is just that you don’t have to clear up any of the ambiguity about what the human wants, you just have to be able to detect someone tampering with deliberation. (And the operationalization of tampering doesn’t have to be so complex.)
So you want a sort of partial universality sufficient to bootstrap the process locally (while not requiring an understanding of our values in fine detail), giving us enough time for a deliberation that would epistemically dominate the AI in a global sense (and get our values right)?
If that’s about right, then I agree that having this would make your proposal work, but I still don’t know how to get it. I need to read your previous posts on answering questions honestly.
You basically just need full universality / epistemic competitiveness locally. This is just a way of getting around “what are values?”, not around the need for competitiveness. Then the global thing is also epistemically competitive, and it is able to talk about e.g. how our values interact with the alien concepts uncovered by our AI (which we want to reserve time for since we don’t have any solution better than “actually figure everything out ‘ourselves’”).
Almost all of the time I’m thinking about how to get epistemic competitiveness for the local interaction. I think that’s the meat of the safety problem.