But for 2., how do we get an automated system and containment setup that is secure against a superintelligence?
Well, that’s what the current contest is about (in part). Have you been following it? But having said that, this conversation is making me realize that some of the ideas proposed there may not make as much sense as I thought.
I’m generally confused about what capabilities are assumed—is it just souped-up modern ML?
Yeah I’m confused about this too. I asked Stuart and he didn’t really give a useful answer. I guess “under what assumed capabilities would Counterfactual Oracles be safe and useful” is also part of what needs to be worked out.
Even worse, it could (if sufficiently intelligent) subtly transfer or otherwise preserve itself before being shut down. Why are we assuming we can just shut it down, given that we have to give it at least a little time to think and train?
Are you thinking that the Oracle might have cross-episode preferences? I think to ensure safety we have to have some way to make sure that the Oracle only cares about doing well (i.e., getting a high reward) on the specific question that it’s given, and nothing else, and this may be a hard problem.
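To make the worry concrete, here is a minimal sketch of what “only caring about the current question” might look like in a Counterfactual Oracle training loop. This is illustrative only: `oracle.predict`, `oracle.update`, and `true_outcome_fn` are hypothetical placeholders, and the erasure probability is arbitrary. The point is that the reward is computed automatically, only in the branch where humans never see the answer, and only from this episode.

```python
import random

def run_episode(oracle, question, true_outcome_fn, erasure_prob=0.01):
    """One hypothetical Counterfactual Oracle episode.

    With small probability the answer is withheld from humans (the
    "erasure" / counterfactual branch) and the oracle is rewarded by an
    automated comparison against the real outcome; otherwise the answer
    is released and the oracle receives no training signal at all for
    this episode, so the reward never depends on how humans react.
    """
    prediction = oracle.predict(question)

    if random.random() < erasure_prob:
        # Counterfactual branch: no human ever reads the prediction.
        outcome = true_outcome_fn(question)           # automated scoring, no human in the loop
        reward = -abs(prediction - outcome)           # e.g. negative prediction error
        oracle.update(question, prediction, reward)   # update depends on this episode only
        return None                                   # nothing is released
    else:
        # Answer is released; by construction there is no reward here.
        return prediction
```

Whether a sufficiently capable learner trained this way actually ends up indifferent to anything outside the current episode (rather than, say, acquiring cross-episode preferences as an instrumental byproduct of training) is exactly the open question above; the sketch only shows the intended incentive structure, not a guarantee.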