Perhaps seemingly obvious, but given some of the reactions around Apple putting “Do not hallucinate” into the system prompt of its AI …
If you do get an instruction-following AI to which you can simply give the instruction, "Do the right thing", and it will just do the right thing:
Remember to give the instruction.
You have to specify the right thing for whom. And the AGI won’t know what it is for sure, in a realistic slow takeoff during the critical risk period. See my reply to Charlie above.
But yes, using the AGI's intelligence to help you issue good instructions is definitely a good idea. See my post Instruction-following AGI is easier and more likely than value aligned AGI for more of the logic on why.
All non-omniscient agents make decisions with incomplete information. I don’t think this will change at any level of takeoff.
Sure, but my point here is that AGI will be only weakly superhuman during the critical risk period, so it will be highly uncertain, and human judgment is likely to continue to play a large role. Quite possibly to our detriment.