If the corrigibility systems are working correctly, Albert either rejected the goal of manipulating the programmers, or at the first point where Albert began to cognitively figure out how to manipulate the programmers (maximization / optimization within a prediction involving programmer reactions) the goal was detected by internal systems and Albert was automatically suspended to disk.
It is the programmers’ job not to sign stupid contracts. Young AIs should not be in the job of second-guessing them. There are more failure scenarios here than success scenarios, and a young AI should not believe itself to be in possession of info allowing it to guess which is which.
I don’t understand this part. If the AI wants something from the programmers, such as information about their values that it can extrapolate, won’t it always be committing “optimization within a prediction involving programmer reactions”? How does one distinguish this case without an adult FAI in hand? Are we counting on the young AI’s understanding of transparency?