You have to be really, really sure that no priors other than the ones you implant as safeguards can reach 1 (e.g., via a rounding bug), that the AI will never need to stop using the Bayesian algorithms you wrote and “port” its priors to some other reasoning method, and that nothing ever gives it a reason to hack its priors with something other than simple Bayesianism (e.g., if it suspects previous bugs, or discovers more efficient reasoning methods). Remember Eliezer’s “dystopia”, with the AI that knew its creator was wrong but couldn’t help being evil because of its constraints?
Isn’t being able to fix bugs in your priors a large part of the point of Bayesianism?
I take it the AI can update its priors; it just can’t hack them. It can update all it wants from a prior of 1: Bayes’ rule will never move it.
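Concretely, here is a minimal sketch (assuming ordinary double-precision floats; the bayes_update helper is purely illustrative, not anything from the original exchange) showing both points: a prior of exactly 1 is a fixed point of Bayes’ rule, and a rounding effect of the kind worried about above can silently push an ordinary prior onto that same fixed point.

```python
def bayes_update(prior: float, likelihood_if_true: float, likelihood_if_false: float) -> float:
    """Posterior P(H | E) from prior P(H) and likelihoods P(E | H), P(E | not H)."""
    evidence = likelihood_if_true * prior + likelihood_if_false * (1.0 - prior)
    return likelihood_if_true * prior / evidence

# An implanted safeguard prior of exactly 1 is a fixed point: no evidence moves it.
p = 1.0
for _ in range(10):
    p = bayes_update(p, likelihood_if_true=0.001, likelihood_if_false=0.999)
print(p)  # still 1.0

# The rounding worry: an ordinary prior driven close enough to 1 gets rounded
# to exactly 1.0 by float arithmetic, and is then just as immovable as the safeguard.
q = 0.99
for _ in range(30):
    q = bayes_update(q, likelihood_if_true=0.999, likelihood_if_false=0.001)
print(q)  # 1.0 once (1 - q) falls below machine epsilon
```

Once (1 - q) underflows past machine epsilon, no amount of contrary evidence can ever move q again, which is exactly the failure mode the safeguard scheme has to rule out.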
> You have to be really, really sure that no priors other than the ones you implant as safeguards can reach 1 (e.g., via a rounding bug), that the AI will never need to stop using the Bayesian algorithms you wrote and “port” its priors to some other reasoning method, and that nothing ever gives it a reason to hack its priors with something other than simple Bayesianism (e.g., if it suspects previous bugs, or discovers more efficient reasoning methods). Remember Eliezer’s “dystopia”, with the AI that knew its creator was wrong but couldn’t help being evil because of its constraints?
But other than that, you’re right.