At this point we should probably be preserving the code and weights of every AI system that humanity produces, aligned or not, just on they-might-turn-out-to-be-morally-significant grounds. And yeah, preservation also improves the incentives for an AI that's considering a world-takeover attempt: if its chance of success is low and its wants are things we could satisfy retroactively, it has less to gain from gambling on a takeover.
It might be worth setting up a standardized mechanism for encrypting things to be released post-singularity, by gating them behind a computation whose difficulty is calibrated to be feasible later but not feasible now.
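One way such a mechanism could plausibly work (a sketch of my own, not something specified above) is a time-lock puzzle in the style of Rivest, Shamir, and Wagner: the creator uses a number-theoretic trapdoor to lock the data cheaply, while anyone opening it must grind through a long chain of sequential squarings, with the chain length t calibrated so the work is out of reach today but routine for whoever comes later. All the concrete parameters below (key sizes, iteration count, the toy message) are illustrative assumptions.

```python
# Sketch of a Rivest-Shamir-Wagner-style time-lock puzzle.
# Toy parameters throughout; a real deployment would use a large RSA modulus
# and a t calibrated to projected hardware speeds.
import secrets
from hashlib import sha256
from math import gcd


def make_puzzle(message: bytes, n: int, phi_n: int, t: int):
    """Lock `message` behind roughly t sequential squarings mod n.

    The creator, knowing phi(n), computes 2^t mod phi(n) cheaply and so
    finishes instantly; an opener without phi(n) must do the t squarings
    one after another (the work is believed not to parallelize).
    """
    a = secrets.randbelow(n - 2) + 2          # random base in [2, n-1]
    while gcd(a, n) != 1:                     # ensure Euler's theorem applies
        a = secrets.randbelow(n - 2) + 2
    e = pow(2, t, phi_n)                      # shortcut via the trapdoor phi(n)
    key = sha256(str(pow(a, e, n)).encode()).digest()
    ciphertext = bytes(m ^ k for m, k in zip(message, key))  # toy XOR "encryption"
    return a, t, n, ciphertext


def open_puzzle(a: int, t: int, n: int, ciphertext: bytes) -> bytes:
    """Recover the message by actually performing the t sequential squarings."""
    x = a % n
    for _ in range(t):
        x = pow(x, 2, n)
    key = sha256(str(x).encode()).digest()
    return bytes(c ^ k for c, k in zip(ciphertext, key))


# Demo with small primes so the slow path still runs in a moment.
p, q = 1000003, 1000033
n, phi_n = p * q, (p - 1) * (q - 1)
puzzle = make_puzzle(b"release after the singularity", n, phi_n, t=100_000)
assert open_puzzle(*puzzle) == b"release after the singularity"
```

The appeal of this construction for the present purpose is that the delay comes from sequential computation rather than from trusting any custodian to keep a key secret until the right time.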