And more importantly, to prevent deceptive alignment from happening, which would allow a treacherous turn.
A lot of overrated alignment plans achieve outer alignment at optimum, that is, the values you want to instill do not break at optimality, but use handwavium to bypass deceptive alignment, proxy alignment, and suboptimality alignment.
(Jacob Cannell is better than Alex Turner at this, since he incorporates an AI sandbox which, importantly, prevents the AI from knowing it's in a simulation.)