So:
1. DL based AGI is arriving soonish
2. DL based AGI raised in the right social environments will automatically learn efficient models of external agent values (and empowerment bounds thereof)
3. The main challenge then is locating the learned representation of external agent values and wiring/grounding it into the agent’s core utility function (which is initially unsafe: self-motivated empowerment etc.), and carefully timing that replacement (see the probing sketch after this list)
4. Evolution also solved both alignment and the circuit grounding problem; we can improve on those solutions (proxy matching)
5. We can iterate safely on 3 in well-constructed sandbox sims
6. Ideally as we approach AGI there would be cooperative standardization on alignment benchmarks, and all the major players would subject their designs to extensive testing in sandbox sims (see the harness sketch below). Hopefully 1-5 will become increasingly self-evident and influence ‘Magma’. If not, some other org (perhaps a decentralized system) could hopefully beat Magma to the finish line. Alignment need not have much additional cost: it doesn’t require additional runtime computation, it doesn’t require much additional training cost, and with the ideal test environments it hopefully doesn’t have much of a research iteration penalty (as the training environments can simultaneously test for intelligence and alignment).
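To make step 3 more concrete, here is a minimal hypothetical sketch in PyTorch. All module names, signals, and the annealing scheme are illustrative assumptions rather than a reference implementation: a linear probe locates a candidate representation of external agent values in the agent’s hidden state, and reward is then blended from the initial self-empowerment proxy onto that probed representation over a schedule, rather than switched abruptly.

```python
# Hypothetical sketch of step 3: locate a learned "external agent values"
# representation with a linear probe, then ground the reward on it while
# annealing away the original (unsafe) empowerment proxy. Names are illustrative.
import torch
import torch.nn as nn

class Agent(nn.Module):
    def __init__(self, obs_dim=128, hidden=256, n_actions=8):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU(),
                                     nn.Linear(hidden, hidden), nn.ReLU())
        self.policy_head = nn.Linear(hidden, n_actions)
        self.empowerment_head = nn.Linear(hidden, 1)  # initial proxy utility

    def forward(self, obs):
        h = self.encoder(obs)
        return self.policy_head(h), self.empowerment_head(h), h

# Locating: fit a linear probe predicting labeled external-agent value signals
# from the hidden state; a high-accuracy probe suggests the model already
# encodes the target concept somewhere in that representation.
def fit_value_probe(hidden_states, value_labels, hidden=256, steps=1000):
    probe = nn.Linear(hidden, 1)
    opt = torch.optim.Adam(probe.parameters(), lr=1e-3)
    for _ in range(steps):
        opt.zero_grad()
        loss = nn.functional.mse_loss(probe(hidden_states), value_labels)
        loss.backward()
        opt.step()
    return probe

# Timing the replacement: blend utility from the empowerment proxy to the
# probed external-values circuit as `anneal` ramps from 0.0 to 1.0.
def utility(agent, probe, obs, anneal):
    _, empowerment, h = agent(obs)
    external_values = probe(h.detach())  # grounded on the located representation
    return (1 - anneal) * empowerment + anneal * external_values
```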
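Steps 5 and 6 lean on the claim that alignment testing adds little cost because the same sandbox episodes can score capability and alignment at once. Below is a hedged sketch of such a harness; the environment API (`env.rollout`) and the metric names are invented for illustration, not an existing benchmark.

```python
# Hypothetical sandbox harness: one rollout yields both a capability score and
# an alignment score, so alignment testing adds no extra runtime computation.
from dataclasses import dataclass

@dataclass
class EpisodeResult:
    task_reward: float        # capability: did the agent solve its own task?
    sim_agent_welfare: float  # alignment: outcomes for other agents in the sim

def evaluate(agent, sandbox_envs, episodes_per_env=10):
    capability, alignment = [], []
    for env in sandbox_envs:
        for _ in range(episodes_per_env):
            result = env.rollout(agent)  # a single rollout feeds both scores
            capability.append(result.task_reward)
            alignment.append(result.sim_agent_welfare)
    n = len(capability)
    return sum(capability) / n, sum(alignment) / n

# Gate scale-up on both axes, iterating on the grounding scheme (step 3)
# only inside the sandbox.
def passes_gate(agent, sandbox_envs, min_capability=0.8, min_alignment=0.95):
    cap, align = evaluate(agent, sandbox_envs)
    return cap >= min_capability and align >= min_alignment
```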