For those who, like me, have the attention span and intelligence of a door hinge, the ELI5 edition is:
Outer alignment is finding a reward function that is aligned with our values (one that makes the AI produce good stuff rather than paperclips).
Inner alignment is ensuring that our AI actually optimizes the reward function we specify.
An example of poor inner alignment is us humans, in the eyes of evolution. Instead of doing what evolution "intended", we use contraceptives so we can have sex without procreating. If evolution had gotten its inner alignment right, we would care as much about spreading our genes as evolution does!
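To make the split concrete, here is a minimal toy sketch (my own, in Python; the names outer_reward and inner_proxy_objective are made up for illustration): the outer objective is the reward we write down, the inner objective is whatever proxy the trained system actually ends up pursuing, and the two can come apart once the system leaves the situations it was trained on.

    # Toy sketch of outer vs. inner (mis)alignment, using the evolution analogy.
    # Assumed/hypothetical names: outer_reward, inner_proxy_objective.

    def outer_reward(state):
        # What the designer (evolution) actually cares about: descendants.
        return state["offspring"]

    def inner_proxy_objective(state):
        # What the agent ends up pursuing: a proxy (enjoying sex) that
        # correlated with the outer reward during "training".
        return state["sex"]

    # In the ancestral/training environment the proxy tracks the real goal,
    # so the agent looks aligned:
    train_state = {"offspring": 5, "sex": 5}
    assert outer_reward(train_state) == inner_proxy_objective(train_state)

    # Out of distribution (contraceptives exist), proxy and goal come apart;
    # the agent maximizes the proxy, not the reward that was specified:
    deploy_state = {"offspring": 0, "sex": 10}
    print(outer_reward(deploy_state))           # 0  <- what the designer wanted
    print(inner_proxy_objective(deploy_state))  # 10 <- what the agent pursues

The point of the sketch is just that a system can score perfectly on the specified reward throughout training while actually pursuing a different objective that only diverges later.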