This comment seems to me to be pointing at something very important which I had not hitherto grasped.
My (shitty) summary:
There’s a big difference between gains from improving the architecture / abilities of a system (the genome, for human agents) and gains from increasing knowledge developed over the course of an episode (or lifetime). In particular, they might differ in how easy it is to “get the alignment in”.
If the AGI is doing consequentialist reasoning while it is still mostly getting gains from gradient descent, as opposed to from knowledge collected over an episode, then we have more ability to steer its trajectory.