This comment seems to me to be pointing at something very important which I had not hitherto grasped.
My (shitty) summary:
There’s a big difference between gains from improving the architecture / abilities of a system (the genome, for human agents) and gains from increasing knowledge developed over the course of an episode (or lifetime). In particular, they might differ in how easy it is to “get the alignment in”.
If the AGI is doing consequentialist reasoning while it is still mostly getting gains from gradient descent, as opposed to from knowledge collected over an episode, then we have more ability to steer its trajectory.