Agree it’s not clear. Some reasons why they might:
If training environments’ inductive biases point firmly towards some specific (non-human) values, then maybe the misaligned AIs can just train bigger and better AI systems using similar environments that they were trained in, and hope that those AIs will end up with similar values.
Maybe values can differ a bit, and cosmopolitanism or decision theory can carry the rest of the way. Just like Paul says he’d be pretty happy with intelligent life that came from a similar distribution that our civ came from.
Humans might need to use a bunch of human labor to oversee all their human-level AIs. The HLAIs can skip this, insofar as they can trust copies of themself. And when training even smarter AI, it’s a nice benefit to have cheap copyable trustworthy human-level overseers.
Maybe you can somehow gradually increase the capabilities of your HLAIs in a way that preserves their values.
(You have a lot of high-quality labor at this point, which really helps for interpretability and making improvements through other ways than gradient descent.)
Agree it’s not clear. Some reasons why they might:
If training environments’ inductive biases point firmly towards some specific (non-human) values, then maybe the misaligned AIs can just train bigger and better AI systems using similar environments that they were trained in, and hope that those AIs will end up with similar values.
Maybe values can differ a bit, and cosmopolitanism or decision theory can carry the rest of the way. Just like Paul says he’d be pretty happy with intelligent life that came from a similar distribution that our civ came from.
Humans might need to use a bunch of human labor to oversee all their human-level AIs. The HLAIs can skip this, insofar as they can trust copies of themself. And when training even smarter AI, it’s a nice benefit to have cheap copyable trustworthy human-level overseers.
Maybe you can somehow gradually increase the capabilities of your HLAIs in a way that preserves their values.
(You have a lot of high-quality labor at this point, which really helps for interpretability and making improvements through other ways than gradient descent.)