The above seems likely enough in the context of CEV (again), but otherwise false.
I think there might be a mix-up here. There are two topics of discussion:
One topic is: “We should look at humans and human values since those are the things we want to align an AGI to.”
The other topic is: “We should look at humans and human values since AGI learning algorithms are going to resemble human brain within-lifetime learning algorithms, and humans provide evidence for what those algorithms do in different training environments”.
The part of the post that you excerpted is about the latter, not the former.
Imagine that God gives you a puzzle: You get most of the machinery for a human brain, but some of the innate drive neural circuitry has been erased and replaced by empty boxes. You’re allowed to fill in the boxes however you want. You’re not allowed to cheat by looking at actual humans. Your goal is to fill in the boxes such that the edited-human winds up altruistic.
So you have a go at filling in the boxes. God lets you do as many validation runs as you want. The validation runs involve raising the edited-human in a 0AD society and seeing what they wind up like. After a few iterations, you find settings where the edited-humans reliably grow up very altruistic in every 0AD society you can think to try.
Now that your validation runs are done, it’s time for the test run. So the question is: if you put the same edited-human-brain in a 2022AD society, will it also grow up altruistic on the first try?
I think a good guess is “yes”. I think that’s what Jacob is saying.
(For my part, I think Jacob’s point there is fair, and a helpful way to think about it, even if it doesn’t completely allay my concerns.)
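If it helps to see the structure spelled out: the thought experiment has the shape of a validate-then-test loop under distribution shift, with unlimited validation runs in 0AD societies and a single test run in a 2022AD society. Here is a purely illustrative toy sketch (all names and outcomes are hypothetical stand-ins, not anyone's actual proposal):

```python
# Toy sketch of the puzzle's structure. A deterministic stand-in for "raise the
# edited human in this society and see what they wind up like" -- in the thought
# experiment this is a whole childhood, not a function call, and the 2022AD entry
# is exactly the open question the test run is meant to answer.
OUTCOMES = {
    ("candidate_A", "0AD"): False,
    ("candidate_B", "0AD"): True,     # reliably altruistic in every 0AD society tried
    ("candidate_B", "2022AD"): True,  # the "good guess" for the held-out environment
}

def grows_up_altruistic(drive_settings: str, era: str) -> bool:
    return OUTCOMES.get((drive_settings, era), False)

validation_eras = ["0AD", "0AD", "0AD"]  # as many validation societies as you like
test_era = "2022AD"                      # one try only

# Unlimited validation runs: keep adjusting the innate-drive settings until the
# edited humans reliably grow up altruistic in every 0AD society you try.
chosen = next(
    settings
    for settings in ["candidate_A", "candidate_B"]
    if all(grows_up_altruistic(settings, era) for era in validation_eras)
)

# The single test run: same settings, shifted environment.
print("Altruistic on the first 2022AD try?", grows_up_altruistic(chosen, test_era))
```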