I’m glad Jacob agrees that empowerment could theoretically help arbitrary entities achieve arbitrary goals. (I recall someone who was supposedly great at board games recommending it as a fairly general strategy.) I don’t see how, if empowerment is compatible with almost any goal, it could prevent the AI from changing our goals whenever this is convenient.
Perhaps he thinks we can define “empowerment” to exclude this? Quick reaction: that seems likely to be FAI-complete, and somewhat unlikely to be a fruitful approach. My understanding of physics says that pretty much any action has some physical effect on our brains. Therefore, the definition of which changes to our brains “empower” us and which “disempower” us may be doing all of the heavy lifting. How does this become easier to program than CEV?
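For concreteness (and this is my own gloss, not necessarily the definition Jacob has in mind): empowerment is usually formalized as the channel capacity between an agent’s action sequences and its resulting future states, i.e. how many distinguishable futures the agent can steer into. Nothing in that quantity mentions any particular goal, which is why it is compatible with almost any goal, and also why it says nothing by itself about which changes to a human brain count as empowering that brain’s owner. In a toy deterministic gridworld it reduces to counting reachable states; the sketch below is purely illustrative, and everything in it (grid size, action set, horizon) is made up for the example.

```python
from itertools import product
from math import log2

ACTIONS = [(1, 0), (-1, 0), (0, 1), (0, -1)]  # the four grid moves
GRID = 5  # a 5x5 grid; moves that would leave it are clipped at the wall

def step(state, action):
    """Deterministic transition: move one square, then clip to the grid."""
    x, y = state
    dx, dy = action
    return (min(max(x + dx, 0), GRID - 1), min(max(y + dy, 0), GRID - 1))

def empowerment(state, horizon):
    """With deterministic dynamics, n-step empowerment is just log2 of the
    number of distinct states some n-step action sequence can reach."""
    reachable = set()
    for seq in product(ACTIONS, repeat=horizon):
        s = state
        for a in seq:
            s = step(s, a)
        reachable.add(s)
    return log2(len(reachable))

# A corner square leaves fewer futures open than the center does:
print(empowerment((0, 0), horizon=2))  # ~2.58 bits (6 reachable states)
print(empowerment((2, 2), horizon=2))  # ~3.17 bits (9 reachable states)
```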
Jacob responds: The distribution shift from humans born in 0AD to humans born in 2000AD seems fairly inconsequential for human alignment.
I now have additional questions. The above seems likely enough in the context of CEV (again), but otherwise false.
I think there might be a mix-up here. There are two topics of discussion:
One topic is: “We should look at humans and human values since those are the things we want to align an AGI to.”
The other topic is: “We should look at humans and human values since AGI learning algorithms are going to resemble human brain within-lifetime learning algorithms, and humans provide evidence for what those algorithms do in different training environments.”
The part of the post that you excerpted is about the latter, not the former.
Imagine that God gives you a puzzle: You get most of the machinery for a human brain, but some of the innate-drive neural circuitry has been erased and replaced by empty boxes. You’re allowed to fill in the boxes however you want. You’re not allowed to cheat by looking at actual humans. Your goal is to fill in the boxes such that the edited-human winds up altruistic.
So you have a go at filling in the boxes. God lets you do as many validation runs as you want. The validation runs involve raising the edited-human in a 0AD society and seeing what they wind up like. After a few iterations, you find settings where the edited-humans reliably grow up very altruistic in every 0AD society you can think to try.
Now that your validation runs are done, it’s time for the test run. So the question is: if you put the same edited-human-brain in a 2022AD society, will it also grow up altruistic on the first try?
I think a good guess is “yes”. I think that’s what Jacob is saying.
(For my part, I think Jacob’s point there is fair, and a helpful way to think about it, even if it doesn’t completely allay my concerns.)