My best guess is that if we pretend we knew how to define a space where AIs that are similar under self-modification are close together, there would indeed be basins of attraction around most good points (AIs that do good things with the galaxy). However, I see no particular reason why there should be only one such basin of attraction, at least not without defining your space in an unnatural way. And of course there are going to be plenty of other basins of attraction; you don't get alignment by default just by throwing a dart into AI-space.
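To make the "multiple basins" point concrete, here is a deliberately toy sketch (my own illustration, not a model of real AI systems): treat the hypothetical AI-space as one dimension, and model repeated self-modification as gradient descent on a made-up double-well potential. Different starting points settle at different fixed points, i.e. there is more than one basin of attraction:

```python
# Toy illustration only: a 1-D stand-in for "AI-space", with repeated
# self-modification modelled as gradient descent on a double-well
# potential V(x) = (x^2 - 1)^2. The two minima at x = -1 and x = +1
# play the role of two distinct attractors.

def self_modify(x, step=0.05):
    grad = 4 * x * (x**2 - 1)  # dV/dx for the double-well potential
    return x - step * grad     # one "self-modification" step

for x0 in (-1.7, -0.3, 0.4, 1.9):
    x = x0
    for _ in range(200):
        x = self_modify(x)
    print(f"start at {x0:+.1f} -> settles near {x:+.2f}")
```

Which attractor you end up in depends entirely on where you start; nothing in the dynamics singles out one basin as "the" basin, which is the sense in which a random dart into AI-space doesn't get you alignment by default.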
A load-bearing claim of the robust values hypothesis for "alignment by default" is #2:
Said subset is a "naturalish" abstraction:
- The more natural the abstraction, the more robust values are
- Example operationalisations of "naturalish abstraction":
  - The subset is highly privileged by the inductive biases of most learning algorithms that can efficiently learn our universe
    - More privileged → more natural
  - Most efficient representations of our universe contain a simple embedding of the subset
    - Simpler embeddings → more natural
The safety comes from #3 and #1, but #2 is why we're not throwing a dart at random into AI-space. It's a property that makes value learning easier.
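As a rough illustration of what "privileged by the inductive biases" can mean in the most generic ML sense (my own toy example, not drawn from the linked posts): when several hypotheses fit the same observations about equally well, a learner with a simplicity bias systematically prefers the simplest one. The linear data-generating rule and the complexity penalty below are made up for the sketch:

```python
# Toy sketch of an inductive bias "privileging" a simple hypothesis.
# Data comes from a simple linear rule plus noise; candidate hypotheses
# are polynomials of increasing degree; the score adds a crude
# complexity penalty (a stand-in for a simplicity prior / MDL term).
import numpy as np

rng = np.random.default_rng(0)
xs = np.linspace(-1.0, 1.0, 12)
ys = 2.0 * xs + 0.05 * rng.normal(size=xs.size)  # simple underlying rule

for degree in (1, 3, 5, 7):
    coeffs = np.polyfit(xs, ys, degree)
    fit_error = float(np.mean((np.polyval(coeffs, xs) - ys) ** 2))
    score = fit_error + 0.01 * (degree + 1)      # fit + simplicity penalty
    print(f"degree {degree}: fit error {fit_error:.4f}, penalized score {score:.4f}")
```

The higher-degree fits buy essentially no extra accuracy but pay the complexity penalty, so the degree-1 hypothesis wins. The hope behind #2, as I read it, is that the relevant subset of human values is "simple" for realistic learners in a loosely analogous sense.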
Sure. Though see Take 4.
Claim #1 (about a "privileged subset") is a claim that there aren't multiple such natural abstractions (e.g. any other subset of human values that satisfies #3 would be a superset of the privileged subset, or a subset of the basin of attraction around the privileged subset).
[But I haven’t yet fully read that post or your other linked posts.]