How many times has someone expressed “I’m worried about ‘goal-directed optimizers’, but I’m not sure what exactly they are, so I’m going to work on deconfusion”? There’s something weird about this sentiment, don’t you think?
IMO, the weird/off thing is that the people saying this don’t have sufficient evidence to highlight this specific vibe bundle as being a “real / natural thing that just needs to be properly formalized”, rather than there being no “True Name” for this concept, and it turns out to be just another situationally useful high level abstraction. It’s like someone saying they want to “deconfuse” the concept of a chair.
Or like someone pointing at a specific location on a blank map and confidently declaring that there’s a dragon at that spot, but then admitting that they don’t actually know what exactly a “dragon” is, have never seen one, and only have theoretical / allegorical arguments to support their existence[1]. Don’t worry though, they’ll resolve the current state of confusion by thinking really hard about it and putting together a taxonomy of probable dragon subspecies.
If you push them on this point, they might say that actually humans have some pretty dragon-like features, so it only makes sense that real dragons would exist somewhere in creature space.
Also, dragons are quite powerful, so naturally many types of other creatures would tend to become dragons over time. And given how many creatures there are in the world, it’s inevitable that at least one would become a dragon eventually.
Are you claiming that future powerful AIs won’t be well described as pursuing goals (aka being goal-directed)? This is the read I get from the “dragon” analogy you mention, but this can’t possibly be right because AI agents are already obviously well described as pursuing goals (perhaps rather stupidly). TBC, the goals that current AI agents end up pursuing are instructions in natural language, not something more exotic.
(As far as I can tell, the word “optimizer” in “goal-directed optimizer” is either meaningless or redundant, so I’m ignoring that.)
Perhaps you just mean that future powerful AIs won’t ever be well described as consistently (e.g. across contexts) and effectively pursuing specific goals which they weren’t specifically trained or instructed to pursue?
Or that goal-directed behavior won’t arise emergently prior to humans being totally obsoleted by our AI successors (and possibly not even after that)?
TBC, I agree that some version of “deconfusing goal-directed behavior” is pretty similar to “deconfusing chairs” or “deconfusing consciousness”[1] (you might gain value from doing it, but only because you’ve ended up in a pretty weird epistemic state).
[1] See also “the meta problem of consciousness”.
What do you mean by “well described”?
By “well described”, I mean it’s a central example of how people typically use the word.
E.g., it matches most of the common characteristics in the cluster around the word “goal”.
In the same way, something can be well described as a chair if it has a chair-like shape and people use it for sitting.
(Separately, I was confused by the original footnote. Is Alex claiming that deconfusing goal-directedness is a thing that no one has tried to do? (Seems wrong, so probably not?) Or that it’s strange to be worried when the argument for worry depends on something so fuzzy that you need to deconfuse it? I think it’s the second one after reading your comment, but I’m still unsure. Not important to respond.)
He means the second one.
Seems true in the extreme (if you have zero idea what something is, how can you reasonably be worried about it?), but less strange the further you get from that.