Humans can, to some extent, be pointed to complicated external things. This suggests that using natural selection on biology can get you mesa-optimizers that can be pointed to particular externally specifiable complicated things. Doesn’t prove it (or, doesn’t prove you can do it again), but you only asked for a suggestion.
Humans can be pointed at complicated external things by other humans on their own cognitive level, not by their lower maker, natural selection.
I don’t think I understand what, exactly, is being discussed. Are “dogs” or “flowers” or “people you meet face-to-face” examples of “complicated external things”?
Right, but the goal is to make AGI you can point at things, not to make AGI you can point at things using some particular technique.
(Tangentially, I also think the jury is still out on whether humans are bad fitness maximizers, and if we’re ultimately particularly good at it—e.g. let’s say, barring AGI disaster, we’d eventually colonise the galaxy—that probably means AGI alignment is harder, not easier)
To my eye, this seems like it mostly establishes ‘it’s not impossible in principle for an optimizer to have a goal that relates to the physical world’. But we had no reason to doubt this in the first place, and it doesn’t give us a way to reliably pick in advance which physical things the optimizer cares about. “It’s not impossible” is a given for basically everything in AI, in principle, if you have arbitrary amounts of time and arbitrarily deep understanding.
As I said (a few times!) in the discussion about orthogonality, indifference about the measure of “agents” that have particular properties seems crazy to me. Having an example of “agents” that behave in a particular way is enormously different from having an unproven claim that such agents might be mathematically possible.