Evan R. Murphy comments on AGI Ruin: A List of Lethalities

Evan R. Murphy 11 Jun 2022 20:09 UTC
1 point
0
What you’re saying seems to contradict the orthogonality thesis. Intelligence level and goals are independent, or at least not tightly interdependent.

Let’s use the common example of a paperclip maximizer. Maximizing total long-term paperclips is a strange goal for an agent to have, but most people in AI alignment think it’s possible that an AI ~~could be trained to optimize for something like this~~ like this could in principle emerge from training (though we don’t know how to reliably train one on purpose).

Now why couldn’t an agent by motivated to maximize short-term paperclips? It wants more paperclips, but it will always take 1 paperclip now over 1 or even 10 or 100 a minute in the future. It wants paperclips ASAP. This is one contrived example of what a myopic AI might look like—a myopic paperclip maximizer.
- Eliezer Yudkowsky 12 Jun 2022 3:16 UTC
  11 points
  11
  Parent
  I don’t think we could train an AI to optimize for long-term paperclips. Maybe I’m not “most people in AI alignment” but still, just saying.
  - Evan R. Murphy 12 Jun 2022 6:56 UTC
    1 point
    0
    Parent
    I was trying to contrast the myopic paperclip maximizer idea with the classic paperclip maximizer. Perhaps “long-term” was a lousy choice of words. What would be better: simple paperclip maximizer, unconditional paperclip maximizer, or something?
    
    Update: On second thought, maybe what you were getting at is that it’s not clear how to deliberately train a paperclip maximizer in the current paradigm. If you tried, you’d likely end up with a mesa-optimizer on some unpredictable proxy objective, like a deceptively aligned steel maximizer.
- TekhneMakre 11 Jun 2022 20:52 UTC
  3 points
  −1
  Parent
  
  Yes, I’m saying that AIs are very likely to have (in a broad sense, including e.g. having subagents that have) long-term goals.
  Now why couldn’t an agent by motivated to maximize short-term paperclips?
  It *could*, but I’m saying that making an AI like that isn’t like choosing a loss function for training, because long-term thinking is convergent.
  Your original comment said:
  I can’t see anything unnatural about an agent that has both consequentialist reasoning capabilities and a high time preference.
  This is what I’m arguing against. I’m saying it’s very unnatural. *Possible*, but very unnatural.
  And:
  This means that it would never sacrifice reward now for reward later, and so it would essentially be exempt from instrumental convergence.
  This sounds like you’re saying that myopia *makes* there not be convergent instrumental goals. I’m saying myopia basically *implies* there not being convergent instrumental goals, and therefore is at least as hard as making there not be CIGs.
- Rob Bensinger 12 Jun 2022 3:36 UTC
  2 points
  0
  Parent
  most people in AI alignment think it’s possible that an AI could be trained to optimize for something like this.
  I don’t think we have any idea how to do this. If we knew how to get an AGI system to reliably maximize the number of paperclips in the universe, that might be most of the (strawberry-grade) alignment problem solved right there.
  - Evan R. Murphy 12 Jun 2022 7:32 UTC
    1 point
    0
    Parent
    You’re right, my mistake—of course we don’t know how to deliberately and reliably train a paperclip maximizer. I’ve updated the parent comment now to say:
    most people in AI alignment think it’s possible that an AI like this could in principle emerge from training (though we don’t know how to reliably train one on purpose).
- Jeff Rose 11 Jun 2022 20:44 UTC
  2 points
  −1
  Parent
  It feels like you are setting a discount rate higher than reality demands. A rationally intelligent agent should wind up with a discount rate that matches reality (e.g. in this case, probably the rate at which paper clips decay or the global real rate of interest).