“Some people I say this to respond with arguments like: ‘Surely, before a smaller team could get an AGI that can master subjects like biotech and engineering well enough to kill all humans, some other, larger entity such as a state actor will have a somewhat worse AI that can handle biotech and engineering somewhat less well, but in a way that prevents any one AGI from running away with the whole future?’”
[...]
Natural selection (NS) has not optimized for one intelligence not conquering the rest. As such, it doesn’t say anything about how hard it is to optimize for producing one intelligence that doesn’t conquer the rest.
We already know how to produce ‘one intelligence not conquering the rest’. E.g., a human being is an intelligence that doesn’t conquer the world. GPT-3 is an intelligence that doesn’t conquer the world either. The problem is to build aligned AI that can do a pivotal act that ends the acute existential risk period, not just to build an AI that doesn’t destroy the world itself.
That aside, I’m not sure what argument you’re making here. Two possible interpretations that come to mind (probably both of these are wrong):
1. You’re arguing that all humans in the world will refuse to build dangerous AI, therefore AI won’t be dangerous.
2. You’re arguing that natural selection doesn’t tell us how hard it is to pull off a pivotal act, since natural selection wasn’t trying to do a pivotal act.
1 seems obviously wrong to me; if everyone in the world had the ability to deploy AGI, then someone would destroy the world with AGI.
2 seems broadly correct to me, but I don’t see the relevance. Nate and I indeed think that pivotal acts are possible. Nate is using natural selection here to argue against ‘AI progress will be continuous’, not to argue against ‘it’s possible to use sufficiently advanced AI systems to end the acute existential risk period’.
That aside, I’m not sure what argument you’re making here.
I do not often comment on Less Wrong. (Although I am starting to; this is one of my first comments!) Hopefully, my thoughts will become clearer as I write more and get more acquainted with the local assumptions and cultural codes.
In the meantime, let me expand:
Two possible interpretations that come to mind (probably both of these are wrong):
1. You’re arguing that all humans in the world will refuse to build dangerous AI, therefore AI won’t be dangerous.
2. You’re arguing that natural selection doesn’t tell us how hard it is to pull off a pivotal act, since natural selection wasn’t trying to do a pivotal act.
2 seems broadly correct to me, but I don’t see the relevance. Nate and I indeed think that pivotal acts are possible. Nate is using natural selection here to argue against ‘AI progress will be continuous’, not to argue against ‘it’s possible to use sufficiently advanced AI systems to end the acute existential risk period’.
2 is the correct one.
But even after rereading the post with your interpretation in mind, I am still confused about why 2 is irrelevant. Consider:
The techniques you used to train it to allow the operators to shut it down? Those fall apart, and the AGI starts wanting to avoid shutdown, including wanting to deceive you if it’s useful to do so.
Why does alignment fail while capabilities generalize, at least by default and in predictable practice?
On one hand, in the analogy with natural selection, “by default” means “when you don’t even try to do alignment, and 100% optimize for a given goal”. I.e., when NS optimized for inclusive genetic fitness (IGF), capabilities generalized, but alignment did not.

On the other hand, when speaking of alignment directly, “by default” means “even if you optimize for alignment, but without certain specific considerations in mind”. I.e., some specific alignment proposals will fail.
My point was that the former is not evidence for the latter.
We already know how to produce ‘one intelligence not conquering the rest’. E.g., a human being is an intelligence that doesn’t conquer the world. GPT-3 is an intelligence that doesn’t conquer the world either. The problem is to build aligned AI that can do a pivotal act that ends the acute existential risk period, not just to build an AI that doesn’t destroy the world itself.
Looking at the world at large, I think this deserves a second look. Are we sure the individual human is the right level of analysis?