I think this post shifts the burden of proof to risk-concerned folks and only justifies that shift with its own poor analogies. 7⁄10 of the analogy-makers you cite here have p(doom)s <= 50%, so they are only aiming for plausibility. You admit that the analogies point to the “logical possibility” of stark misalignment, but one person’s logical possibility is another person’s plausibility.
To give an example, Golden Retrievers are much more cherry-picked than Cotra’s lion/chimpanzee examples. Of all the species on Earth, the ones we’ve successfully domesticated are a tiny, tiny minority. Maybe you’d say we have a high success rate when we try to domesticate a species, but that took a long time in each case and is still meaningfully incomplete. I think those 7⁄10 are advocating for taking that kind of time with AIs, and they would be right to say we shouldn’t expect e.g. lion domestication to happen overnight, even though we’re likely to succeed eventually.
The presentation of Quintin’s alternate evolution argument seems plausible, but not clearly more convincing than the more common version one might hear from risk-concerned folks. Training fixes model weights to some degree, after which you can make weaker adjustments with things like fine-tuning, RLHF, and (in the behavioral sense, maybe) prompt-engineering. Our genes seem like the most meaningfully fixed thing about us, and those are ~entirely a product of our ancestors’ performance, which is heavily weighted toward the more stable pre-agricultural and pre-industrial human environments.
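To make that “degrees of fixedness” point concrete, here is a rough toy sketch of my own (PyTorch, with a single linear layer standing in for a real network; the learning rates, step counts, and names like `train` are purely illustrative, not anything from the post):

```python
# Toy illustration: pretraining moves the weights a lot, fine-tuning nudges
# them with a much smaller learning rate, and prompting changes only the
# input while leaving the weights untouched.
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Linear(8, 1)   # stand-in for a large pretrained network
loss_fn = nn.MSELoss()

def train(model, lr, steps):
    """Run a few gradient steps on random data and report how far the weights moved."""
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    before = model.weight.detach().clone()
    for _ in range(steps):
        x, y = torch.randn(32, 8), torch.randn(32, 1)
        opt.zero_grad()
        loss_fn(model(x), y).backward()
        opt.step()
    return (model.weight.detach() - before).norm().item()

# "Pretraining": many steps at a high learning rate -> large weight change.
print("pretraining drift:", train(model, lr=1e-1, steps=500))

# "Fine-tuning"/RLHF-style adjustment: few steps at a tiny learning rate
# -> comparatively small change on top of the now mostly-fixed base.
print("fine-tuning drift:", train(model, lr=1e-4, steps=50))

# "Prompt engineering": the weights do not move at all; only the input changes.
prompt_a, prompt_b = torch.randn(1, 8), torch.randn(1, 8)
print("same weights, different outputs:", model(prompt_a).item(), model(prompt_b).item())
```

The analogy I’m drawing is that the heavily-trained base weights play the role genes do for us: later adjustments are real but comparatively weak.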
My basic response is that, while you can find reasons to believe the golden retriever analogy is worse than the chimpanzee analogy, you can equally find reasons to think the chimpanzee analogy is worse, and there isn’t really a strong argument either way. For example, it’s not a major practice among researchers to selectively breed chimpanzees, as far as I can tell, whereas AIs are selected (by gradient descent) to exhibit positive behaviors. I think this is actually a huge weakness in the chimp analogy, since “selective breeding” looks way more similar to what we’re doing with AIs than the implicit image of plucking random animals from nature and putting them into our lives.
But again, I’m not really trying to say “just use a different analogy”. I think there’s a big problem if we use analogies selectively at all; and if we’re going to use them, we should probably try to be way more rigorous about it.