Nice post!
I think one of the things that average AI researchers think about brains is that humans might not be very safe for other humans (link to Wei Dai post). I at least would pretty strongly disagree with “The brain is a totally aligned general intelligence.”
I really like the thought about empathy as an important active ingredient in learning from other people’s experience. It’s very cool. It sort of implies an evo-psych story about the incentives for empathy that I’m not sure is true: what’s the prevalence of empathy in social but non-general animals?
Also, by coincidence, I just today finished a late draft of a post on second-order alignment, except with the other perspective (modeling it in humans, rather than in AIs), so I’m feeling some competition :P
There’s a weaker statement, “there exist humans who have wound up with basically the kinds of motivations that we would want an AGI to have”. For example, Eliezer endorses a statement kinda like that here (and he names names—Carl Shulman & Paul Christiano). If you believe that weaker statement, it suggests that we’re mucking around in a generally-promising space, but that we still have work to do. Note that motivations come from a combination of algorithms and “training data” / “life experience”, both of which are going to be hard or impossible to match perfectly between humans and AGIs. The success story requires having enough understanding to reconstruct the important-for-our-purposes aspects.
Part of what makes me skeptical of the logic “we have seen humans who we trust, so the same design space probably has decent density of superhumans who we’d trust” is that I’m not sold on the (effective) orthogonality thesis for human brains. Our cognitive limitations seem like they’re an active ingredient in our conceptual/moral development. We might easily know how to get human-level brain-like AI to be trustworthy but never know how to get the same design with 10x the resources to be trustworthy.
There are humans with a remarkable knack for coming up with new nanotech inventions. I don’t think they (we?) have systematically different and worse motivations than normal humans. If they had an even more remarkable knack, one outside the range of humans, I don’t immediately see what would go wrong.
If you personally had more time to think and reflect, and more working memory and attention span, would you be concerned about your motivations becoming malign?
(We might be having one of those silly arguments where you say “it might fail, we would need more research” and I say “it might succeed, we would need more research”, and we’re not actually disagreeing about anything.)
Thank you!
I don’t think I claimed that the brain is a totally aligned general intelligence, and if I did, I take it back! For now, I’ll stand by what I said here: “if we comprehensively understood how the human brain works at the algorithmic level, then necessarily embedded in this understanding should be some recipe for a generally intelligent system at least as aligned to our values as the typical human brain.” This seems harmonious with what I take your point to be: that the human brain is not a totally aligned general intelligence. I second Steve’s deferral to Eliezer’s thoughts on the matter, and I mean to endorse something similar here.
“what’s the prevalence of empathy in social but non-general animals?”

Here’s a good summary. I also found a really nice non-academic article in Vox on the topic.
And I’m looking forward to seeing your post on second-order alignment! I think the more people who take the concern seriously (and put forward compelling arguments to that end), the better.
Ah, you’re too late to look forward to it, it’s already published :P