There’s a weaker statement, “there exist humans who have wound up with basically the kinds of motivations that we would want an AGI to have”. For example, Eliezer endorses a statement kinda like that here (and he names names—Carl Shulman & Paul Christiano). If you believe that weaker statement, it suggests that we’re mucking around in a generally-promising space, but that we still have work to do. Note that motivations come from a combination of algorithms and “training data” / “life experience”, both of which are going to be hard or impossible to match perfectly between humans and AGIs. The success story requires having enough understanding to reconstruct the important-for-our-purposes aspects.
Part of what makes me skeptical of the logic “we have seen humans who we trust, so the same design space probably has decent density of superhumans who we’d trust” is that I’m not sold on the (effective) orthogonality thesis for human brains. Our cognitive limitations seem like they’re an active ingredient in our conceptual/moral development. We might easily know how to get human-level brain-like AI to be trustworthy but never know how to get the same design with 10x the resources to be trustworthy.
There are humans with a remarkable knack for coming up with new nanotech inventions. I don’t think they (we?) have systematically different and worse motivations than normal humans. If they had even more of a remarkable knack—outside the range of humans—I don’t immediately see what would go wrong.
If you personally had more time to think and reflect, and more working memory and attention span, would you be concerned about your motivations becoming malign?
(We might be having one of those silly arguments where you say “it might fail, we would need more research” and I say “it might succeed, we would need more research”, and we’re not actually disagreeing about anything.)