Charlie Steiner comments on Theoretical Neuroscience For Alignment Theory

Charlie Steiner 9 Dec 2021 0:54 UTC
LW: 6 AF: 4
Part of what makes me skeptical of the logic “we have seen humans who we trust, so the same design space probably has decent density of superhumans who we’d trust” is that I’m not sold on the (effective) orthogonality thesis for human brains. Our cognitive limitations seem like they’re an active ingredient in our conceptual/moral development. We might easily know how to get human-level brain-like AI to be trustworthy but never know how to get the same design with 10x the resources to be trustworthy.
- Steven Byrnes 9 Dec 2021 1:39 UTC
  LW: 4 AF: 3
  There are humans with a remarkable knack for coming up with new nanotech inventions. I don’t think they (we?) have systematically different and worse motivations than normal humans. If they had even more of a remarkable knack—outside the range of humans—I don’t immediately see what would go wrong.
  If you personally had more time to think and reflect, and more working memory and attention span, would you be concerned about your motivations becoming malign?
  (We might be having one of those silly arguments where you say “it might fail, we would need more research” and I say “it might succeed, we would need more research”, and we’re not actually disagreeing about anything.)