Are humans aligned?
Bear with me!
Of course, I do not expect there is a single person browsing Short Forms who doesn’t already have a well-thought-out answer to that question.
The straightforward (boring) interpretation of this question is “Are humans acting in a way that is moral, or otherwise behaving as though they obey a useful utility function?” I don’t think this question is particularly relevant to alignment. (But I do enjoy whipping out my best Rust Cohle impression.)
Sure, humans do bad stuff, but almost every human manages to stumble along in a (mostly) coherent fashion. In this loose sense we are “aligned” to some higher-level target; it just involves eating trash and reading your phone in bed.
But I don’t think this is a useful kind of alignment to build on, nor something we would want to replicate in an AGI.
Human “alignment” has only been observed in an incredibly narrow domain. We notably don’t have the ability to self-modify, and of course we are susceptible to wireheading. Nothing about current humans should indicate to you that we would handle this extreme out-of-distribution shift well.
I’m probably not “aligned” in a way that generalizes to having dangerous superpowers, uncertain personhood and rights, a purposefully limited perspective, and somewhere between thousands and billions of agents trying to manipulate and exploit me for their own purposes. I expect even a self-modified Best Extrapolated Version of me would struggle gravely to do well by other beings in this situation. Cultish attractor basins are hazards for even the most benign set of human values, and a highly controlled situation with that much dangerous influence might exacerbate that particular risk.
But I do believe that hypothetical self-modification has at least the potential to help me Do Better: doing better is often a skills issue, learning skills is a currently accessible form of self-modification with good results, and self-modification might help with learning skills.