This is what I have been saying for years: to solve AI alignment with good results, we first need to solve HUMAN alignment. Being able to align a system to anyone’s values immediately raises the question of everyone else disagreeing with that someone. Unfortunately, “whose values exactly are we trying to align AI to?” has almost become a taboo question that triggers a huge fraction of the community, and in the best case, when someone does try to answer it, the answer gets handwaved away as “we just need to make sure AI doesn’t kill humanity”. That is not a single bit better defined or implementable than Asimov’s laws; that’s just not how these things work. Edit: also, as expected, someone has already offered exactly this “answer” as what truly solved alignment looks like...
The danger (the actual, already-real-right-now danger, not the “possible in the future” danger) lies in people working with power-multiplying tools without understanding how they work and what domain they are applicable to. It doesn’t matter which tool it is: you don’t need AGI to cause huge harm; already existing AI/ML systems are more than enough.
“human alignment” doesn’t really make sense. humans have the values they do; there’s no objective moral good to which they “objectively should” be “more aligned”.
So when we align AI, who do we align it TO?
any person should want it aligned to themself. i want it aligned to me, you want it aligned to you. we can probably expect it to be aligned to whatever engineer or engineers happen to be there when the aligned AI is launched.
which is fine, because they’re probably aligned enough with me or you (cosmopolitan values, CEV which values everyone’s values also getting CEV’d, etc.). hopefully.
But that is exactly the point of the author of this post (which I agree with). AGI that can be aligned to literally anyone is more dangerous in the presence of bad actors than non-alignable AGI.
Also, “any person should want it aligned to themself” doesn’t really matter unless “any person” can actually get access to AGI, which would absolutely not be the case, at the very least in the beginning, and probably never.