I agree this is a problem, and furthermore it’s a subcase of a more general problem: alignment to the user’s intent may frequently entail misalignment with humanity as a whole.
I admit that my views generally favor the first interpretation over the second as an alignment goal, and I generally don’t think the second goal makes sense as a target.
Well, admittedly “alignment to humanity as a whole” is open to interpretation. But would you rather everyone have their own personal superintelligence that they can brainwash to do whatever they want?
Basically, yes.
This is mostly because I think even a best-case alignment scenario can never be more than “everyone have their own personal superintelligence that they can brainwash to do whatever they want.”
This is related to fundamental disagreements I have about morality and values that make me pessimistic about trying to align groups of people, or indeed trying to align with the one true morality/values.
To state the disagreements I have:
I think that, to the extent moral realism is right, morality/values is essentially trivial, in that every morality is correct, and I suspect that there is no non-arbitrary way to restrict morality or values without sneaking in your own values.
Essentially, it’s trivialism applied to morality, with a link below:
https://en.m.wikipedia.org/wiki/Trivialism
The reason reality doesn’t face the problem of being trivial is that, for our purposes, we don’t have the power to warp reality into whatever we want (often discussed under different names, including omnipotence, administrator access to reality, and more), whereas in morality we do have the power to change our values to anything else, thus generating inconsistent but complete values, in contrast to the universe we find ourselves in, which is probably consistent and incomplete.
There is no way to coherently talk about something like a society’s or humanity’s values in the general case, and in the case where everyone is aligned, all we can talk about is optimal redistribution of goods.
This makes a lot of attempts to analogize society’s or humanity’s values to, say, those of an individual person rely on two techniques that are subjective:
Carrying out a simplification or homogenization of the multiple preferences of the individuals that make up that society;
Modeling your own personal preferences as if these were the preferences of society as a whole.
That means it is never a nation or humanity that acts on morals or values, but specific people with their own values who take those actions.
Here’s a link to it.
https://www.lesswrong.com/posts/YYuB8w4nrfWmLzNob/thatcher-s-axiom
So my conclusion is: yes, I really do bite the bullet here and support “everyone have their own personal superintelligence that they can brainwash to do whatever they want”.
This is an uncomfortable conclusion to come to, but I do suspect it will lead to better modeling of people’s values.
Final notes: I do want to point out a comment I made that seems relevant to this comment, with slight modifications:
One important implication of the post relating to AI alignment: it is impossible for AI to be aligned with society, conditional on the individuals not all being aligned with each other. Only in the N=1 case can guaranteed alignment be achieved.
In the pointers ontology, you can’t point to a real-world thing that is a society, culture or group having preferences or values, unless all members have the same preferences.
And thus we need to be more modest in our alignment ambitions. Only AI aligned to individuals is at all feasible. And that makes the technical alignment groups look way better.
It’s also the best retort to attempted collectivist cultures and societies.
I think this is overcomplicating things.
We don’t have to solve any deep philosophical problems here finding the one true pointer to “society’s values”, or figuring out how to analogize society to an individual.
We just have to recognize that the vast majority of us really don’t want a single rogue to be able to destroy everything we’ve all built, and we can act pragmatically to make that less likely.
I agree with this, in a nutshell. After all, you can put in almost whatever values you like and it will work, which is the point of my long comment.
My point is that once you have instrumental goals like survival and technological progress secured for everyone, alignment in practice should reduce to this:
Everyone has their own personal superintelligence that they can brainwash to do whatever they want.
And the alignment problem is simple enough: How do you brainwash an AI to have your goals?
to put it another way: if we don’t solve beings being misaligned with each other, AI kills humanity (and most of AI, too). we must solve the problem that prevents collectives and individuals from being aligned.