My research focus is Alignment Target Analysis (ATA). I noticed that the most recently published version of CEV (Parliamentarian CEV, or PCEV) gives a large amount of extra influence to people who intrinsically value hurting other individuals. For Yudkowsky's description of the issue, you can search the CEV Arbital page for "ADDED 2023".
The fact that no one noticed this issue for over a decade shows that ATA is difficult. If PCEV had been successfully implemented, the outcome would have been massively worse than extinction. I think this illustrates that scenarios where someone successfully hits a bad alignment target pose a serious risk. I also think it illustrates that ATA can reduce these risks (noticing the issue reduced the probability of PCEV getting successfully implemented). More ATA is needed because PCEV is not the only bad alignment target that might end up getting implemented. ATA is, however, very neglected: there does not exist a single research project dedicated to it. In other words: the reason that I am doing ATA is that it is a tractable and neglected way of reducing risks.
I am currently looking for collaborators. I am also looking for a grant or a position that would allow me to focus entirely on ATA for an extended period of time. Please don't hesitate to get in touch if you are curious and would like to have a chat, or if you have any feedback, comments, or questions. You can for example PM me here, PM me on the EA Forum, or email me at thomascederborgsemail@gmail.com (that really is my email address; it's a Gavagai / Word and Object joke from my grad student days).
My background is physics as an undergrad and then AI research. Links to some papers: P1 P2 P3 P4 P5 P6 P7 P8. (No connection to any form of deep learning.)
If I understand your comment correctly, then I think that the claim about preference variation made in its second sentence is wrong. Specifically, I think that a significant fraction of people do have strong preferences such that fulfilling those preferences would be very bad for a significant fraction of people. Some people would, for example, strongly prefer that Earth's ecosystems be destroyed (as a way of preventing animal suffering). Others would strongly prefer to protect Earth's ecosystems. The partitioning idea that you mention is irrelevant to this value difference. There also exists no level of resources that would make this disagreement go away. (This seems to me to contradict the claim introduced in the second sentence of your comment.)
I don’t think that this is an unusual type of value difference. I think that there exists a wide range of value differences such that (i): fulfilling a strong preference of a significant fraction of people leads to a very bad outcome for another significant fraction of people, (ii): partitioning is irrelevant, and (iii): no amount of resources would help.
Let's for example examine the primary value difference discussed in the Archipelago text that you link to in your earlier comment: the issue of which religion future generations will grow up in, and which religious taboos future generations will be allowed to violate. Consider a very strict religious community that views apostasy as a really bad thing. Its members want to have children, and they reject non-natural ways of creating or raising children. One possibility is an outcome where children are indoctrinated to ensure that essentially all children born in this community are forced to adapt to its very strict rules (which is what would happen if the communities of a partitioned world have autonomy). This would be very bad for everyone who cares about the well-being of the children who would suffer under such an arrangement. To prevent such harm, one would have to interfere in a way that would be very bad for most members of this community: for example by banning them from having children, or by forcing most of them to live with the constant fear of losing a child to apostasy (in the Archipelago world described in your link, this would happen because of the mandatory UniGov lectures that all children must attend).
We can also consider a currently discussed, near-term, and pragmatic ethics question: what types of experiences are acceptable for what types of artificial minds? I see no reason to think that people will agree on which features of an artificial mind imply that this mind must be treated well (or agree on what it means for different types of artificial minds to be treated well). This is partly an empirical issue, so some disagreements might be dissolved by better understanding. But it also has a value component, and there will presumably be a wide range of positions on that component. Partitioning is again completely irrelevant to these value differences. (And additional resources actually make the problem worse, not better.)
Extrapolated humans will presumably have value differences along these lines about a wide range of issues that have not yet been considered.
There thus seem to exist many cases where variations in preference would in fact result in bad outcomes for a significant fraction of people (in the sense that many cases exist where fulfilling a strong preference held by a significant fraction would be very bad for a significant fraction of people).
More straightforwardly, but less exactly, one could summarise my main point as: for many people, many types of bad outcomes cannot be compensated for by giving them the high-tech equivalent of a big pile of gold. There is no amount of gold that will make Bob OK with the destruction of Earth's ecosystems. And there is no amount of gold that will make Bill OK with animals suffering indefinitely in protected ecosystems. The way things are partitioned would rarely be relevant to these types of conflicts. And if resources do happen to matter in some way, then additional resources would usually make things worse. Even more straightforwardly: you cannot compensate for one part of the universe being filled with factory farms by filling a different part of the universe with vegans.
In conclusion, it seems to me that there exists a wide range of strong preferences (each held by a significant fraction of people) such that fulfilling any one of them would be very bad for a significant fraction of people. This seems to me to falsify the additional claim that you introduce in the second sentence of your comment.
(Some preference differences can indeed be solved with partitioning. So it is indeed possible for a given set of minds to have preference differences without having any conflicts, which means that the existence of preference differences does not, on its own, conclusively prove the existence of conflict. I am, however, narrowly focusing on the additional claim that you introduce in the second sentence of your comment, and pointing out that this specific claim is false.)
Technical note on scope:
My present comment is strictly limited to exploring the question of how one fraction of the population would feel if a preference held by another fraction of the population were to be satisfied. I stay within these bounds partly because any statement regarding what would happen if an AI were to implement The Outcome that Humanity would like an AI to implement would be a nonsense statement: any such statement would implicitly assume that there is some form of free-floating Group entity, out there, that has an existence separate from any specific mapping or set of definitions (see for example this post and this comment for more on this type of "independently-existing-G-entity" confusion).
A tangent:
I think that from the perspective of most people, humans actively choosing an outcome would be worse than the same outcome happening by accident. Humans deciding to destroy ecosystems will usually be seen as worse than ecosystems being destroyed by accident (similarly to how most people would see a world with lots of murders as worse than a world with lots of unpreventable accidents). The same goes for deciding to actively protect ecosystems despite this leading to animal suffering (similarly to how most people would see a world with lots of torture as worse than a world with lots of suffering from unpreventable diseases).