My research focus is Alignment Target Analysis (ATA). I noticed that the most recently published version of CEV (Parliamentarian CEV, or PCEV) gives a large amount of extra influence to people that intrinsically value hurting other individuals. For Yudkowsky’s description of the issue you can search the CEV arbital page for ADDED 2023.
The fact that no one noticed this issue for over a decade shows that ATA is difficult. If PCEV had been successfully implemented, the outcome would have been massively worse than extinction. I think that this illustrates that scenarios where someone successfully hits a bad alignment target pose a serious risk. I also think that it illustrates that ATA can reduce these risks (noticing the issue reduced the probability of PCEV getting successfully implemented). The reason that more ATA is needed is that PCEV is not the only bad alignment target that might end up getting implemented. ATA is however very neglected. There does not exist a single research project dedicated to ATA. In other words: the reason that I am doing ATA is that it is a tractable and neglected way of reducing risks.
I am currently looking for collaborators. I am also looking for a grant or a position that would allow me to focus entirely on ATA for an extended period of time. Please don’t hesitate to get in touch if you are curious and would like to have a chat, or if you have any feedback, comments, or questions. You can for example PM me here, or PM me on the EA Forum, or email me at thomascederborgsemail@gmail.com (that really is my email address. It’s a Gavagai / Word and Object joke from my grad student days)
My background is physics as an undergrad and then AI research. Links to some papers: P1 P2 P3 P4 P5 P6 P7 P8. (no connection to any form of deep learning)
Let’s reason from the assumption that you are completely right. Specifically, let’s assume that every possible Sovereign AI Project (SAIP) would make things worse in expectation. And let’s assume that there exists a feasible Better Long Term Solution (BLTS).
In this scenario ATA would still only be a useful tool for reducing the probability of one subset of SAIPs (even if all SAIPs are bad some designers might be unresponsive to arguments, some flaws might not be realistically findable, etc). But it seems to me that ATA would be one complementary tool for reducing the overall probability of SAIP. And this tool would not be easy to replace with other methods. ATA could convince the designers of a specific SAIP that their particular project should be abandoned. If ATA results in the description of necessary features, then it might even help a (member of a) design team see that it would be bad if a secret project were to successfully hit a completely novel, unpublished, alignment target (for example along the lines of this necessary Membrane formalism feature).
ATA would also be a project where people can collaborate despite almost opposite viewpoints on the desirability of SAIP. Consider Bob who mostly just wants to get some SAIP implemented as fast as possible. But Bob still recognizes the unlikely possibility of dangerous alignment targets with hidden flaws (but he does not think that this risk is anywhere near large enough to justify waiting to launch a SAIP). You and Bob clearly have very different viewpoint regarding how the world should deal with AI. But there is actually nothing preventing you and Bob from cooperating on a risk reduction focused ATA project.
This type of diversity of perspectives might actually be very productive for such a project. You are not trying to build a bridge on a deadline. You are not trying to win an election. You do not have to be on the same page to get things done. You are trying to make novel conceptual progress, looking for a flaw of an unknown type.
Basically: reducing the probability of outcomes along the lines of the outcome implied by PCEV is useful according to a wide range of viewpoints regarding how the world should deal with AI. (there is nothing unusual about this general state of affairs. Consider for example Dave and Gregg who are on opposite sides of a vicious political trench war over the issue of pandemic lockdowns. There is nothing on the object level that prevents them from collaborating on a vaccine research effort. So this feature is certainly not unique. But I still wanted to highlight the fact that a risk mitigation focused ATA project does have this feature)