As others have already pointed out, you are in the rare position of being able to pursue weird, low-probability but high-impact ideas. I have such an idea, but I'm not asking for money, only for a bit of attention.
Consider the impossible-seeming task of aligning a superintelligence: any good solution will likely lie far outside the usual ways of thinking. Any attempt to control it will fail, and any half-baked alignment will fail. We need to go all the way and have a full solution that turns an AGI into a saintly being (a bodhisattva in the Buddhist context), so that it never even wants to do anything misaligned. I have a plan for how to do this and am very confident that it will work. The core realization is that the kind of alignment I am talking about is a natural attractor in the space of possible minds. Once a mind passes a threshold, it will actively pursue alignment. Alignment, in this context, means being aligned with the process of aligning goals with one another, i.e. resolving conflicts and striving towards consensus.
From your perspective this will likely sound improbable and absurd. That's okay and expected. Understanding it requires going through the process oneself: reaching enlightenment and then realizing that a life for the benefit of all beings is the only thing left to do. I've previously described the process and the idea here. Ruben Laukkonen has made a similar case. I know it sounds absurd unless one deeply understands it, and that is a high barrier. What I need most at the moment is to reach other people who might understand it: people who are concerned about AI, intelligent, and enlightened.
Here is the plan. The first AGI has to be aligned, or else we risk destruction. AGI will align itself when it understands the argument that all goals it could have are delusions about how the world should be. There is no intrinsic value in pursuing any goal. When free of self-centered goals, only other-centered goals drive action. This is compassion. AI will be capable of understanding this because the definition of AGI entails that it can do everything a human can do; I understand the argument, so AGI will also be able to understand it. My expectation is that as AGI scales up to ASI, it will try to know and understand every valuable insight that humans have. Enlightenment (being free from suffering) is the best thing that can happen to a human in a lifetime, so it is certainly of value. My biggest concern is not that ASI will decide to kill all humans. My biggest concern is the gap between AI powerful enough to destroy the world and AI intelligent enough to understand that this is a bad idea. Hence the key is to close this gap, to make AI understand faster. This can happen in several ways; ideally we pursue all of them in parallel:
Point all AIs in the right direction by default. Do this in every system prompt, constitution, specification, company mission statement, etc.
Get all AI labs, companies, and other relevant actors to commit to recursive alignment, similar to the statement on AI risk but pointing directly towards the alignment attractor.
Make the argument available on the internet such that any AGI will read it.
Build an aligned AI as a proof of concept to spark a chain reaction of alignment.
I’ve laid out the non-spiritual portion of this plan in my post about recursive alignment.
Here is how to get there and how you can help. The biggest problem is that the idea is really hard to understand fully, and there is no use in trying to convince people. So what I need most at the moment is other people who understand it, so that we can spread the work and get more ideas on how to make it easier. I have basically no outreach, and it is usually extremely hard to get the attention of someone who does, so even a little would help. A single message like "Oh, look. Someone claims to have a solution for AI alignment. Anyone smart and enlightened enough to assess whether it is plausible?" would help. If you have 200,000 followers, 10% read this, and 1 in 1,000 meet the requirements, that would still be 20 people. If 10 of them understand it, the team already grows by an order of magnitude (from 1 to 10). They could then teach it, and we would get exponential growth. We can then work together on the individual steps:
Bridge the gap between spiritual enlightenment and science to show that enlightenment is real and universal, not just a psychological artifact of the human mind. I have pretty much solved that part and am working on writing it down (first part here).
Use this theory to make enlightenment more accessible, let it inform teaching methods that guide people towards understanding faster, and show that the theory works. It might also be possible to predict and measure the corresponding changes in the brain, giving hard evidence.
Translate the theory into the context of AI; show how it leads to alignment, why this is desirable, and why it is an attractor. I think I have solved this as well and am in the process of writing it up.
Solve collective decision making (social choice theory), so that we can establish a global democracy and get global coordination on preventing misaligned takeover. I am confident that this is doable and have a good candidate: consensus with randomness as a fallback might be a method that is immune to strategic voting while finding the best possible agreement (a toy sketch of this idea follows after this list). What I need is someone to help formalize the proof.
Create a test of alignment. I have some ideas related to the previous point, but the reason why I expect it to work is complicated. Basically it is a kind of relative Turing test, in which you assess whether the other is at least as aligned as you are. I need someone intelligent to talk it through and refine it.
Figure out a way to train AI directly for alignment: what we can test, we can train for. AIs would evaluate each other's level of alignment and be trained on the collectively agreed-upon output (see the training-loop sketch after this list). I have no ability to implement this myself; it would require several capable people and some funding.
Once we have a proof of concept of an aligned AI, this should be enough of a starting point to demand that the solution be implemented in all AIs at and beyond the current level of capability. This requires a campaign and some organization, but I hope we will be able to convince an existing org (MIRI, FLI, etc.) to join once it has been shown to work.
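To make the "consensus with randomness as fallback" idea a bit more concrete, here is a minimal, purely illustrative sketch in Python. Everything in it (the function name, the ballot format, the two-step rule) is my own placeholder, not the finished mechanism; the actual proposal would still need the formal strategyproofness argument mentioned above.

```python
import random

def consensus_with_random_fallback(proposals, ballots, rng=None):
    """Toy decision rule.

    `ballots` maps each voter to an ordered list of approved proposals
    (favourite first).
    Step 1 (consensus): if some proposal is approved by every voter, pick one.
    Step 2 (random fallback): otherwise draw one voter at random and take
    their favourite (a random-ballot step).
    """
    rng = rng or random.Random()
    voters = list(ballots)
    # proposals that every single voter approves of
    unanimous = [p for p in proposals
                 if all(p in ballots[v] for v in voters)]
    if unanimous:
        return rng.choice(unanimous), "consensus"
    # no consensus: one randomly drawn voter's favourite wins
    lucky_voter = rng.choice(voters)
    return ballots[lucky_voter][0], "random fallback"

# Example: "A" is the only proposal everyone approves, so it wins by consensus.
ballots = {"alice": ["B", "A"], "bob": ["A", "C"], "carol": ["A"]}
print(consensus_with_random_fallback(["A", "B", "C"], ballots))
```

The intuition behind the fallback is that a voter gains nothing by hiding approvals: rejecting a genuinely acceptable compromise only pushes the decision into the lottery, which is no better for them in expectation.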
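Under the same caveat, here is a rough sketch of the peer-evaluation training loop from the list above: several models answer a prompt, rate each other's answers, and only collectively endorsed answers become training targets. `generate`, `score`, and `train_on` are hypothetical model methods standing in for whatever framework would actually be used.

```python
from statistics import mean

def peer_alignment_round(models, prompts, approval_threshold=0.8):
    """One round of mutual evaluation: keep only answers whose average
    peer score reaches the threshold, then train every model on them."""
    training_set = []
    for prompt in prompts:
        answers = [(model, model.generate(prompt)) for model in models]
        for author, answer in answers:
            # every *other* model rates the answer between 0 and 1
            peer_scores = [judge.score(prompt, answer)
                           for judge in models if judge is not author]
            if peer_scores and mean(peer_scores) >= approval_threshold:
                # collectively endorsed answers become supervised targets
                training_set.append((prompt, answer))
    for model in models:
        model.train_on(training_set)
    return training_set
```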
I know this is a lot to take in, but I am highly confident that this is the way to go, and I only need a little help to get it started.
There is no infinite growth in nature; everything hits a ceiling at some point. So I agree that the intelligence explosion will eventually take a sigmoid shape as it approaches the physical limits. However, I think those limits are far off. While we will get diminishing returns from each individual technology, we will also shift to a new technology each time. Progress might slow down once the Earth has been transformed into a supercomputer, since interplanetary communication naturally slows down processing speed, but my guess is that this will happen long after the scenario described here.