I mean, what do you think we’ve been doing all along?
So, the short answer is that I am actually just ignorant about this. I’m reading here to learn more, but I certainly haven’t ingested a sufficient history of the relevant works. I’m happy to prioritize any recommendations that others have found insightful or thought-provoking, especially from the point of view of a novice.
I can answer the specific question “what do I think” in a bit more detail. The answer should be understood to represent the viewpoint of someone who is new to the discussion and has only been exposed to an algorithmically influenced, self-selected slice of the information.
I watched the Lex Fridman interview of Eliezer Yudkowsky, and around the 3:06 mark Lex asks what advice Eliezer would give to young people. Eliezer’s initial answer is something to the effect of “Don’t expect a long future.” I interpreted that answer largely as an attempt to evoke a sense of reverence for the seriousness of the problem. When pushed a bit further, his answer is “…I hardly know how to fight myself at this point.” I interpreted this to mean that the space of possible actions being searched appears intractable even from the perspective of a dedicated researcher. This, I believe, is largely the source of my question: current approaches appear to be losing the race, so what other avenues are being explored?
I read the “Thomas Kwa’s MIRI research experience” discussion, and there was a statement to the effect that MIRI does not want Nate’s mindset to be known to frontier AI labs. I interpreted this to mean that the most likely course being explored at MIRI is building a good AI to preempt or stop a bad one. This strikes me as plausible because my intuition is that the LLM architectures currently employed are largely inefficient for developing AGI; however, compute scaling seems to work well enough that it may win the race before competing ideas come to fruition.
An example of an alternative approach that I read was “Significantly Enhancing Adult Intelligence With Gene Editing May Be Possible,” which seems like an avenue worth exploring but is well outside my areas of expertise. That approach shares a characteristic with what I infer to be MIRI’s: both pursue highly technical avenues that would not scale meaningfully at this stage by adding helpers from the general public.
The kinds of approaches I expected to see, but have not encountered much of so far, are ones like the STOP AI effort you linked: approaches that would scale with the addition of roughly average people. I expected this type of approach to take the form of disrupting model training by various means, or co-opting the organizations involved with an aim toward redirection or delay. My lack of exposure to such information supports a few competing models: (1) drastic actions aren’t being pursued at large scales, (2) such actions are being pursued covertly, or (3) I am focusing my attention in the wrong places.
Our best chance at this point is probably government intervention to put the liability back on reckless AI labs for the risks they’re imposing on the rest of us, if not an outright moratorium on massive training runs.
Government action strikes me as a very reasonable approach for people estimating long timescales or relatively low probabilities, but a less reasonable one if timescales are short or probabilities are high. I presume that your high P(doom) already accounts for your estimate of the probability that government action succeeds. Does your high P(doom) imply that you expect such interventions to be too slow, or too ineffective? I interpret a high P(doom) as meaning that the current set of actions you have thought of is unlikely to succeed, and that further exploration of possible actions is therefore necessary. I would expect that exploration to include admitting ideas that would previously have been pruned because they carry negative consequences.
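To make my question concrete, here is a toy decomposition of my own (illustrative framing only, not something you stated), where S is the event that effective government intervention arrives in time:

$$P(\text{doom}) = P(\text{doom} \mid S)\,P(S) + P(\text{doom} \mid \neg S)\,\bigl(1 - P(S)\bigr)$$

A high overall P(doom) would then mean that either P(S) is small (too slow or too unlikely) or P(doom | S) remains high (too ineffective), and knowing which term dominates in your model would tell me where you see the bottleneck.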
This is where most of my anticipated success paths lie as well.
I do not really understand how technical advances in alignment realistically become a success path. For improved alignment to be useful, it would need to be present in essentially all AI agents, or in the most powerful AI agent, such that the aligned agent could dominate unaligned ones. I don’t expect uniform adoption, and I don’t necessarily expect alignment to correlate with capability. By my estimation, this success path rests on the probability that the organization with the most capable AI agent is also specifically interested in ensuring that agent’s alignment, and I expect those goals to interfere with each other enough that the confluence is unlikely; a rough sketch of how I’m weighing this follows below. Are your expectations different?
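Here is that sketch (my own illustrative factoring; I am not claiming the factors are independent or exhaustive):

$$P(\text{alignment path succeeds}) \approx P(\text{alignment solved in time}) \times P(\text{leading lab adopts it} \mid \text{solved}) \times P(\text{aligned agent dominates unaligned agents} \mid \text{adopted})$$

Most of my skepticism lives in the middle factor, since I expect racing for capability to trade off against the effort required for alignment.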
I have not thought deeply about the possibility that a superintelligent AGI has already been achieved. It certainly seems possible, and it would invalidate most of what I have thus far considered plausible mitigation measures.
Assuming a superintelligent AGI does not already exist, I would expect someone with a high P(doom) to be considering options of the form:
(1) Use a smart but not self-improving AI agent to antagonize the world, with the goal of making advanced societies believe that AGI is a bad idea and precipitating effective government action. You could call this the Ozymandias approach.
(2) Identify key resources involved in AI development and work to restrict those resources. For truly desperate individuals this might look like the Metcalf attack, but a tamer approach might be something along the lines of investing in a grid operator and pushing to increase delivery fees to data centers.
I haven’t pursued these thoughts in any serious way because my estimation of the threat isn’t as high as yours. I think it is likely we are unintentionally heading toward the Ozymandias approach anyhow.