A path to human autonomy
“Each one of us, and also us as the current implementation of humanity are going to be replaced. Persistence in current form is impossible. It’s impossible in biology; every species will either die out or it will change and adapt, in which case it is again not the same species. So the next question is once you’ve given up the idea that you can stay exactly as you are, what would you like to be replaced by?”
Michael Levin [1]
But if the technological Singularity can happen, it will. Even if all the governments of the world were to understand the “threat” and be in deadly fear of it, progress toward the goal would continue. In fiction, there have been stories of laws passed forbidding the construction of “a machine in the form of the mind of man”. In fact, the competitive advantage—economic, military, even artistic—of every advance in automation is so compelling that passing laws, or having customs, that forbid such things merely assures that someone else will get them first.
Vernor Vinge, The Coming Technological Singularity
A Path to Human Autonomy
A future with human empowerment, or even survival, is not a given. I argue that there is a narrow path through the unprecedented existential risks we face. If successful, we need not relinquish the reins of the future. This path requires challenging our assumptions about what it means to be human. We must open our hearts to diversity, more than ever before.
In this essay I attempt to lay out a coherent [2] plan for humanity to address the radical changes ahead. Many of the other plans recently published are incoherent, by which I mean that they neglect key strategic details that make them unworkable, or assume that certain future events will resolve in one particular way. In striving to make this plan coherent, I aim to address what I see as a worst-case scenario with above 1% likelihood: that AI progress and biotech progress continue accelerating, with no sigmoid plateau in either. Sometime within the next five years we see a combination of:
AI capable of nearly all computer tasks at an above-average human level;
AI becoming competent at multi-step agentic tasks;
AI sufficiently capable to initiate a recursive self-improvement process;
substantial algorithmic advances which bring down the cost of creating an AI agent;
AI capable of controlling robotic actuators to competently manage most biology wetlab tasks;
and clear evidence of some general AIs being capable of making designs and plans for civilization-destroying-scale bioweapons.
I expect that within this timeframe there is a smaller chance of some other dramatic events, such as:
an AI system being designed and confirmed to have what most experts agree is consciousness and emotional valence;
recursive self-improvement finding algorithmic advances such that anyone with this new knowledge would be able to create a recursive-self-improvement-capable agent using only a home computer;
recursive self-improvement finding algorithmic advances such that the strongest frontier models are substantially above average human intelligence and capability (even in currently lacking areas, such as reasoning and spatial understanding).
I think all these things will very likely happen in the next 15 years, but hopefully the more extreme ones won't happen in the next 2-3 years.
[Note: I originally wrote and submitted this essay to the Cosmos essay contest. After it was not selected for an award, I decided to expand and publish it as a LessWrong post. During the period immediately following my submission though, several other relevant and/or similar essays were published. I’ve rewritten this essay to try to address these additional viewpoints.
Relevant reading:
(New! Best take so far!) Dean W. Ball. https://www.hyperdimensional.co/p/heres-what-i-think-we-should-do
A Comprehensive Solution for the Safety and Controllability of Artificial Superintelligence. Weibing Wang.
Dan Faggella. A Worthy Successor – The Purpose of AGI.
Eric Drexler. Incoherent AI scenarios are dangerous.
Max Tegmark. The Hopium Wars: the AGI Entente Delusion.
Dario Amodei. Machines of Loving Grace.
Hawkish nationalism vs international AI power and benefit sharing.
https://www.gladstone.ai/action-plan
John Wentworth says:
Conjecture’s Compendium is now up. It’s intended to be a relatively-complete intro to AI risk for nontechnical people who have ~zero background in the subject. I basically endorse the whole thing, and I think it’s probably the best first source to link e.g. policymakers to right now.
I might say more about it later, but for now just want to say that I think this should be the go-to source for new nontechnical people right now.
https://www.thecompendium.ai/
]
Status: A Changing World
On the brink
We are on the cusp of radically transformative change. AI and biotech are advancing rapidly. Many experts predict AI progress will not plateau before AGI [3][4][5][6]. AGI may be quickly followed by artificial superintelligence due to recursive self-improvement [7][8][9]. A novel form of intelligence which rivals ours would be the most impactful invention in the history of humanity. With this massive change comes existential risks[10].
Rapid biotechnology advancements[11] have unlocked the possibility of devastating bioweapons[12]. While currently limited to a few experts, AI and biotech progress are lowering the barriers. Soon many will be able to develop weapons capable of catastrophic harm.
Delaying technological change is helpful if it gives us time to prepare for the coming changes, but it isn't itself a solution. We need to plan on delaying and controlling the intelligence explosion in order to maintain control, but we can't count on such a delay lasting for more than a handful of years. Delay is not an attractor; it is a saddle point from which we are sure to slip eventually.
Halting technological progress is neither easy nor desirable. While a sufficiently powerful AGI could enforce such a halt through universal coercion, we would be sacrificing much of the potential good of our future. To have hope of realizing our glorious future[13], we must reject a permanent halt of technological advancement. Let us instead ride the wave of change; build a glorious future instead of clinging to the vestiges of the past.
The Age of AGI
The first and most impactful transition we face is the creation of AGI. We must aim to make this a safe, controlled event. If open-source AGI became available everywhere at once, it would be an urgent crisis. For example, everyone would have the ability to create devastating bioweapons; it’s naive to imagine no one would seize that opportunity. Misaligned AGI capable of recursive self improvement also directly poses a major threat. Additionally, as AI accelerates all scientific research, new threats like self-replicating nanotech may emerge. We need global governance to prevent these hazards. Safe limited AGI aligned with human values is our best defense, which is why it must be our primary goal.
Forecasting possible trajectories
What rate will AI development proceed at? What shape will the trajectory be?
We can’t be sure, but we can explore some plausible trajectories and ask ourselves what we might do in each case.
Scaling laws are always with respect to a specific algorithm. Given a specific machine learning architecture, training data, hyperparameters, etc., you can then predict what the model would look like if the parameter count and training steps were increased. For the algorithms we’ve tested so far, we can get a good approximation of how strong a model is likely to become by training small versions on carefully selected datasets[14].
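As a concrete illustration (a minimal sketch, not a recipe): the Chinchilla-style parametric form L(N, D) = E + A/N^α + B/D^β is one such algorithm-specific scaling law. The constants below are roughly the fit reported by Hoffmann et al. for their particular architecture and data mix; for any new algorithm they would need to be re-fit from small training runs before extrapolating.

```python
# Minimal sketch of scaling-law extrapolation for one specific algorithm.
# The constants are roughly the parametric fit reported by Hoffmann et al.
# (the Chinchilla paper); they are tied to that architecture, data mix, and
# training setup, and would need to be re-fit for any new algorithm.

def predicted_loss(n_params: float, n_tokens: float,
                   E: float = 1.69, A: float = 406.4, B: float = 410.7,
                   alpha: float = 0.34, beta: float = 0.28) -> float:
    """Parametric loss L(N, D) = E + A / N^alpha + B / D^beta."""
    return E + A / n_params**alpha + B / n_tokens**beta

# Extrapolate from small runs to hypothetical larger models:
for n, d in [(1e9, 2e10), (7e10, 1.4e12), (4e11, 1.5e13)]:
    print(f"N={n:.0e} params, D={d:.0e} tokens -> predicted loss {predicted_loss(n, d):.3f}")
```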
This predictability with respect to a fixed algorithm is very different from describing the computational capacity of existing hardware. A specific GPU can’t suddenly do a million-fold more computations as a result of changing its low-level code. We have a strong empirical basis for saying that we understand physically what is going on in this object we created, and that it is running at close to its capacity.
This is simply not the case with deep learning, where analysis of the learning rates of animals gives us some reason to believe that we are far from the optimal learning rate. When people argue that they don’t expect major algorithmic advances in the future, they are constrained to make much weaker statements like, “Many scientists have been looking for the past 7 years to find substantial improvements over transformers, but have so far only found relatively incremental improvements to transformers (in the realm of 1000x improvement). Thus, it seems unlikely we will come across a 1e6x improvement in the next 5 years.” The trouble is, extrapolating from past rates of improvement only makes sense if a similar amount of researcher hours and compute budget continues to be applied to the search. If AI improves to the point where AI R&D becomes quite effective, then we could get an exponential feedback mechanism where advancements improve the rate of advancement further. In such a world, an algorithmic improvement of 1e6-fold over the same time-span in which we previously had just a 1e3-fold improvement seems much more plausible. This is a prediction that there is a reasonable chance of this happening, what I’d call a ‘worst likely case’. I think it is reasonable for society to prepare to survive the worst likely case.
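To make that concrete, here is a toy model of the feedback dynamic. All of the parameters are made up for illustration; the point is only the qualitative difference between compounding with and without AI assistance feeding back into research speed.

```python
import math

def cumulative_multiplier(years: float, base_rate: float = 2.7,
                          feedback: float = 0.0, dt: float = 0.1) -> float:
    """Toy model: cumulative compute-efficiency multiplier after `years`.

    base_rate: yearly multiplier from human-driven research alone
               (2.7x/year compounds to roughly 1000x over 7 years).
    feedback:  how strongly AI assistance boosts research speed as a
               function of cumulative progress so far (0 = no feedback).
    All parameters are illustrative assumptions, not empirical estimates.
    """
    total, t = 1.0, 0.0
    while t < years:
        speed = 1.0 + feedback * math.log2(total + 1.0)  # AI R&D speedup factor
        total *= base_rate ** (speed * dt)               # compound the progress
        t += dt
    return total

print(f"no feedback, 5 years:   {cumulative_multiplier(5, feedback=0.0):.1e}x")
print(f"with feedback, 5 years: {cumulative_multiplier(5, feedback=0.3):.1e}x")
```

With no feedback, the multiplier after five years is roughly what simple extrapolation of past rates would suggest; with even a modest feedback term it ends up orders of magnitude higher.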
Delaying AGI: Necessary but not Sufficient
Let’s examine some of the ways a delay might be implemented, and how long we should expect such delays to last.
Pausing large training runs
Pausing the large training runs of frontier labs for some period of time is an idea that’s been advocated for. I think this is a mistake. I think that the frontier training runs are a symptom of progress in AI, not the key driving factor. I think that we would actually accelerate progress toward AGI by pausing large training runs. I agree with various thinkers[15][16][17] that transformer-based LLMs are not quite the right architecture for AGI. I believe it is possible that scaling existing algorithms could get us there, but I think it would be incredibly inefficient. If the frontier AI labs are restricted from applying their engineers, researchers, and compute to trying to create bigger LLMs, where would that talent instead focus? On research. Thus, speeding the search for better algorithms. As soon as the pause is ended, the next large training run may be using superior algorithms that result in a model thousands or millions of times more capable than current models.
Therefore, I claim that if you wanted to slow progress towards AGI, it wouldn’t be enough to restrict the frontier labs from running large training runs. You’d also need to divert their researchers and compute to non-research tasks. That’s a much more complicated and difficult-to-enforce proposition.
Banning Automated AI R&D worldwide
We seem quite close to the point where current AI techniques, such as scaffolded LLMs, will become able to automate a substantial portion of AI research. Estimates of the current speedup from coding assistants are in the range of 5-20%, and gradually rising. If we have a step change to speedups of over 100% (e.g. after the next generation of LLMs is deployed), this could result in a feedback loop of explosive progress. Furthermore, we should expect such progress to be at least somewhat decentralized. There is a chance that individual researchers stumble across substantial algorithmic improvements and are able to shoot ahead. This scenario is quite a governance challenge, since it wouldn’t be enough to monitor and control the top twenty or so labs. The Narrow Path essay focuses on this specific case of trying to ban AI-powered AI R&D.
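One way to see both why current coding assistants only produce modest overall speedups and what a step change would require is a quick Amdahl’s-law style calculation. The fraction-of-effort figure below is purely an assumption for illustration.

```python
def overall_speedup(coding_fraction: float, coding_multiplier: float) -> float:
    """Amdahl's-law style toy estimate: if `coding_fraction` of total research
    effort is coding, and AI makes coding `coding_multiplier`x faster, how much
    faster does research go overall? Both inputs are illustrative assumptions."""
    return 1.0 / ((1.0 - coding_fraction) + coding_fraction / coding_multiplier)

# Assuming (hypothetically) that ~30% of AI research effort is coding:
for mult in (1.5, 3, 10, 100):
    s = overall_speedup(0.3, mult)
    print(f"coding {mult:>5}x faster -> overall research ~{(s - 1) * 100:.0f}% faster")
```

Under this toy model, overall speedups above 100% cannot come from faster coding alone; they require AI to take over a larger fraction of the research workflow, which is exactly the step change at issue.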
The danger present in this scenario is one reason that it is tempting to stop the large frontier training runs that seem likely to produce LLM coding assistants capable of such speed-ups. This runs into the problem discussed above though.
Banning all AI research worldwide
Research doesn’t require large blocks of compute, unlike large training runs. If you want to ban all AI research, you need to ban access to unmonitored personal computers anywhere in the world. That sort of draconian measure seems infeasible.
If one wanted to have a world which contained only some specific safe form of AI deployed, it would be necessary to prevent the deployment of unsafe AI. If the only AIs capable enough to be dangerous are produced by large training runs, this is perhaps plausible. But as I argued above, I don’t expect that will remain the case for long.
Government research project
I believe the best option for delaying and controlling the deployment of AGI is to nationalize the frontier AI labs, and require that all the researchers work on a government project. This approach has several benefits.
First, the experience of government projects is that they are often heavily laden with bureaucratic processes and oversight which naturally lead to slow-downs.
Second, it would be possible to maintain a high degree of security and control, ensuring that algorithmic secrets were less likely to escape.
Third, the government would not allow public release of the models being researched, preventing the coding-assistant-based acceleration discussed above.
Fourth, having a government project to produce AGI would likely still achieve AGI before the open-source community did. This is a good outcome if the resulting model is carefully contained and studied. Such empirical observation of a highly capable general model could give clear evidence of the danger. With such evidence in hand, the government may take yet further actions to control and delay AI progress worldwide.
Fifth, the government AI research project may also produce unprecedentedly powerful narrow tool-AI which can be safely utilized to enable previously intractable surveillance and enforcement of all other research into AI and/or self-replicating weapons. Although there are many dangers in centralizing power in the hands of any one government or politician, I believe the strategic scenario we face has no better alternatives available.
While all this is going on, the world will continue doing research, and coding assistants will continue to get better. Even an action as drastic as nationalization of the top labs and constraint of top researchers would not prevent progress for long. It could buy us a couple of years, maybe even three.
On the other hand, I worry about having any government in charge of an AI so powerful it grants decisive strategic advantage. It’s not enough to ask whether the US Federal government is an adequate government currently. We must ask how it might look after the destabilizing effect of powerful AI is introduced. Who has ultimate control over this AI? The President? So much for checks and balances. At that point we are suddenly only still a democracy if the President wills it so. I would prefer not to put anyone in a position of such power over the world.
There has not been much discussion that I’ve seen of how to keep a powerful AI, directly operated by a small technical staff, under the control of a democratic government while also keeping that government a democracy.
Our democracy is problematically unstable and violently imperial as it is. I put no credence in the hope that things won’t devolve further upon the advent of AGI.
Sometimes I jokingly suggest we give the reins of power over the AI to Switzerland, since they have the stereotype of being militarily neutral and having well-organized public goods. I don’t actually have the reins though, and see no way to get them into the Swiss government’s hands. Also, I wouldn’t want Swiss government officials to have such power either, since I’d still worry about the corrupting effects of the power.
I think we need new governance structures to handle this new strategic situation.
Cautious Pursuit
If humanity doesn’t want to cede autonomy to AGI we must grow to keep up, while keeping AI progress controlled. Some suggest we merge with the AI. To merge implies a compromise. I say, “Don’t merge, don’t surrender, don’t compromise our values.” Let us become transhuman digital beings with our human values fully intact. Creating fully human digital people is not the compromise implied by an act of merging.
The alternatives to ‘grow to keep up’ are ‘become powerless wards of a mighty AI’ or ‘enforced technological stagnation’.
I propose two parallel paths for AI development:
Tool AI
Mandatory in the short term, to maintain control. Insufficient in the long term, as the rising tide of technology makes powerful digital agents easier and easier to create. For this phase, we carefully limit AI to remain a purely obedient, corrigible tool[18][19]. Related ideas involve creating an ecosystem of narrow tool-AI with clear risk assessments and safe operating parameters[20][21][22]. Use general agents only up to a safe level of power, and only under strict controls to prevent escape or sabotage[23].
Peers/Descendants/Digital People
This is less urgent for our immediate survival, but will become critical in the longer term. The only way to handle powerfully self-improving intelligence is to be that intelligence. Planning to not surrender control, and acknowledging the difficulty and undesirability of indefinitely halting global technological progress, leaves one path forward.
We must carefully build conscious digital entities sharing our values and empathy[24][25]. This is an ethically and technically challenging path. It would require thorough preparation and circumspection to avoid tragic or dangerous outcomes[26][27]. In the long term, I expect that full digital people will be necessary because only a digital being allows for the maximal extent of expansion, modification, and copying. However, in the short term we should not expect to create and get use from such beings. They should be studied carefully and ethically in controlled lab settings, but not deployed for practical purposes. Such beings seem more likely to be dangerously inclined towards Omohundro Drives, and also forcing them to work for us would be slavery.
Some think building digital people is impossible. I say that dismissing AI consciousness based on philosophical arguments alone is misguided[28][29]. Empirical comparisons of brain and AI information processing reveal substantial similarities[30][31][32], and the remaining differences are technologically tractable[33]. This suggests AI consciousness will be achievable; work is already underway[34].
Why not stop at tool AI? Why do we need digital people?
Some have argued that we should deliberately stop at tool AI, and limit the uses of such to safe deployments. This presumes that it will be possible to halt software and hardware progress globally for many decades. I don’t think the offense-defense balance makes this easy for governments to do. The risk of some group or state-actor defecting from the ban, and gaining tremendous advantage thereby, seems large. Blocking this seems intractable. As technology in general advances, the barriers to entry will continue to get lower. As new generations of scientists grow up with the previous generation’s research to build upon, advancements will be made even if large research projects are blocked. Proprietary knowledge will eventually leak from the people holding it.
How is the situation different if there are digital people living as part of society?
Digital people offer vastly more opportunity for regulating AI. They have many of the same advantages that AI has over biological humans: rapid replication, running at superhuman speeds, restoring from backups, mind-merging, and, perhaps most importantly, recursive self-improvement. They can keep experimenting on themselves and getting smarter. Any rogue AI arising would need to not just get an edge on the relatively static competence of biological humans, but would need to play catch-up to the existing digital people who had a head-start on self-improvement. This does mean that we need to delay and control AI until we do have digital people who have gotten a good head-start. We need to avoid putting so much optimization pressure on them that it compromises their ability to maintain value-stability. We also lose if the digital people are under so much pressure that they optimize away their humanity, and become the very monsters they were trying to defend against.
The Dawn of Transhumanism
The second transition we must grapple with is transhumanism. To keep pace with AI will require dramatic change to what it means to be human. The next 20 years will likely involve greater changes to the human brain than across all of primate evolution. At the same time that we are carefully working to create digital people in controlled labs, we can expect progress in brain-computer-interfaces (BCIs) and genetic editing to accelerate due to tool AI. If successful, such projects could result in radical increases to human intelligence.
Additionally, brain-computer-interfaces may allow for more extensive brain recordings, accelerating neuroscience research (and brain-inspired AI) and possibly allowing for low-fidelity approximate emulations of the recorded individuals. Finally, brain uploading may succeed in creating high-fidelity emulations of individual humans, allowing for the instantiation of a digital person that closely matches the behavioral traits of the scanned human. A fully digital person offers many opportunities and risks.
Brain Uploading
I have spoken with people working at the forefront of brain scanning[35]. I predict we will have the first complete synapse-level human brain scan by the mid 2030s[36]. This is a massive undertaking, in which AI will play key roles. After the first upload it may be only a couple of years until the scan is made into a realtime human emulation. Many of the bottlenecks we currently face may be relaxed with the help of AI-assisted research. What previously seemed decades away may instead happen in just a few years.
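To give a sense of why this is such a massive undertaking, here is a back-of-envelope estimate of the raw data involved. Both anchor numbers are rough assumptions on my part and could easily be off by an order of magnitude.

```python
# Back-of-envelope: raw data volume for a synapse-resolution human brain scan.
# Assumptions (mine, not from any published scanning plan): roughly 1.4
# petabytes of raw electron-microscopy imagery per cubic millimeter (about the
# scale of the MICrONS cubic-millimeter dataset) and ~1.3e6 mm^3 of human
# brain volume. Compression, sparser sampling, or other scanning modalities
# could change the answer by orders of magnitude.

PB_PER_MM3 = 1.4          # assumed petabytes of raw imagery per mm^3
BRAIN_VOLUME_MM3 = 1.3e6  # approximate human brain volume in mm^3

raw_pb = PB_PER_MM3 * BRAIN_VOLUME_MM3
print(f"~{raw_pb:.1e} PB of raw imagery (~{raw_pb / 1e6:.1f} zettabytes)")
```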
Value Loss: Pitfalls of Self-Modification
A human isn’t an agent with eternally stable objective values, but a series of agents each slightly different from the previous. Our change is bounded by our genetics interacting with life experiences. The neurons you’re born with make up most of your brain for life, limiting intellectual change and growth.
The low-fidelity or high-fidelity emulations of human brains would be completely unbound by such physical restrictions. Without careful governance, such entities could rapidly copy and self-modify.
New technologies like gene editing, brain-computer-interfaces, and stem-cell implants can remove some of these biological limitations even from biological human brains.
History shows that if self modification offers competitive advantages, some will pursue it despite risks and trade-offs[37]. Competitive pressures push towards optimization for capability, potentially altering intrinsic values[38][39]. We must plan for a future where some individuals make such choices, modifying their own brains despite the risk. In this future, a single individual could become incredibly powerful and dangerous, meaning we must reckon with the unilateralist’s curse[40]. Without restrictions, these dynamics may lead to highly effective and competitive self-modifying agents bearing little trace of their original humanity. Like rogue AGI, such entities could conflict with humanity at a substantial advantage, quickly becoming an unstoppable catastrophe. We must proactively prevent this, rather than passively react.
Novel Risks
Our situation is precarious; the world is indeed fragile, as Nick Bostrom speculated[41]. In my work developing AI biorisk evals I have encountered evidence of this that I find strongly convincing. Confidentiality agreements and infohazard precautions unfortunately limit what I can share. Some risks are present already; others are still hypothetical, backed only by precursors and extrapolations. We cannot afford to wait until risks materialize to deal with them. Like an arctic explorer in a kayak, we can’t wait until we’re tipping into the icy sea to decide we should be wearing a drysuit.
Means: New Governance for a New Age
Global externalities are skyrocketing, with many possibilities for defection by individuals or small groups that could lead to the utter destruction of civilization. Humanity is at risk of being overwhelmed by runaway self-replicating weapons or self-improving digital entities. Establishing regulation and emergency response organizations to prevent this is critical. These enforcement and response organizations will need to act globally, since these new technological threats can arise anywhere and quickly overwhelm the world. We must act urgently; threats are already at large.
In confronting these potential catastrophes, we must also cultivate existential hope[42]. Our vision should balance caution with determination to succeed, planning for success despite the challenges. We should not fall into the trap of creating negative self-fulfilling prophecies through fear-mongering.
A difficult question we will need to tackle which I admit I do not have a clear plan to recommend is how to handle the governance of powerful AI once it is invented. Who do we trust to keep dangerous agentic AI contained? Who do we trust to lawfully wield tool AI so powerful it confers a decisive strategic advantage over the entire world? In the past, governments have seen success in having checks and balances to split up and limit powers. The more AI allows for concentration of power, the more difficult it makes the goal of keeping that power in check.
Global Coordination
Global coordination is crucial for humanity’s survival in this time of change and risk. The balance of world economic and military power is likely to destabilize. Coordinated action is our only chance at survival, whether it is achieved through diplomacy or force. Here I will lay out some possible directions humanity might go in. Certainly more are possible, including hybrids of these categories. None of these seem optimal to me in terms of their implementability or their preservation of stability of order.
Three example paths:
The Forceful Path: Decisive Strategic Advantage
Recursive self-improvement has the potential for explosive progress. The leader in this may gain such a great technological lead that their way becomes clear to seize global power without fear of reprisals or resistance. This path is fraught with ethical dilemmas and the dangers of concentration of power. Coercive domination by a single actor is not ideal, but is preferable to extinction or catastrophic global conflict. It is hard to foresee whether this option will become available to any of the leading actors, and whether they would choose to seize the opportunity.
The Cutthroat Path: Wary Standoff
A council of nation-states could coordinate without a central government, agreeing to punish defectors. This cleaves closer to our current world order than a single strong world government with a monopoly on force. This council of nation-state peers would need to be wary and poised for instant violence, a ‘Mexican Standoff’ of nations more tense than the Cold War. Perhaps a transition to a more peaceful coordination system would eventually be possible. If the survival of humanity depends on this standoff for long, the odds of conflict seem high. Mexican Standoffs with no retreat are not famous for working out well for the participants.
How much this situation ends up resembling successful cooperation between all nations versus a dangerous tense standoff is hard to predict. It may be that treaties and peaceful coordination get us close enough to manage effective governance. Whether such a looser international governance structure is sufficient will depend a lot on the empirical details of future AI. Some are hopeful that a peaceful power-sharing scheme could work[43], but I suspect that the ability to unilaterally defect in return for rapid power gains, along with the offense-favoring nature of such pursuits, makes this infeasible. A related historical example, the effort to prevent nuclear weapon proliferation, shows that while international coordination can reduce the proliferation of dangerous technology, it doesn’t reliably prevent it completely. If any failure would be existentially risky, an international effort similar to nuclear non-proliferation is likely insufficient for humanity’s survival.
The Gentle Path: Global Democracy
The world has changed. People talked about how jet travel made the world smaller, and it did. With the rise of remote work, I work with colleagues in a dozen different countries. Where only decades ago collaboration was limited by co-presence, we now have a thriving cosmopolitan global community of scientists and entrepreneurs. Can we come together in coordinated action to steer the course of the world? Is a peaceful path to a democratic world government possible in the timeframe we face? I hope so. The alternatives are grim. Still, a grassroots movement to achieve global unification, establishing a functional democratic world government in under five years, is a high ask.
Humanity’s To-Do List
Humanity’s precarious situation has a number of open problems which need work. We have an unusually urgent need for philosophy and science aimed to answer questions which will shape our governance of new technologies. Which directions we choose to research and materialize now could have big effects on how well our next decade goes[44].
Governance Decisions for Global Coordination
I laid out some of the possible paths humanity might take to uniting for risk prevention. We should consider which paths we think we can act to support, and then take those actions. The default case of maintaining the status quo until some radical changes actually occur in the world may lead to the first catastrophe destroying civilization. If you are reading this and you are part of a research team working on AI, you should think carefully about what you would do if your team discovered a substantial algorithmic advance, or began an accelerating process of recursive self-improvement. Substantial power and weighty decisions might suddenly be thrust upon relatively small groups of researchers. It would be nice if we could prepare some recommendations of wise actions ahead of time for them to refer to. It’s likely they will be under considerable time pressure in their decision making, so precached analysis could be very valuable.
Prepare for Urgent Response
To have a reasonable chance of averting catastrophe, we must prepare ahead of time to respond urgently to emergent dangers from new technologies. The potential for explosively rapid self-replication of AI agents and/or bio/nano weapons means we cannot afford to be purely reactive. The world in its current state would be unable to detect and react swiftly enough to stop such threats. Early detection systems must be established to trigger an alarm in time. Emergency response teams must be trained, equipped, and appropriately stationed at critical areas. We need to actively accelerate work on defensive technologies, while doing what we can to restrict offensive technologies [31, 32]. Reducing our worst civilizational vulnerabilities when facing this tricky transitional time is a valuable course of action.
AI Risk Prevention
If at the time of AGI creation we are still in a world where separate nation states exist, there will need to be unprecedented coordination on this front. While compute governance would offer temporary control, AGI may eventually require far fewer resources[45][46].
Comprehensive mutual inspection treaties for all relevant biology and compute facilities are necessary, despite political challenges. Failure to coordinate risks global conflict or catastrophic AGI incidents.
We don’t currently know how long we would have to act were a runaway RSI process to begin. This should be investigated under the highest security in carefully controlled lab tests. It is critical that we know the timeframe in which authorities must respond. The difference between a needed response time of days versus several months implies different enforcement and control mechanisms.
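To make the stakes of that timeframe concrete, here is a trivial sketch of how the response window scales with the doubling time of a runaway process. The starting level and danger threshold are arbitrary placeholders; only the scaling matters.

```python
import math

def response_window_days(doubling_time_days: float,
                         starting_level: float = 1.0,
                         danger_threshold: float = 1000.0) -> float:
    """Days until an exponentially growing capability crosses the danger
    threshold. The starting level and threshold are arbitrary placeholders;
    the point is that the window scales linearly with the doubling time."""
    doublings_needed = math.log2(danger_threshold / starting_level)
    return doublings_needed * doubling_time_days

for dt_days in (1, 7, 30):
    print(f"doubling time {dt_days:>2} days -> ~{response_window_days(dt_days):.0f} days to respond")
```

A process that doubles in capability daily leaves authorities something like a week or two; one that doubles monthly leaves most of a year. The enforcement mechanisms appropriate to each are very different.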
In general, we have a need for AI safety organizations to be carefully examining worst case scenarios of current tech (preferably before release). A sufficiently concerning demonstration of risk could empower governments to take actions previously outside their Overton windows.
Biorisk Prevention
Preventative action can be taken now to defend the world against future bioweapons.
First and foremost, we need to set up early alert systems like airline wastewater monitoring.
Second, we need to prepare quarantine facilities, equipment, and protocols; robust, dedicated global communication lines for emergency coordination once the alarm is triggered; and stockpiles of PPE and emergency food supplies for population centers.
Third, we need to improve air filtration and purification in public areas. Once these critical precautions are in place, we can work on defensive acceleration of anti-biorisk technologies: establish academic virology journals that require international government clearance to access, fund research into general broad-spectrum antivirals, improved PPE, and advanced sterilization[47], and eliminate existing preventable diseases, like polio and tuberculosis, to reduce the availability of samples.
Defining and Measuring Consciousness / Moral Worth
To avoid drastically increasing suffering in the world, we must ensure we don’t unwittingly create AI with moral personhood. We need to know whether a given entity, biological or digital, is conscious and sapient, and how much moral value to place on it. Currently, there are no empirical tests which can help us make this determination. The further we proceed in developing AI without having such tests in place, the higher the risk of falling into this trap.
Governing Self-Modification
The impulse to attempt self-improvement may lead to many different sorts of modifications among both biological and digital people. We need a policy to limit the rate and scope of these changes, lest we fall into a Molochian competition-driven attractor state where we race to the bottom. If our values get gradually narrowed down to survival and competition, we lose out on love and beauty.
I also don’t think it’s right to force anyone into transhumanism. It should be a voluntary choice. It is sufficient for a brave and trustworthy few to opt into the radical transhumanism that will be necessary to keep up with the frontier of intellectual progress of AGI. Meanwhile, we must act to prevent defection by selfish or violent individuals seeking power through self-modification. Covertly studying the extent of what is possible will help us know what risks to watch out for.
Accelerated Wisdom
We may be able to harness the power of AI to advance moral reasoning and coordination. We might find superior bargaining solutions around moral common ground and social contracts[48]. However, any plan to improve one’s values must confront the tricky metaethical problems of deciding on valid processes of improvement[49]. I expect different answers to be accepted by different people, with no single objectively correct answer. Thus, we should anticipate the need for compromises and tolerating a diversity of moral viewpoints.
Other Governance Improvement Needs
There are decisions which lie beyond our immediate survival which will also be of tremendous import. For example, disparities of wealth and power might become even larger. Under such circumstances, the warping effects of wealth concentration on democracy would be thrust well beyond the breaking point. It would be implausible to suggest that people with such divergent power are peers in a democratic society.
Benefits: A Multi-Faceted Future for All
Success at addressing the risks before us, and building a prosperous peaceful future of advanced technology, will take us to a remarkable place. We face a future with an unprecedented diversity of minds, including various enhanced humans, digital beings, AI entities, and potentially even uplifted non-human animals[50].
Since many people may opt out of transhumanist enhancements, this vision of the future would have normal unenhanced humans alongside all these other transhuman and digital beings.
While all sapient beings[51][52] should have autonomy and fair representation, significant intelligence disparities may limit unenhanced humans’ influence. Interstellar travel might be feasible only for digital entities[53]. In a galaxy-spanning civilization, unenhanced humans would thus have limited influence over the broad course of human affairs.
To mitigate risks and preserve our values, advancement should be gradual. I suggest we maintain an ‘intelligence ladder,’ where each level comprehends those immediately above and below, ensuring continuity with our unenhanced human roots.
Harnessing Technology for Good
There remains a tremendous amount of suffering in the world today, despite humanity having made great strides[54]. If we survive, our near future accomplishments will dwarf our past successes. All the material ills we currently face—like malnourishment, disease and natural disasters—will be swept away by the tsunami of technological progress. Everyone will have basic goods like food, medicine, housing, education, communication, access to information. Humanity will be free to expand outward into the galaxy.
References
- ^
Michael Levin. Interview on Machine Learning Street Talk. https://www.youtube.com/watch?v=6w5xr8BYV8M
- ^
- ^
Dario Amodei. Interview. url: https://www.youtube.com/watch?v=xm6jNMSFT7g
- ^
Machine Learning Street Talk. This is what happens when you let AIs debate. url: https://www.youtube.com/watch?v=WlWAhjPfROU
- ^
Leopold Aschenbrenner. Situational Awareness. url: https://situational-awareness.ai/
- ^
Dwarkesh Patel. Sholto Douglas & Trenton Bricken—How to Build & Understand GPT-7’s Mind. url: https://www.youtube.com/watch?v=UTuuTTnjxMQ
- ^
Max Harms. Will AI be Recursively Self Improving by mid 2026? url: https://manifold.markets/MaxHarms/will-ai-be-recursively-self-improvi?play=true
- ^
Tom Davidson. What a Compute-Centric Framework Says About Takeoff Speeds. url: https://www.openphilanthropy.org/research/what-a-compute-centric-framework-says-about-takeoff-speeds/
- ^
Carl Shulman. Carl Shulman on the economy and national security after AGI. url: https://80000hours.org/podcast/episodes/carl-shulman-economy-agi/
- ^
Center for AI Safety. Statement on AI Risk. url: https://www.safe.ai/work/statement-on-ai-risk
- ^
Maria do Rosário Félix, Maria Doroteia Campos, Patrick Materatski, Carla Varanda. An Overview of the Application of Viruses to Biotechnology. url: https://doi.org/10.3390/v13102073
- ^
Kevin M. Esvelt. Delay, Detect, Defend: Preparing for a Future in which Thousands Can Release New Pandemics. url: https://www.gcsp.ch/publications/delay-detect-defend-preparing-future-which-thousands-can-release-new-pandemics
- ^
Holden Karnofsky. All Possible Views About Humanity’s Future Are Wild. url: https://www.cold-takes.com/all-possible-views-about-humanitys-future-are-wild/
- ^
Michael Poli, et al. Mechanistic Design and Scaling of Hybrid Architectures. url: https://arxiv.org/abs/2403.17844
- ^
François Chollet. Keynote talk at AGI-24. url: https://www.youtube.com/watch?v=s7_NlkBwdj8&t=2121s
- ^
Steven Byrnes. “Artificial General Intelligence”: an extremely brief FAQ. url: https://www.lesswrong.com/posts/uxzDLD4WsiyrBjnPw/artificial-general-intelligence-an-extremely-brief-faq
- ^
Jürgen Schmidhuber. Interview on Machine Learning Street Talk. url: https://www.youtube.com/watch?v=DP454c1K_vQ
- ^
Max Harms. CAST: Corrigibility As Singular Target. url: https://www.lesswrong.com/s/KfCjeconYRdFbMxsy
- ^
Seth Herd. Do What I Mean And Check. url: https://www.lesswrong.com/posts/7NvKrqoQgJkZJmcuD/instruction-following-agi-is-easier-and-more-likely-than
- ^
Eric Drexler. Reframing Superintelligence. url: https://www.fhi.ox.ac.uk/reframing/
- ^
Max Tegmark and Steve Omohundro. Provably safe systems: the only path to controllable AGI. url: https://arxiv.org/abs/2309.01933
- ^
David “davidad” Dalrymple. Safeguarded AI: constructing guaranteed safety. url: https://www.aria.org.uk/programme-safeguarded-ai/
- ^
Ryan Greenblatt, Buck Shlegeris. The case for ensuring that powerful AIs are controlled. url: https://www.lesswrong.com/s/PC3yJgdKvk8kzqZyA/p/kcKrE9mzEHrdqtDpE
- ^
Hiroshi Yamakawa. Sustainability of Digital Life Form Societies. url: https://www.lesswrong.com/posts/2u4Dja2m6ud4m7Bb7/sustainability-of-digital-life-form-societies
- ^
Dan Faggella. A Worthy Successor – The Purpose of AGI. url: https://danfaggella.com/worthy/
- ^
Nathan Helm-Burger. Avoiding the Bog of Moral Hazard for AI. url: https://www.lesswrong.com/posts/pieSxdmjqrKwqa2tR/avoiding-the-bog-of-moral-hazard-for-ai
- ^
AEStudio, Cameron Berg, Judd Rosenblatt. Not understanding sentience is a significant x-risk. url: https://forum.effectivealtruism.org/posts/ddDdbEAJd4duWdgiJ/not-understanding-sentience-is-a-significant-x-risk
- ^
Example of the sort of non-evidence-based dismissal of the feasibility of AI consciousness I mean:
Bernhardt Trout, Brendan McCord. Will AI Enhance Human Freedom and Happiness? A Debate. url: https://cosmosinstitute.substack.com/p/will-ai-enhance-human-freedom-and
- ^
Cameron Berg, Judd Rosenblatt, phgubbins, Diogo de Lucena, AE Studio. We need more AI consciousness research (and further resources). url: https://www.lesswrong.com/posts/ZcJDL4nCruPjLMgxm/ae-studio-sxsw-we-need-more-ai-consciousness-research-and
- ^
Trenton Bricken. Attention Approximates Sparse Distributed Memory. url: https://www.youtube.com/watch?v=THIIk7LR9_8
- ^
Matanel Oren, Michael Hassid, Nir Yarden, Yossi Adi, Roy Schwartz. Transformers are Multi-State RNNs. url: https://arxiv.org/abs/2401.06104
- ^
Ilya Kuzovkin. Curious Similarities Between AI Architectures and the Brain. url: https://www.neurotechlab.ai/curious-similarities-between-ai-architectures-and-the-brain/
- ^
Stephen Ornes. How Transformers Seem to Mimic Parts of the Brain. url: https://www.quantamagazine.org/how-ai-transformers-mimic-parts-of-the-brain-20220912/
- ^
Randall O’Reilly, Astera. Charting a path towards thinking machines. url: https://astera.org/agi-program/
- ^
e11 BIO. Precision brain circuit mapping for transformative neuroscience. url: https://e11.bio/
- ^
Nathan Helm-Burger. Full digitization (not necessarily emulation) of a human brain by 2035. url: https://manifold.markets/NathanHelmBurger/full-digitization-not-necessarily-e?play=true
- ^
Mike Varshavski, Mike Israetel. The Dark Side Of Steroids and The Problem With Deadlifts. url: https://www.youtube.com/watch?v=UrzFrhJtOs
- ^
Robin Hanson. Cultural Drift Of Digital Minds. url: https://www.overcomingbias.com/p/cultural-drift-of-digital-minds
- ^
Scott Alexander. Schelling fences on slippery slopes. url: https://www.lesswrong.com/posts/Kbm6QnJv9dgWsPHQP/schelling-fences-on-slippery-slopes
- ^
Anders Sandberg, Nick Bostrom, Thomas Douglas. The Unilateralist’s Curse and the Case for a Principle of Conformity. url: https://doi.org/10.1080%2F02691728.2015.1108373
- ^
Nick Bostrom. The Vulnerable World Hypothesis. url: https://doi.org/10.1111/1758-5899.12718
- ^
Foresight Institute. Existential Hope. url: https://www.existentialhope.com/
- ^
Naci Cankaya, Jakub Krys. Hawkish nationalism vs international AI power and benefit sharing. url: https://www.lesswrong.com/posts/hhcS3dYZwxGqYCGbx/linkpost-hawkish-nationalism-vs-international-ai-power-and?commentId=Bob8auPiSKK7igLNn
“I personally do not think that assigning probabilities to preferable outcomes is very useful. On the contrary, one can argue that the worldviews held by influential people can become self fulfilling prophecies. That is especially applicable to prisoner’s dilemmas. One can either believe the dilemma is inevitable and therefore choose to defect, or instead see the situation itself as the problem, not the other prisoner. That was the point we were trying to make.”—Naci, in response to me saying that I thought that sufficient international cooperation would be quite unlikely.
- ^
Vitalik Buterin, Rob Wiblin. Vitalik Buterin on defensive acceleration and how to regulate AI when you fear government. url: https://80000hours.org/podcast/episodes/vitalik-buterin-techno-optimism/
Vitalik Buterin. My techno-optimism. url: https://vitalik.eth.limo/general/2023/11/27/techno_optimism.html
- ^
Joe Carlsmith. How Much Computational Power Does It Take to Match the Human Brain? url: https://www.openphilanthropy.org/research/how-much-computational-power-does-it-take-to-match-the-human-brain/
- ^
Nathan Helm-Burger. Contra Roger Penrose on estimates of brain compute. url: https://www.lesswrong.com/posts/uPi2YppTEnzKG3nXD/nathan-helm-burger-s-shortform?commentId=qCSJ2nPsNXC2PFvBW
- ^
- ^
Jobst Heitzig. Announcing vodle, a web app for consensus-aiming collective decisions. url: https://forum.effectivealtruism.org/posts/tfjLzxMZYhLD9Qx2M/announcing-vodle-a-web-app-for-consensus-aiming-collective
- ^
Joe Carlsmith. On the limits of idealized values. url: https://joecarlsmith.com/2021/06/21/on-the-limits-of-idealized-values
- ^
Wikipedia. Uplift (science fiction). url: https://en.wikipedia.org/wiki/Uplift_(science_fiction)
- ^
Nate Soares. Sentience Matters. url: https://www.lesswrong.com/posts/Htu55gzoiYHS6TREB/sentience-matters
- ^
Nayeli Ellen. The Difference in Sentience vs Sapience. url: https://academichelp.net/humanities/philosophy/sentience-vs-sapience.html
- ^
Samuel Spector Erik Cohen. Transhumanism and cosmic travel. url: https://doi.org/10.1080/02508281.2019.1679984
- ^
Max Roser. The short history of global living conditions and why it matters that we know it. url: https://ourworldindata.org/a-history-of-global-living-conditions
I applaud the post. I think this is a step in the right direction of trying to consider the whole problem realistically. I think everyone working on alignment should have a written statement like this of their route to survival and flourishing; it would help us work collectively to improve our currently-vague thinking, and it would help better target the limited efforts we have to devote to alignment research.
My statement would be related but less ambitious. This is what we should do, if the clarity and will existed. I really hope there’s a viable route following more limited things we realistically could do.
I’m afraid I find most of your proposed approaches to still fall short of being fully realistic given the inefficiencies of public debate and government decision-making. Fortunately I think there is a route to survival and autonomy that’s a less narrow path; I have a draft in progress on my proposed plan, working title “a coordination-free plan for human survival”. Getting the US gov’t to nationalize AGI labs does sound plausible, but unlikely to happen until they’ve already produced human-level AGI or are near to doing so.
I think my proposed path is different based on two cruxes about AGI, and probably some others about how government decision-making works. My timelines include fairly short timelines, like the 3 years Aschenbrenner and some other OpenAI insiders hold. And I see general AGI as actually being easier than superhuman tool AI; once real continuous, self-directed learning is added to foundation model agents (it already exists but needs improvement), having an agent learn with human help becomes an enormous advantage over human-designed tool AI.
There’s much more to say, but I’m out of time so I’ll say that and hope to come back with more detail.
These are my thoughts in response. I don’t claim to know that what I say here is the truth, but it’s a paradigm that makes sense to me.
Strategic global cooperation to stop AI is effectively impossible, and hoping to do it by turning all the world powers into western-style democracies first is really impossible. Any successful diplomacy will have to work with the existing realities of power within and among countries, but even then, I only see tactical successes at best. Even stopping AI within the West looks very unlikely. Nationalization is conceivable, but I think it would have to partly come as an initiative from a cartel of leading companies; there is neither the will nor the understanding in the non-tech world of politics to simply impose nationalization of AI on big tech.
For these reasons, I think the only hope of arriving at a human-friendly future by design rather than by accident, is to solve the scientific, philosophical, and design issues involved, in the creation of benevolent superhuman AI. Your idea to focus on the creation of “digital people” has a lot in common with this; more precisely, I would say that many of the questions that would have to be answered, in order to know what you’re doing when creating digital people, are also questions that have to be answered, in order to know how to create benevolent superhuman AI.
Still, in the end I expect that the pursuit of AI leads to superintelligence, and an adequately benevolent superintelligence would not necessarily be a person. It would, however, need to know what a person is, in a way that isn’t tied to humanity or even to biology, because it would be governing a world in which that “unprecedented diversity of minds” can exist.
Eliezer has argued that it is unrealistic to think that all the scientific, philosophical, and design issues can be solved in time. He also argues that in the absence of a truly effective global pause or ban, the almost inevitable upshot is a superintelligence that reorganizes the world in a way that is unfriendly to human beings, because human values are complex, and so human-friendliness requires a highly specific formulation of what human values are, and of the AI architecture that will preserve and extrapolate them.
The argument that the design issues can’t be resolved in time is strong. They involve a mix of perennial philosophical questions like the nature of the good, scientific questions like the nature of human cognition and consciousness, and avantgarde computer-science issues like the dynamics of superintelligent deep learning systems. One might reasonably expect it to take decades to resolve all these.
Perhaps the best reason for hope here, is the use of AI as a partner in solving the problems. Of course this is a common idea, e.g. “weak-to-strong generalization” would be a form of this. It is at least conceivable that the acceleration of discovery made possible by AI, could be used to solve all the issues pertaining to friendly superintelligence, in years or months, rather than requiring decades. But there is also a significant risk that some AI-empowered group will be getting things wrong, while thinking that they are getting it right. It is also likely that even if a way is found to walk the path to a successful outcome (however narrow that path may be), that all the way to the end, there will be rival factions who have different beliefs about what the correct path is.
As for the second proposition I have attributed to Eliezer—that if we don’t know what we’re doing when we cross the threshold to superintelligence, doom is almost inevitable—that’s less clear to me. Perhaps there are a few rough principles which, if followed, greatly increase the odds in favor of a world that has a human-friendly niche somewhere in it.
What 1000x improvement? Better hardware and larger scale are not algorithmic improvements. Careful study of scaling laws to get Chinchilla scaling and set tokens per parameter more reasonably[1] is not an algorithmic improvement. There was maybe 5x-20x algorithmic improvement, meaning the compute multiplier, how much less compute one would need to get the same perplexity on some test data. The upper bound is speculation based on published research for which there are no public results of large scale experiments, including for combinations of multiple methods, and absence of very strong compute multiplier results from developers of open weights models who publish detailed reports like DeepSeek and Meta. The lower bound can be observed in the Mamba paper (Figure 4, Transformer vs. Transformer++), though it doesn’t test MoE over dense transformer (which should be a further 2x or so, but I still don’t know of a paper that demonstrates this clearly).
Recent Yi-Lightning is an interesting example that wins on Chatbot Arena in multiple categories over all but a few of the strongest frontier GPT-4 level models (original GPT-4 itself is far behind). It was trained for about 2e24 FLOPs, 10x less than original GPT-4, and it’s a small overtrained model, so its tokens per parameter are very unfavorable; that is, it was possible to make it even more capable with the same compute.
It’s not just 20 tokens per parameter.
I think that if you take into account all the improvements to transformers published since their initial invention in the 2010s, there is well over 1000x worth of improvement.
I can list a few of these advancements off the top of my head, but a comprehensive list would be a substantial project to assemble.
Data Selection:
DeepMind JEST: https://arxiv.org/abs/2406.17711
SoftDedup: https://arxiv.org/abs/2407.06654
Activation function improvements, e.g. SwiGLU
FlashAttention: https://arxiv.org/abs/2205.14135
GrokFast: https://arxiv.org/html/2405.20233v2
AdEMAMix Optimizer https://arxiv.org/html/2409.03137v1
Quantized training
Better parallelism
DPO https://arxiv.org/abs/2305.18290
Hypersphere embedding https://arxiv.org/abs/2410.01131
Binary Tree MoEs: https://arxiv.org/abs/2311.10770 https://arxiv.org/abs/2407.04153
And a bunch of stuff in-the-works that may or may not pan out:
hybrid attention and state-space models (e.g. mixing in some Mamba layers)
multi-token prediction (potentially including diffusion model guidance): https://arxiv.org/abs/2404.19737 https://arxiv.org/abs/2310.16834
Here’s a survey article with a bunch of further links: https://arxiv.org/abs/2302.01107
But that’s just in response to defending the point that there has been at least 1000x of improvement. My expectations of substantial improvement yet to come are based not just on this historical pattern, but also on reasoning about a potential for an ‘innovation overhang’ of valuable knowledge that can be gleaned from interpolating between existing research papers (something LLMs will likely soon be good enough for), and also from reasoning from my neuroscience background and some specific estimates of various parts of the brain in terms of compute efficiency and learning rates compared to models which do equivalent things.
I’m talking about the compute multiplier, as a measure of algorithmic improvement, how much less compute it takes to get to the same place. Half of these things are not relevant to it. Maybe another datapoint, Mosaic’s failure with DBRX, when their entire thing was hoarding compute multipliers.
Consider Llama-3-405B, a 4e25 FLOPs model that is just Transformer++ from the Mamba paper I referenced above, not even MoE. A compute multiplier of 1000x over the original transformer would be a 200x multiplier over this Llama, meaning matching its performance with 2e23 FLOPs (1.5 months of training on 128 H100s). Yi-Lightning is exceptional for its low 2e24 FLOPs compute (10x more than our target), but it feels like a lot of it is better post-training, subjectively it doesn’t appear quite as smart, so it would probably lose the perplexity competition.
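(For readers checking the arithmetic, here is a quick reconstruction of these figures, using the standard ~6·N·D approximation for training FLOPs and an assumed ~40% utilization of H100 peak bf16 throughput; both are rough conventions, not exact measurements.)

```python
# Rough reconstruction of the figures above. Assumes training FLOPs ~ 6*N*D
# and ~40% utilization of an H100's ~1e15 FLOP/s peak bf16 throughput;
# both are approximate conventions, not exact measurements.

h100_peak_flops = 1e15          # ~989 TFLOP/s dense bf16, rounded up
utilization = 0.4               # assumed model FLOPs utilization
seconds = 1.5 * 30 * 86400      # 1.5 months of wall-clock time
gpus = 128

cluster_flops = gpus * h100_peak_flops * utilization * seconds
print(f"128 H100s x 1.5 months ~ {cluster_flops:.1e} FLOPs")   # roughly 2e23

llama3_params, llama3_tokens = 405e9, 15.6e12   # ~15.6T training tokens
print(f"Llama-3-405B training ~ {6 * llama3_params * llama3_tokens:.1e} FLOPs")  # roughly 4e25
```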
I thought you might say that some of these weren’t relevant to the metric of compute efficiency you had in mind. I do think that these things are relevant to ‘compute it takes to get to a given capability level’.
Of course, what’s actually more important even than an improvement to training efficiency is an improvement to peak capability. I would argue that if Yi-Lightning, for example, had a better architecture in terms of peak capability, then the gains from the additional training it was given would have been larger. There wouldn’t have been such diminishing returns to overtraining.
If it were possible to just keep training an existing transformer and have it keep getting smarter at a decent rate, then we’d probably be at AGI already. Just train GPT-4 10x as long.
I think a lot of people are seeing ways in which something about the architecture and/or training regime isn’t quite working for some key aspects of general intelligence, particularly reasoning and hyperpolation.
Some relevant things I have read:
reasoning limitations: https://arxiv.org/abs/2406.06489
hyperpolation: https://arxiv.org/abs/2409.05513
detailed analysis of logical errors made: https://www.youtube.com/watch?v=bpp6Dz8N2zY
Some relevant seeming things I haven’t yet read, where researchers are attempting to analyze or improve LLM reasoning:
https://arxiv.org/abs/2407.02678
https://arxiv.org/html/2406.11698v1
https://arxiv.org/abs/2402.11804
https://arxiv.org/abs/2401.14295
https://arxiv.org/abs/2405.15302
https://openreview.net/forum?id=wUU-7XTL5XO
https://arxiv.org/abs/2406.09308
https://arxiv.org/abs/2404.05221
https://arxiv.org/abs/2405.18512
In practice, there are no 2e23 FLOPs models costing about $300K to train that are anywhere close to Llama-3-405B smart. If leading labs had such models (based on unpublished experimental results and more algorithmic insights), then trained with the 8e25 FLOPs those labs have to give, rather than the reference 2e23 FLOPs, they would be much smarter than Llama-3-405B. Better choice of ways of answering questions doesn’t get us far on actual technical capabilities.
(Post-training like o1 is a kind of “better choice of ways of answering questions” that might help, but we don’t know how much compute it saves. Noam Brown gestures at 100,000x from his earlier work, but we haven’t seen Llama 4 yet; it might just spontaneously become capable of coherent long reasoning traces as a result of more scale, the bitter lesson making the Strawberry team’s efforts moot.)
Many improvements observed at smaller scale disappear at greater scale, or don’t stack with each other. Many papers have horrible methodologies, plausibly born of scarcity of research compute, that don’t even try (or make it possible) to estimate the compute multiplier. Most of them will eventually be forgotten, for good reason. So most papers that seem to demonstrate improvements are only weak evidence for the hypothesis of a 1000x cumulative compute-efficiency improvement, while that hypothesis predicts observations about what should already be possible in practice that we are not seeing, which is strong evidence against it. There are multiple competent teams that don’t have Microsoft compute, and they don’t beat Llama-3-405B, which we know doesn’t have all of these speculative algorithmic improvements and uses 4e25 FLOPs (2.5 months on 16K H100s, rather than 1.5 months on 128 H100s for 2e23 FLOPs).
In other words, the importance of Llama-3-405B for the question about speculative algorithmic improvements is that the detailed report shows it has no secret sauce, it merely competently uses about as much compute as the leading labs in very conservative ways. And yet it’s close in capabilities to all the other frontier models. Which means the leading labs don’t have significantly effective secret sauce either, which means nobody does, since the leading labs would’ve already borrowed it if it was that effective.
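As a similar rough check on the $300K training-cost figure mentioned above (the rental price per GPU-hour and the 40% utilization are assumptions of mine):

```python
# Rough check of the "$300K to train a 2e23 FLOPs model" figure.
gpu_seconds = 2e23 / (1.0e15 * 0.4)   # single-H100 seconds at assumed 40% MFU
gpu_hours = gpu_seconds / 3600        # ~139,000 GPU-hours
print(gpu_hours, gpu_hours * 2.50)    # ~1.4e5 GPU-hours, ~$350K at $2.50/GPU-hour
```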
There’s clearly a case in principle for it being possible to learn with much less data, anchoring to humans blind from birth. But there’s probably much more compute happening in a human brain per the proverbial external data token. And a human has the advantage of not learning everything about everything, with greater density of capability over encyclopedic knowledge, which should help save on compute.
I think we mostly agree, but there’s some difference in what we’re measuring against.
I agree that it really doesn’t appear that the leading labs have any secret sauce which is giving them more than 2x improvement over published algorithms.
I think that the Llama 3 family does include a variety of improvements which have come along since “Attention Is All You Need” by Vaswani et al. (2017). Perhaps I am wrong that these improvements add up to a 1000x improvement.
The more interesting question to me is why the big labs seem to have so little ‘secret sauce’ compared to open source knowledge. My guess is that the researchers in the major labs are timidly (pragmatically?) focusing on looking for improvements only in the search space very close to what’s already working. This might be the correct strategy, if you expect that pure scaling will get you to a sufficiently competent research agent to allow you to then very rapidly search a much wider space of possibilities. If you have the choice between digging a ditch by hand, or building a backhoe to dig for you....
Another critical question is whether there are radical improvements which are potentially discoverable by future LLM research agents. I believe that there are. Trying to lay out my arguments for this is a longer discussion.
Some sources which I think give hints about the thinking and focus of big lab researchers:
https://www.youtube.com/watch?v=UTuuTTnjxMQ
https://braininspired.co/podcast/193/
Some sources on ideas which go beyond the nearby idea-space of transformers:
https://www.youtube.com/watch?v=YLiXgPhb8cQ
https://arxiv.org/abs/2408.10205
There should probably be a dialogue between you and @Vladimir_Nesov over how much algorithmic improvements actually do to make AI more powerful, since this might reveal cruxes and help everyone else prepare better for the various AI scenarios.
For what it’s worth, it seems to me that Jack Clark of Anthropic is mostly in agreement with @Vladimir_Nesov about compute being the primary factor:
Quoting from Jack’s blog here.
Another data point supporting Vladimir and Jack Clark’s view of training compute being the key factor:
https://arxiv.org/html/2407.07890v1
Training on the Test Task Confounds Evaluation and Emergence, by Ricardo Dominguez-Olmedo, Florian E. Dorner, and Moritz Hardt (Max Planck Institute)
This updates me to think that a lot of the emergent behaviors that occurred in LLMs probably had mostly mundane causes, and, most importantly, it makes me think LLM capabilities might be more predictable than we assume.
Thank you for this! A comprehensive and direct overview of where things currently stand on priorities for minimizing AI risk and the related governance discussions.
A very crucial piece of this is global coordination, which you correctly emphasize.
I like the taxonomy: The Forceful, The Cutthroat, and The Gentle paths.
From my perspective, the road to really minimizing risk in any of those three scenarios is establishing strong communality in how moral and cultural values are communicated, which I feel is the solid basis for mutual understanding. I think this could allow for a long-term minimization of faulty coordination, be it in a single-nation leadership scenario or any of the more distributed power and decision-making scenarios.
I think the role of simple and effective communication in coordinating global interests is underestimated. I have recently started to look at religion as one of the fundamental components of symbolic and cultural systems, which is a path to studying communality: what brings people together despite their differences. In a sense, religion is a social technology that ties people to a common belief and allows for survival and growth; I wonder how that role will play out alongside the rising power of AI systems in the decades to come.
Additional relevant paper: https://arxiv.org/abs/2410.11407
A Case for AI Consciousness: Language Agents and Global Workspace Theory
Simon Goldstein, Cameron Domenico Kirk-Giannini
It is generally assumed that existing artificial systems are not phenomenally conscious, and that the construction of phenomenally conscious artificial systems would require significant technological progress if it is possible at all. We challenge this assumption by arguing that if Global Workspace Theory (GWT), a leading scientific theory of phenomenal consciousness, is correct, then instances of one widely implemented AI architecture, the artificial language agent, might easily be made phenomenally conscious if they are not already. Along the way, we articulate an explicit methodology for thinking about how to apply scientific theories of consciousness to artificial systems and employ this methodology to arrive at a set of necessary and sufficient conditions for phenomenal consciousness according to GWT.