There Should Be More Alignment-Driven Startups

Many thanks to Brandon Goldman, David Langer, Samuel Härgestam, Eric Ho, Diogo de Lucena, and Marc Carauleanu for their support and feedback throughout.
Most alignment researchers we sampled in our recent survey think we are currently not on track to succeed with alignment–meaning that humanity may well be on track to lose control of our future.
In order to improve our chances of surviving and thriving, we should apply our most powerful coordination methods towards solving the alignment problem. We think that startups are an underappreciated part of humanity’s toolkit, and having more AI-safety-focused startups would increase the probability of solving alignment.
That said, we also appreciate that AI safety is highly complicated by nature[1] and therefore calls for a more nuanced approach than simple pro-startup boosterism. In the rest of this post, we’ll flesh out what we mean in more detail, hopefully address major objections, and then conclude with some pro-startup boosterism.
Expand the alignment ecosystem with startups
We applaud and appreciate current efforts to align AI. We could and should have many more. Founding more startups will develop human and organizational capital and unlock access to financial capital not currently available to alignment efforts.
The alignment ecosystem is short on entrepreneurial thinking and behavior. The few entrepreneurs among us commiserate over this whenever we can.
We predict that many people interested in alignment would do more to increase P(win) if they started thinking of themselves as problem-solvers specializing in a particular sub-problem first, deploying whatever approaches are appropriate to solve that smaller problem. Note this doesn’t preclude scaling ambitiously and solving bigger problems later on.[2]
Running a company that targets a particular niche of the giant problem seems like one of the best ways to make this transition, unlocking a wealth of best practices that can be copied. For example, we’ve seen people in this space raise too little, too late, and end up spending unnecessary time in the fundraising stage instead of doing work that advances alignment. We think this often results from not following the more standard playbook on how and when to raise, which founders can do without compromising integrity and without being afraid to embrace the fact that they are doing a startup rather than a more traditional (non-profit) AI safety org.[3]
We think creating more safety-driven startups will increase capital availability both in the short term (as more funding might be available for for-profit investments than for non-profit donations) and in the long term (as those companies succeed, they will have money to invest and will create technically skilled, safety-motivated employees with the resources to become investors or donors for other projects). The creation of teams that have successfully completed projects together–organizational capital–will also better prepare the ecosystem to respond to new challenges as they arise. The organic structures formed by market systems allow for more dynamic and open allocation of people and resources to problems as they appear.
We also think it is possible that alignment research will benefit from, and perhaps even require, significant resources that existing orgs may be too hesitant to spend. OpenAI, for example, never allocated the resources it promised to its safety team, and it has received pressure from corporate partners to be more risk-averse in its R&D investments after Microsoft execs were disappointed by Arrakis. A model where investors speculatively fund those experiments and receive outsized rewards if they turn out to be useful will better incentivize the level of exploration that may be necessary for evaluating neglected approaches and ultimately solving alignment.
Many projects benefit from being structured as for-profits instead of as nonprofits. The feedback loops and incentive structures inherent to businesses can uniquely motivate and direct efforts. For-profit setups often demonstrate higher productivity and economic efficiency, driven by financial incentives that encourage rigor and innovation. This environment also fosters an ownership mentality, where accountability and dedication are prioritized. We expect that reliable and aligned systems will ultimately be most in demand,[4] and so contributions to the alignment problem should unlock significant economic benefits from the utilization of AI—benefits which will accrue in part to the people who made those alignment contributions. In this sense, by channeling healthy self-interest into meaningful technical advances, for-profits can effectively advance alignment objectives while simultaneously achieving significant personal and societal impact.
If the whole problem is too big for you to know whether or not you’re making progress, consider working on a smaller problem and using what your customers think about it as your feedback signal.[5] For example, when organizations that aren’t directly tackling alignment, but are instead supporting organizations that do, are structured as businesses that can survive on their own revenues, we think resources are more likely to be allocated well than when support organizations are spending other people’s money.[6]
Another way to put this is: if you’re considering applying to work at an AI lab to help with AI safety, consider instead trying to build the most ambitious company you can that will help with AI safety. However, to start, consider building an idea that seems more like a joke than the grand solution to everything, as many ideas develop with work and the humor might be pointing to something real. The biggest startups often start out looking like fun little experiments. If you think you have a shot at succeeding, we think you should give it a real try, because the expected value of success can be very high. The expectation with startups for alignment should be: “most will fail, but the ones that succeed will be hugely impactful.”
Expanding now prepares well for the future
We suspect many billions in funding may go towards AI Alignment R&D[7] over the next decade. At minimum, Open Philanthropy itself is looking to double its Global Catastrophic Risks funding over the next few years. The more viable candidates there are for that funding, and the more alignment-focused those candidates are, the better the outcomes and the larger the alignment ecosystem—maximizing the value of that money for mitigating AI risks and solving the alignment problem itself. We should be skating to where the puck is going to be in terms of funding opportunities and technical capacities, and we think those who care most about alignment should be trying hard to meaningfully upskill and found startups to make the most of this potential near-future opportunity.
We also think that an important part of startup ecosystems is technically-minded investors who can act as angel investors or grant evaluators. If we need to move millions or billions in the future, we should have people practicing by moving thousands or millions today.
If it’s the case that solving alignment looks less like one brilliant breakthrough and more like a careful aggregation of many pieces that solve many different subproblems, then we need a structure that will create both the pieces and the careful aggregation. Historically, successful large engineering projects have been more easily delivered through market mechanisms, and we should take advantage of that here.
Additionally, if it’s the case that a solution to the alignment problem requires an expensive, underfunded, ambitious effort, like making humans significantly smarter or doing whole brain emulation, startups seem especially well-suited to take on these moonshot efforts. Ideally, we’d preserve the best of what comes from startups for this sort of work while simultaneously pioneering some new incentive structures to support these unique endeavors, like purchase guarantees and windfall trusts.[8]
Differential tech development that doesn’t hurt on net is a broader category of work than some think
We think it’s important for founders and investors to anticipate their impacts on the world—and to worry about those impacts being possibly negative. But we don’t think this worry counterbalances the pressing need to make progress on alignment research, and we should push people to choose projects and iterate more thoughtfully rather than simply doing less.
A common and reasonable objection is that founding projects can push the capabilities of our systems further beyond our ability to align or secure them. This can happen by directly contributing algorithmic improvements or enhancing research taste, by increasing demand for AI or related services, or by building organizations that safety-wash or (all too familiarly) get frog-boiled from alignment orgs into capabilities orgs. Similarly, another objection is that founding projects makes coordination harder by increasing the size and decreasing the uniformity of the field.
We suspect the horse has fled the barn on both counts.
Given that there are orders of magnitude more researchers working on advancing capabilities than working on alignment per se, we think the potential incremental capabilities advances fostered in for-profit safety start-ups would be functionally negligible compared to the dizzying rate of current progress and investment in capabilities. We think adding alignment-focused participants might draw capital and talent that would otherwise be deployed at participants who ignore alignment, and so the net effect will not obviously be an increase in capabilities progress.[9] While likely not relevant to existential questions, we also think ethically minded people can use AI in commercial contexts to further human flourishing, and that this will help displace scammy uses of AI. We think that it’s possible to shift the culture to be more win-win and long-term focused, and that the best way to do this is by building ethical and functional products.
Additionally, it is worth noting our recent finding that over one hundred grant-funded alignment researchers generally disagree with the notion that alignment and capabilities work are mutually exclusive. Most relevant to our discussion here: approximately 70% of surveyed researchers disagreed (somewhat or strongly) with the statement that ‘alignment research that has some probability of also advancing capabilities should not be done.’
On the multi-party coordination point, the field is large enough that we are no longer in a world where mutual understanding and agreement between a handful of foundational labs is sufficient to prevent catastrophe—and so if we need coordination to avoid the worst-case outcomes, we think the focus should be on strong government oversight and regulation. Safety-focused companies could develop the tooling and capacities necessary to successfully create that strong oversight and preserve it against industry opposition to AI safety. We think that industry opposition is more likely to appear and succeed if investors in companies that might cause human extinction are disproportionately those not tracking extinction risk as a meaningful contributor to their future wealth.[10]
Another worry related to alignment startups is that customer- or product-focused organizations (for-profit or not!) will focus on developing alignment ideas that are useful with contemporary technology but don’t seem likely to scale with model capabilities. We think this is a serious worry, but one that again calls for nuance instead of a full halt. First, we think it makes sense to target raising the current background level of existing alignment plans instead of aiming only for plans that are, on their own, good enough to solve alignment.[11] Second, if we are worried that ideas will break down with scaling, it is likely possible to detect that breakdown[12] and use it as convincing evidence, rather than relying merely on theoretical arguments. Third, iterative development might manage to transmute ideas which do not scale into ideas which do.[13]
While we don’t think it’s likely, many hope that regulation will not be necessary, or that we will quickly meet the safety and security thresholds necessary for responsible progress. If that turns out to be true, having built a startup ecosystem will still likely have helped create that responsible progress.
Many people advocate for a pause on advancing the frontier of AI systems. In order for a pause to work as a pause, instead of a temporary reprieve or a prelude to stagnation, it needs to have an exit condition and allow for active progress towards that condition. We think that active progress will require a community able to pursue many different conflicting visions, discovering which paths are promising and which should be discarded. If we had a clear vision of how to build aligned AI systems, a centralized engineering project might work, but we’re not near that level of certainty yet, and so need to use our civilizational best practices for decision-making and resource allocation under uncertainty. We should be building today the projects necessary for safely exiting the pause as soon as is practical.
Broadly speaking, one of the cruxes here is whether it’s worth saying “yes and” to larval ideas in the hopes that they become good ideas through iterative development.[14] We think that as part of communicating the difficulty of alignment, many commentators have been too focused on explaining why ideas won’t work in a way that has reduced the amount of effort spent developing ideas. When ideas are unsalvageable, this strategy is good, because it allows effort to be redeployed to other, better ideas. But in our present situation, the absence of a clear agenda to funnel people towards suggests we’d be better off with a less judgemental yet still discerning approach that tries harder to invest in butterfly ideas. We should have extremely high standards while also being tolerant of wooly thinking, because it is one of the components of how ideas become fully baked.
That said, we want to invest in neglected approaches, not doomed ones, and hope to build an investment community that can avoid wasting resources on the same few bad ideas. One crux is whether or not a startup funding model will push founders to concentrate on the same fad ideas or spread out to cover more of the space; our optimism is driven in part by thinking this will lead to more people tackling neglected approaches, which might actually put us on track to solve alignment (something that alignment researchers currently do not seem to think will happen in the status quo).
We need to participate in and build the structures we want to see in the world
We think that as AI development and mainstream concern increase, there’s going to be a significant increase in safety-washing and incentives pushing the ecosystem from challenging necessary work towards pretending to solve problems. We think the way to win that conflict is by showing up, rather than lamenting other people’s incentives. This problem isn’t limited to business relationships; safety-washing is a known problem with nonprofits, government regulations, popular opinion, and so on. Every decision-maker is beholden to their stakeholders, and so decision quality is driven by stakeholder quality.
In order to raise money from investors to keep their startups alive, entrepreneurs will focus on what investors pay attention to. We do not want projects that are attempting to have major long-term impacts to be focused on quarterly profits rather than on technically grounded speculative bets. But this means we need investors whose time horizons and underlying projections line up with the time horizons of successfully completing ambitious projects that would resolve fundamental uncertainties. An investment community with that longer-term focus would lead to better decisions, both in this specific case and in aggregate, and so we’re trying to build it.
We would like to see more thinking along the lines of this post: both discussion of this particular idea and analysis of which structures will lead to the most success in tackling the acute risk period. Simultaneously, we also want to stress the need for solutions that don’t just ask people to join ongoing intellectual debates or be mindful about their impact, but instead point towards clear positive actions that can be taken. We would like entrepreneurs to be heads-down on projects that need that focus to survive and that attempt to directly solve alignment, trusting that others will take care of the meta-level questions and intellectual debates that don’t directly bear on their specific projects.
Practical next steps to solve alignment
Beyond simply arguing that this is a good idea, we want to put in the calories to catalyze more work that differentially advances AI safety.
We believe that people who want to do an AI safety-driven startup and think they’re up for it should shoot for the most ambitious startup they can that dramatically advances alignment (like whole brain emulation, brain-computer interfaces, etc.), start small and experiment, or both. We suggest that you just plunge right into doing a startup if you think you can.
It’s easy to get started by doing same-day-skunkworks-style hackathons—or any similar structure that enables fast execution and iteration. Competent product-focused people tend to be surprised at what they can hack out in just one focused day, especially with AI tools aiding development.
If you’re not ready yet, consider developing your skills by doing consulting work. Consulting lets you learn from other people’s mistakes, be directly accountable to users, improve your people skills in ways you didn’t even realize you needed to, grow professionally and emotionally in ways that will make you a better startup founder, and so on.
If you think you’re above a high bar technically, we invite you to apply to do (mostly-not-alignment-yet) consulting work with us. We hope this will be a good route to one day launch an alignment-driven startup from within AE’s skunkworks.
We envision scaling our consulting business to primarily do alignment work in the future, perhaps as demand for that work becomes dominant and it grows all the more necessary. We intend to prove this out and then scale it up, hopefully convincing other orgs to copy us, substantially helping close the talent gap in AI safety work today, and helping to create better orgs to make better use of the huge amount of money we expect to go into alignment in the future.
We believe it’s crucial to foster both supply and demand in the AI safety sector. Interestingly, we have significant exposure to AI-safety-related startup deal flow. If you’re an accredited investor who prioritizes safety and is interested in learning about these opportunities, we invite you to reach out here.
If you’ve already founded an AI safety-driven startup, here is a short list of some investors interested in alignment (this doc is currently publicly editable and anyone can add more to this list).
We also encourage you to apply for funding in our upcoming competition, which offers $50K in seed funding for already-existing safety-focused businesses and/or anyone with promising business ideas that first and foremost advance alignment, to be evaluated by AI safety experts and concerned business leaders.
The future is likely to get really weird, really fast. New tech and capabilities will accelerate what is possible. We can imagine leveraging new generations of AI to create unicorn companies in a few months, with as little as one employee. Considering that startups are the vehicle best suited to taking advantage of new tech and disrupting industries, the more alignment-focused each marginal startup is, the better for alignment.
We think that alignment may be solvable, and that humanity can win. Progress will become bottlenecked by alignment science, and AI safety-driven startups may be key to relieving that bottleneck, so let’s make sure they actually exist.
Throughout, it is important to keep in mind that the dual-use nature of developments calls for differential tech development, the potentially contagious nature of failures means local experimentation is riskier than it is in normal engineering contexts, and the nature of cognition means that challenges that look continuous might actually have sharp left turns that cause previously functional alignment techniques to no longer work.
Part of this is hope management / burnout avoidance. If burnout is the growing realization that your efforts aren’t visibly helping, and are being made against the growing resistance caused by that realization, then the way to avoid burnout is to switch to areas where you are more visibly helping; under this strategy, that means reducing the scope of your initial focus and ambitions. We think attempts to iteratively develop partial solutions might successfully aggregate into a whole solution, and people should explicitly switch their focus to this instead of giving up or burning out. Additionally, if the secret to making your startup successful is knowing that you will be terribly demoralized at many points but that, no matter what, you will simply not give up, then that commitment provides an additional force carrying you through the vicissitudes of alignment mental health issues. The EA community is quick to notice map/territory confusions in this class of startup advice, and we think it’s easy to take the wrong lessons here. We think you should be deliberately optimistic about your persistence, and treat your ability to get up tomorrow and have different ideas as a reason to think that your company and career will succeed, without being attached to the particular thing that you are trying today.
Separately, having a vibrant startup ecosystem will attract mercenaries. When you can legibly check on whether they’re delivering on their specific promises, mercenaries help, and part of the overall transition we’re suggesting is moving from a generic “are they on the same side?” movement mentality to a “are they doing a better job at their role than their competitors would?” market mentality.
Most of VCs’ returns come from a small percentage of their investments. The unicorns that succeed maximally tend not to be companies that do evil things, but rather companies that make things users want and grow organically because they genuinely provide value to users. Seeking to build healthy companies like that requires the same long-term thinking needed to build responsible AI innovations that mitigate AI risk.
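As a rough illustration of how heavy-tailed this is, here is a minimal sketch with made-up numbers (the Pareto shape parameter and portfolio size are assumptions for illustration, not data from any real fund):

```python
import random

# Toy model of a VC portfolio with heavy-tailed (power-law) outcomes.
# All numbers here are hypothetical, chosen only to illustrate the shape.
random.seed(0)
returns = [random.paretovariate(1.2) for _ in range(100)]  # 100 investments
returns.sort(reverse=True)

top_5_share = sum(returns[:5]) / sum(returns)
print(f"Top 5 of 100 investments: {top_5_share:.0%} of total returns")
```

In runs like this, a handful of winners tends to account for a large share of the fund’s total returns, which is the dynamic this footnote points to.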
Doing this adds a layer of indirection between you and your true goal; rather than being grounded in reality, you’re grounded in your society’s perception of reality. This doesn’t make progress impossible, just more difficult, and often this tradeoff is worthwhile.
To be clear, we are hoping that organizations working directly on alignment will be well-resourced in this scenario, and think there can be situations where it’s worthwhile for philanthropists or governments to subsidize public goods.
At present, it is already difficult to determine which category investments fall into (particularly from the government), and the line between them may become more blurred as time goes on. We predict this will be true even with a somewhat strict definition of ‘alignment R&D’, and considering both capabilities and alignment categories separately.
Windfall trusts would incentivize people working on AI safety startups to pursue individually-likely-to-fail-but-high-impact-if-they-work companies by giving each founder a tiny amount of equity in the other participating companies, so that if any of the startups succeeds, every participating founder may reap more money than they could ever possibly want from just that tiny bit of equity, in a post-human economy that rapidly grows much larger than ours.
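To make the arithmetic concrete, here is a minimal sketch with hypothetical numbers (the number of startups, success probability, windfall size, and equity slice are all assumptions chosen for illustration, not proposed terms):

```python
# Hypothetical windfall-trust arithmetic; every number below is an assumption.
N = 50              # participating startups
p_success = 0.02    # chance any one startup produces a windfall
windfall = 1e12     # dollar value of a windfall outcome
pool_share = 0.001  # equity slice each founder holds in each other company

# Going it alone: you are only paid if *your* startup is the one that succeeds.
p_payout_solo = p_success
ev_solo = p_success * windfall

# With a windfall trust: you get some payout if *any* of the N participating
# startups succeeds (yours or another's).
p_payout_pooled = 1 - (1 - p_success) ** N
ev_pooled_slices = (N - 1) * p_success * pool_share * windfall

print(f"P(some payout), solo:   {p_payout_solo:.0%}")
print(f"P(some payout), pooled: {p_payout_pooled:.0%}")
print(f"EV of your own equity:  ${ev_solo:,.0f}")
print(f"EV of pooled slices:    ${ev_pooled_slices:,.0f}")
```

Under these made-up numbers, even a 0.1% slice of a single windfall is far more money than anyone could spend, and pooling raises the chance that at least one payout reaches a given founder from roughly 2% to well over half.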
Many people are interested in supporting AI safety but have technical skills that are useful at a broad range of companies rather than the research skills suited to independent research in this field. The more safety-focused companies there are, the more those people will be able to work at them instead of at capabilities companies. This also goes for mercenaries who simply find specific safety-focused companies a better fit, or want to appeal to safety-focused investors, or who view safety as the core limiter on profitable businesses.
One of us (Vaniver) and his husband hold OpenAI units as well as investments in other AI-related startups. He would be happy to set those investments on fire if it meant still being alive in a few decades, and having more investors with priorities like that will hopefully allow companies to make decisions that are better for humanity, while still trying to obtain the potential upsides of responsible development.
While it would be great to come across plans that solve alignment directly, we think we’ve spent a decade looking for them without success, and it consequently makes sense to focus on more iterative plans or partial progress. “Buy dignity”, as Eliezer might say.
The possibility of fast takeoff makes this detection a trickier business than it would be otherwise, but we don’t think purely hypothetical discussion of the risk has succeeded at convincing as many people as it needs to.
Many problems with alignment ideas are patch resistant. It is still the normal course of intellectual development that ideas take time to cook, and that broken ideas are discovered before their functional neighbors. We think the right solution here is to notice the difficulty and put in the work.
We think this approach is important to take on the meta level as well. Many people have ideas that seem incorrect to us about how to proceed with the acute risk period, from full-speed-ahead accelerationism to blanket luddism. We think progress looks like bridging between groups, identifying shards of truth wherever they can be found, and iterating towards wise and effective strategies and positions.