I expect that for most people, starting a new for-profit (or non-profit) AI alignment organization is likely to be net-negative for AI x-risk
While there are some examples of this, such as OpenAI, I still find this claim rather bold. If no one were starting AI alignment orgs, we would still have roughly the same capabilities today, but only a fraction of the alignment research. Right now, over a hundred times more money is spent on advancing AI than on reducing risks, so even a company spending half its resources on advancing capabilities and half on AI alignment seems net-positive to me.
Your concern is justified, however, so I will not proceed with any business plan without consulting several experts in the field first.
I would make sure to consult with experts regarding donation strategy and which orgs to donate to, but the donations would probably mostly go to some of the orgs you mentioned, and perhaps partly to the Long Term Future Fund.
I think Conjecture and Anthropic are examples of (mostly?) for-profit companies that are some % concerned with safety and x-risk. These organizations are probably net-positive for AI x-risk compared to the counterfactual where all their researchers are working on AI capabilities at some less safety-focused org instead (e.g. Meta). I’m less sure they’re net-positive if the counterfactual is that all of their researchers are working at hedge funds or doing physics PhDs or whatever. But I haven’t thought about this question on the object-level very closely; I’m more just pointing out that differential impact on capabilities vs. alignment is an important term to consider in a cost-benefit calculation.
Depending on your own model of remaining timelines and takeoff speeds, “Half on capabilities, half on alignment” might end up being neutral to net-negative. It also depends on the quality and kind of your alignment output vs. capabilities output. OTOH, I think there’s also an argument for a high-variance strategy at this stage of the game, so if you have some ideas for alignment that are unique or high-impact, even if they have a low probability of success, that might make your impact very net-positive in expectation. In general though, I think it’s very easy to deceive yourself on this kind of question.
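As a toy illustration of the “net-positive in expectation” point, here is a minimal sketch; every number in it (the success probability, the impact units, the capabilities-externality term) is a made-up assumption for illustration, not a figure from this thread:

```python
# Toy expected-impact sketch of a "high-variance" alignment bet.
# All numbers are made-up assumptions; the point is only that a
# low-probability, high-impact alignment agenda can outweigh a steady
# capabilities externality in expectation -- or fail to, depending on
# the inputs you believe.

p_success = 0.05            # assumed chance the unusual alignment idea pays off
impact_if_success = 100.0   # assumed alignment value, in arbitrary impact units
capabilities_penalty = 2.0  # assumed expected harm from the capabilities half

expected_impact = p_success * impact_if_success - capabilities_penalty
print(expected_impact)      # 3.0 > 0: net-positive under these made-up inputs

# With p_success = 0.01 the same bet comes out net-negative (-1.0), which is
# why the answer hinges on your model of timelines and on how likely your
# alignment output really is to be unique or high-impact.
```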
I do think you make valid and reasonable points, and I appreciate and commend you for that.
“Half on capabilities, half on alignment” might end up being neutral to net-negative.
Let’s use 80,000 Hours’ conservative estimate that only around 5B USD is spent on capabilities each year, and around 50M USD on AI alignment. That split seems worse than, say, 6B USD on capabilities and 1.05B USD on alignment, which is what you would get if a new org added roughly 1B USD to each side.
A half-and-half approach in this case would roughly 20X the alignment research while only increasing capabilities spending by about 20%.
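To make that arithmetic explicit, here is a minimal back-of-envelope sketch; the 1B-USD-per-side budget for the hypothetical new org is an assumption chosen only to reproduce the 6B / 1.05B figures above, not a number anyone has committed to:

```python
# Back-of-envelope version of the comparison above, using the quoted
# 80,000 Hours-style figures (~5B USD/year on capabilities, ~50M USD/year
# on alignment) and an assumed new org spending 1B USD on each side.

current_capabilities = 5_000_000_000  # USD per year
current_alignment = 50_000_000        # USD per year

new_org_per_side = 1_000_000_000      # assumed: half of a 2B USD budget each way

cap_after = current_capabilities + new_org_per_side
align_after = current_alignment + new_org_per_side

print(f"Capabilities: {current_capabilities/1e9:.2f}B -> {cap_after/1e9:.2f}B "
      f"(+{100 * (cap_after / current_capabilities - 1):.0f}%)")
print(f"Alignment:    {current_alignment/1e9:.2f}B -> {align_after/1e9:.2f}B "
      f"({align_after / current_alignment:.0f}x)")

# Capabilities: 5.00B -> 6.00B (+20%)
# Alignment:    0.05B -> 1.05B (21x)
```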
high-variance strategy at this stage of the game
This I agree with. I have some ideas, but I will consult with experts in the field before pursuing any of them.
Yeah, my own views are that a lot of “alignment” work is mostly capabilities work in disguise. I don’t claim my view is the norm or consensus, but I don’t think it’s totally unique or extreme either (e.g. see “Should we publish mechanistic interpretability research?” and “If interpretability research goes well, it may get dangerous”).