I think Conjecture and Anthropic are examples of (mostly?) for-profit companies that are some % concerned with safety and x-risk. These organizations are probably net-positive for AI x-risk compared to the counterfactual where all their researchers are working on AI capabilities at some less safety-focused org instead (e.g. Meta). I’m less sure they’re net-positive if the counterfactual is that all of their researchers are working at hedge funds or doing physics PhDs or whatever. But I haven’t thought about this question on the object-level very closely; I’m more just pointing out that differential impact on capabilities vs. alignment is an important term to consider in a cost-benefit calculation.
Depending on your own model of remaining timelines and takeoff speeds, “Half on capabilities, half on alignment” might end up being neutral to net-negative. It also depends on the quality and kind of your alignment output vs. capabilities output. OTOH, I think there’s also an argument for a high-variance strategy at this stage of the game, so if you have some ideas for alignment that are unique or high-impact, even if they have a low probability of success, that might make your impact very net-positive in expectation. In general though, I think it’s very easy to deceive yourself on this kind of question.
I do think you make valid and reasonable points, and I appreciate and commend you for that.
“Half on capabilities, half on alignment” might end up being neutral to net-negative.
Let’s use 80,000 Hours’ conservative estimate that only around $5B is spent on AI capabilities each year, versus about $50M on AI alignment. That allocation seems worse than $6B spent on capabilities and $1.05B spent on alignment, which is what you’d get if a new cohort added roughly $1B of effort to each side.
A half-and-half approach in this case would multiply alignment research by roughly 20x, but only increase capabilities by about 20%.
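To make the arithmetic explicit, here is a minimal sketch using the figures above (the ~$5B and ~$50M annual estimates attributed to 80,000 Hours). The $2B of new effort split evenly is my assumption, inferred from the $6B/$1.05B endpoint rather than stated directly.

```python
# Minimal sketch of the "half on capabilities, half on alignment" arithmetic,
# using the commenter's figures. The $2B new-cohort total is an assumption
# inferred from the $6B / $1.05B endpoint.

capabilities_now = 5_000_000_000   # ~$5B/year on capabilities (assumed figure)
alignment_now = 50_000_000         # ~$50M/year on alignment (assumed figure)
new_effort = 2_000_000_000         # hypothetical new cohort's total effort
split = 0.5                        # half to capabilities, half to alignment

capabilities_after = capabilities_now + split * new_effort   # $6B
alignment_after = alignment_now + split * new_effort         # $1.05B

print(f"Capabilities grow by {capabilities_after / capabilities_now - 1:.0%}")   # ~20%
print(f"Alignment grows by a factor of {alignment_after / alignment_now:.0f}x")  # ~21x
```

Under these assumptions, the relative boost to alignment dwarfs the relative boost to capabilities, which is the core of the argument above.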
“high-variance strategy at this stage of the game”
This I agree with. I have some ideas, but I will consult with experts in the field before pursuing any of them.
Yeah, my own views are that a lot of “alignment” work is mostly capabilities work in disguise. I don’t claim my view is the norm or consensus, but I don’t think it’s totally unique or extreme either (e.g. see “Should we publish mechanistic interpretability research?” and “If interpretability research goes well, it may get dangerous”).