A judgement I’m attached to is that a person is either extremely confused or callous if they work in capabilities at a big lab. Is there some nuance I’m missing here?
I would consider, for the sake of humility, that they might disagree with your assessment for actual reasons, rather than assuming confusion is necessary. (I don’t have access to their actual reasoning, apologies.)
Edit: To give you a toy model of reasoning to chew on -
Say a researcher has a p(doom from AGI) of 20% from random-origin AGI;
30% from military-origin AGI;
10% from commercial-lab-origin AGI
(and perhaps other numbers elsewhere that are similarly suggestive).
They estimate the chances we develop AGI (relatively) soon as roughly 80%, regardless of their intervention.
They also happen to have a p(doom from not AGI) of 40% from combined other causes, and expect an aligned AGI to be able to effectively reduce this to something closer to 1% through better coordinating reasonable efforts.
What’s their highest leverage action with that world model?
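As a very rough sketch of the arithmetic that world model implies: the conditional p(doom) figures and the 80% / 40% / 1% numbers below are the toy model's, while the origin mixes under each option, and the assumption that non-AGI doom is only largely averted in worlds where aligned AGI actually arrives, are invented purely for illustration.

```python
# Toy expected-value sketch for the world model above. Conditional p(doom)
# numbers and the 80% / 40% / 1% figures come from the toy model; the origin
# mixes under each option are made-up assumptions, purely for illustration.

P_AGI_SOON = 0.80                       # AGI developed relatively soon, regardless of intervention
P_DOOM_GIVEN_ORIGIN = {"random": 0.20,  # p(doom | AGI of this origin)
                       "military": 0.30,
                       "lab": 0.10}
P_DOOM_NON_AGI = 0.40                   # doom from all non-AGI causes, absent aligned AGI
P_DOOM_NON_AGI_ALIGNED = 0.01           # ...if an aligned AGI helps coordinate

def total_p_doom(origin_mix):
    """Overall p(doom), given how AGI probability mass splits across origins."""
    p_agi_doom = P_AGI_SOON * sum(share * P_DOOM_GIVEN_ORIGIN[o]
                                  for o, share in origin_mix.items())
    p_aligned_agi = P_AGI_SOON - p_agi_doom   # AGI arrives and goes well
    p_no_agi = 1.0 - P_AGI_SOON
    # Assumption: non-AGI doom is mostly averted only in the aligned-AGI worlds.
    p_other_doom = (p_aligned_agi * P_DOOM_NON_AGI_ALIGNED
                    + p_no_agi * P_DOOM_NON_AGI)
    return p_agi_doom + p_other_doom

# Hypothetical origin mixes (not from the comment): joining a commercial lab
# shifts some probability mass away from military/random origins.
baseline   = {"random": 0.3, "military": 0.4, "lab": 0.3}
join_a_lab = {"random": 0.2, "military": 0.3, "lab": 0.5}

print(total_p_doom(baseline), total_p_doom(join_a_lab))   # ~0.254 vs ~0.231
```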
Hi Ann! Thank you for your comment. Some quick thoughts:
“I would consider, for the sake of humility, that they might disagree with your assessment for actual reasons, rather than assuming confusion is necessary.”
Yep! I have considered this. The purpose of my post is to consider it (I am looking for feedback, not upvotes or downvotes).
“They also happen to have a p(doom from not AGI) of 40% from combined other causes, and expect an aligned AGI to be able to effectively reduce this to something closer to 1% through better coordinating reasonable efforts.”
This falls into the confused category for me. I’m not sure how you get to a 40% p(doom) from anything other than unaligned AGI. Could you spell out for me what could add up to such a large number?
Here are a few possibilities:
They predict that catastrophic tipping points from climate change, and perhaps other human-caused environmental changes, will cause knock-on effects that eventually add up to our extinction, and that the policy changes needed to avert this are ones we currently look unlikely to pull off, despite already observing clear initial consequences in the form of fire, storm, and ocean heating.
They model a full nuclear exchange in the context of a worldwide war as highly possible and only narrowly evaded so far, and consider its consequences to either cause extinction or be roughly as bad.
They are reasonably confident that pandemics arising or engineered without the help of AI could, in fact, take out our species under favorable circumstances, and worry that the public-health battlefield is gradually tilting in favor of diseases.
Probably smaller contributors going forward: they are familiar with religious groups actively inclined to bring about the apocalypse, and have some actual concern about their chances of success. (Probably U.S.-focused.)
They are looking at longer time frames, and are thinking of various catastrophes likely within the decades or centuries immediately after we would otherwise have developed AGI, some of them possibly caused by the policies necessary to not do so.
They think humans may voluntarily decide it is not worth existing as a species unless we make it worth their while properly, and should not be stopped from making this choice. Existence, and the world as it is for humans, is hell in some pretty important and meaningful ways.
They are not long-termists in any sense but stewardship, and are counting the possibility that everyone who currently exists and matters to them, under a short-term framework, ages and dies.
They consider most humans to currently be in a state of suffering worse than non-existence, so the s-risk component of doom is already at 100%, and the 60% not-doom is mostly optimism that we can make that state better.
And overall, generally, a belief that not-doom is fragile; that species do not always endure; that there is no guarantee, and our genus happens to be in the dice-rolling part of its lifespan even if we weren’t doing various unusual things that might increase our risk as much as decrease it. (Probably worth noting that several species of humans, our equals based on archaeological finds and our partners based on genomic evidence, have gone extinct.)
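For what it's worth on the arithmetic: none of these causes needs to be anywhere near 40% on its own; a handful of moderate, roughly independent risks compound. The individual figures below are invented placeholders, not numbers from this comment.

```python
# Illustration only: several moderate, roughly independent non-AGI risks
# compounding to roughly 40% cumulative p(doom). Every figure below is an
# invented placeholder, not a number from this comment.
risks = {
    "climate / environmental tipping points": 0.15,
    "full nuclear exchange":                  0.15,
    "natural or engineered pandemic":         0.10,
    "everything else combined":               0.10,
}

p_survive_all = 1.0
for cause, p in risks.items():
    p_survive_all *= (1.0 - p)

print(f"combined p(doom) ~ {1.0 - p_survive_all:.2f}")  # ~0.41 with these placeholders
```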
Maybe their goal is sabotage. Maybe they enjoy deception or consider themselves to be comparatively advantaged at it.
Hi Richard! Thanks for the comment. It seems to me that might apply to < 5% of people in capabilities?
My probability is more like 0.1%.
Someone I know who works at Anthropic, not on alignment, has thought pretty hard about this and concluded it was better than the alternatives. Some factors include:
by working on capabilities, you free up others for alignment work who were previously doing capabilities but would prefer alignment
more competition on product decreases aggregate profits of scaling labs
At one point some kind of post was planned but I’m not sure if this is still happening.
I also think there are significant upskilling benefits to working on capabilities, though I believe this less than I did the other day.
Thanks for your comment, Thomas! I appreciate the effort. I have some questions:
by working on capabilities, you free up others for alignment work who were previously doing capabilities but would prefer alignment
I am a little confused by this; would you mind spelling it out for me? Imagine “Steve” took a job at “FakeLab” in capabilities. Are you saying that Steve making this decision creates a safety job for “Jane” at FakeLab that otherwise wouldn’t have existed?
more competition on product decreases aggregate profits of scaling labs
Again I am a bit confused. You’re suggesting that if, for example, General Motors announced tomorrow that they were investing $20 billion to start an AGI lab, that would be a good thing?
Jane at FakeLab has a background in interpretability but is currently wrangling data, writing internal tooling, or doing some product thing because the company needs her to; otherwise FakeLab would have no product and would be unable to keep operating, including its safety research. Steve has a comparative advantage at Jane’s current job.
It seems net bad because the good effect of slowing down OpenAI is smaller than the bad effect of GM racing? But OpenAI probably is slowed down: they were already trying to build AGI, and now they have less money and possibly less talent. Thinking about the net effect is complicated and I don’t have time to do it here. The situation with joining a lab rather than founding one may also be different.