I’m pretty sure that most EAs I know have ~100% confidence that what they’re doing is net positive for the long-term future.
Really? Without giving away names, can you tell me roughly what cluster they are in? Geographical area, age range, roughly what vocation (technical AI safety/AI policy/biosecurity/community building/earning-to-give)?
I’m super interested in how you might have arrived at this belief: would you be able to elaborate a little? For instance, is there a theoretical argument going on here, like a weak form of cluelessness? Or is it more empirical,
Definitely closer to the former than the latter! Here are some steps in my thought process:
The standard longtermist cluelessness arguments (“you can’t be sure if eg improving labor laws in India is good because it has uncertain effects on the population and happiness of people in Alpha Centauri in the year 4000”) don’t apply in full force if you buy a high near-term (10-100 years) probability of AI doom, and that AI doom is astronomically bad and avoidable.
or (less commonly on LW but more common in some other EA circles) other hinge-of-history scenarios like totalitarian lock-in, s-risks, biological tech doom, etc
If you assign low credence to every hinge-of-history hypothesis, I think you are still screwed by the standard cluelessness arguments, unfortunately.
But even with a belief in an x-risk hinge of history, cluelessness still applies significantly. Knowing whether an action reduces x-risk is much easier in relative terms than knowing whether an action will improve the far future in the absence of x-risk, but it’s still hard in absolute terms.
If we drill down on a specific action and a specific theory of change (“I want to convince a specific Senator to sign a specific bill to regulate the size of LLM models trained in 2024”, “I want to do this type of technical research to understand this particular bug in this class of transformer models, because better understanding of this bug can differentially advance alignment over capabilities at Anthropic if Anthropic will scale up this type of model”), any particular action’s impact is just built on a tower of conjunctions and it’s really hard to get any grounding to seriously argue that it’s probably positive.
So how do you get any robustness? You imagine the set of all your actions as slightly positive bets/positively biased coin flips (eg a grantmaker might investigate 100+ grants in a year, something like deconfusion research might yield a number of different positive results, field-building for safety might cause a number of different positive outcomes, you can earn-to-give for multiple longtermist orgs, etc). If heads are “+1” and tails are “-1”, and you have a lot of flips, then the central limit theorem gets you a nice normal distribution with a positive mean and thin tails.
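This coin-flip picture is easy to sandbox. Here’s a toy sketch; the 0.55 bias, flip counts, and number of simulated “careers” are arbitrary numbers I picked for illustration, not anything from the discussion:

```python
import random

random.seed(0)

def career_total(n_flips, p_heads=0.55):
    """Sum of n_flips slightly-biased coin flips: heads = +1, tails = -1."""
    return sum(1 if random.random() < p_heads else -1 for _ in range(n_flips))

# Simulate many "careers" of 1,000 independent flips each. The CLT says totals
# concentrate around the positive mean n*(2p - 1) = 100 with a std of ~31,
# so almost every simulated career ends up net positive.
totals = [career_total(1000) for _ in range(2000)]
frac_positive = sum(t > 0 for t in totals) / len(totals)
print(f"fraction of net-positive careers: {frac_positive:.3f}")
```

With these (made-up) numbers, the fraction of net-positive careers comes out very close to 1, which is the “thin tails” intuition.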
Unfortunately the real world is a lot less nice than this because:
the impacts of your different actions are heavy-tailed, likely in both directions.
A concrete example is that maybe a really unexpectedly bad grant can wipe out all of the positive impact your good grants have generated, and then some.
the impacts and theories of change of all your actions likely share a worldview and are internally correlated
eg, “longtermist EA fieldbuilding” has multiple theories of impact, but you can be wrong about a few important things and (almost) all of them might end up differentially advancing capabilities over alignment, in very correlated ways.
You might not have all that many flips that matter
The real world is finite, your life is finite, etc, so even if in the limit your approach is net positive, there’s no guarantee that in practice your actions are net positive before either you die or the singularity happens.
That doesn’t mean it’s wrong to dedicate your life to a single really important bet! (as long as you are obeying reasonable deontological and virtue ethics constraints, you’re trying your best to be reasonable, etc).
For people in those shoes, a possibly helpful mental motion is to think less in terms of individual impact and more communally. Maybe it’s like voting: individual votes are ~useless, but collectively people-who-think-like-you can hopefully vote for a good leader. If enough people-like-you follow an algorithm of “do unlikely-to-work research projects that are slightly positive in expectation”, collectively we can do something important.
probably a few other things I’m missing.
So the central modeling issues become a) how many flips you get, b) how likely it is that all the flips are dominated by a single coin, and c) how much internal correlation there is between the coin flips.
And my gut says: you get a fair number of flips, it’s reasonably likely but not certain that one (or a few) flips dominate, and the internal correlation is high but not 1 (and not very close to 1).
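Those three modeling issues can be poked at with a toy simulation too. The specific numbers below (disaster size and frequency, probability the shared worldview is wrong, per-flip bias) are assumptions I made up purely for illustration:

```python
import random

random.seed(1)

def net_positive_prob(draw_outcomes, trials=5000):
    """Fraction of simulated 'careers' whose summed impact ends up positive."""
    return sum(sum(draw_outcomes()) > 0 for _ in range(trials)) / trials

N = 100  # number of bets ("flips") in a career

# Regime (a): many independent, slightly-positive +/-1 bets.
def independent():
    return [1 if random.random() < 0.55 else -1 for _ in range(N)]

# Regime (b): same bets, but each has a 1% chance of being a -50 disaster
# (heavy left tail): one bad coin can dominate everything else.
def heavy_tailed():
    return [-50 if random.random() < 0.01
            else (1 if random.random() < 0.55 else -1)
            for _ in range(N)]

# Regime (c): bets share a worldview; with 20% probability the worldview is
# wrong and every bet's sign flips (high internal correlation).
def correlated():
    sign = -1 if random.random() < 0.2 else 1
    return [sign * (1 if random.random() < 0.55 else -1) for _ in range(N)]

p_a = net_positive_prob(independent)
p_b = net_positive_prob(heavy_tailed)
p_c = net_positive_prob(correlated)
print(f"independent: {p_a:.2f}, heavy-tailed: {p_b:.2f}, correlated: {p_c:.2f}")
```

Under these made-up parameters, both the heavy-tail and correlation regimes noticeably cut the chance of ending up net positive relative to the independent case, which is the qualitative worry above in miniature.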
There are a few more thoughts I have, but that’s the general gist. Unfortunately it’s not very mathematical/quantitative or much of a model; my guess is that both more conceptual thinking and more precise models can yield some more clarity, but ultimately we (or at least I) will still end up fairly confused even after that.
I’m also interested in thoughts from other people here; I’m sure I’m not the only person who is worried about this type of thing.
(Also please don’t buy my exact probabilities. They are very much not resilient. I’m pretty sure that if I thought about it for 10 years (without new empirical information) the probability couldn’t be much higher than 90%, and I’m pretty sure the probabilities are high enough to be non-Pascalian, so not as low as, say, 50% + 1-in-a-quadrillion, but anywhere in between seems kinda defensible.)