Epistemic status: random philosophical musings.
Assuming a positive singularity, how should humanity divide its resources? I think the obvious (and essentially correct) answer is “in that situation, you have an aligned superintelligence, so just ask it what to do.” But I nevertheless want to philosophize a bit about this, for one main reason.
That reason is: an important factor, imo, in determining the right way to distribute resources post-singularity is what incentives that choice of allocation creates for people pre-singularity. For those incentives to work, though, we have to actually be thinking about this now, since that’s what allows the choice of resource distribution post-singularity to have its acausal influence on our choices pre-singularity. I will note that this is something I think about sometimes, and something I suspect a lot of other people also implicitly think about when they consider things like amassing wealth (specifically, gaining control over current AIs) and/or the future value of their impact certificates.
So, what are some of the possible options for how to distribute resources post-singularity? Let’s go over some of the possible solutions and why I don’t think any of the obvious ones are what you want (a toy code sketch of these rules as allocation functions follows the list):
(1) The evolutionary/capitalist solution: divide future resources in proportion to control of current resources (e.g. AIs). This is essentially what happens by default if you keep in place an essentially capitalist system and have all the profits generated by your AIs flow to the owners of those AIs. Another variant is more power/bargaining-oriented: divide resources amongst agents in proportion to the power those agents could bring to bear if they chose to fight for those resources.
The most basic problem with this solution is that it’s a moral catastrophe if the people that get all the resources don’t do good things with them. We should not want to build AIs that lead to this outcome—and I wouldn’t really call AIs that created this outcome aligned.
Another more subtle problem with this solution is that it creates terrible incentives for current people if they expect this to be what happens, since it e.g. incentivizes people to maximize their personal control over AIs rather than spending those resources on trying to align those AIs.
I feel like I see this sort of thinking a lot, and I think that if we made it clearer that this is never what should happen in a positive singularity, people would do this sort of thing less.
(2) The egalitarian/democratic solution: divide resources equally amongst all current humans. This is what naive preference utilitarianism would do.
Though it might be less obvious than with the first solution, I think this solution also leads to a moral catastrophe, since it cements current people as oligarchs over future people, leads to value lock-in, and could create a sort of tyranny of the present.
This solution also creates some weird incentives for trying to spread your ideals as widely as possible and to create as many people as possible that share your ideals.
(3) The unilateralist/sovereign/past-agnostic/CEV solution: concentrate all resources under the control of your aligned AI(s), then distribute those resources in whatever way generates the most utility/value/happiness/goodness/etc., without any special prejudice given to existing people.
In some sense, this is the “right” thing to do, and it’s pretty close to what I would ideally want. However, it has a couple of issues:
Though, unlike the first solution, it doesn’t create any perverse incentives right now, it doesn’t create any positive incentives either.
Since this solution doesn’t give any special prejudice to current people, it might be difficult to get current people to agree to it, if their agreement is necessary.
(4) The retroactive impact certificate solution: divide future resources in proportion to retroactively-assessed social value created by past agents.
This solution obviously creates the best incentives for current agents, so in that sense it does very well.
However, it still scores poorly on avoiding a moral catastrophe, since the people who created the most social value in the past need not continue doing so in the future.
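To make the contrast concrete, here is a minimal sketch (my framing, not anything from the post) of solutions (1), (2), and (4) as allocation functions; (3) has no per-person formula, since the sovereign AI just optimizes directly. All names and numbers below are illustrative assumptions.

```python
def capitalist_split(current_resources: dict[str, float]) -> dict[str, float]:
    """(1): future shares proportional to current resources / control over AIs."""
    total = sum(current_resources.values())
    return {person: r / total for person, r in current_resources.items()}


def egalitarian_split(people: list[str]) -> dict[str, float]:
    """(2): equal shares for every human alive at the singularity."""
    return {person: 1.0 / len(people) for person in people}


def retroactive_impact_split(assessed_value: dict[str, float]) -> dict[str, float]:
    """(4): shares proportional to retroactively-assessed social value, floored at zero."""
    clipped = {person: max(v, 0.0) for person, v in assessed_value.items()}
    total = sum(clipped.values())
    return {person: v / total for person, v in clipped.items()}
```

Note that flooring assessed value at zero in (4) already builds in the “no one can do worse than zero resources” property that a later comment pushes back on.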
As above, I don’t think that you should want your aligned AI to implement any of these particular solutions. I think some combination of (3) and (4) is probably the best out of these options, though of course I’m sure that if you actually asked an aligned superintelligent AI it would do better than any of these. More broadly, though, I think that it’s important to note that (1), (2), and (4) are all failure stories, not success stories, and you shouldn’t expect them to happen in any scenario where we get alignment right.
Circling back to the original reason that I wanted to discuss all of this, which is how it should influence our decisions now:
Obviously, the part of your values that isn’t selfish should continue to want things to go well.
However, for the part of your values that cares about your own future resources (if that’s something you care about at all), how you go about maximizing them is going to depend on which of (1), (2), and (4) you most expect.
First, in determining this, you should condition on situations where you don’t just die or get otherwise totally disempowered, since obviously those are the only cases where this matters. And if the probability of death/disempowerment is quite high, then presumably a lot of your selfish values should just want to minimize that probability.
However, going ahead anyway and conditioning on everyone not being dead/disempowered, what should you expect? I think that (1) and (2) are possible in worlds where we get some parts of alignment right, but overall are pretty unlikely: it’s a very narrow band of not-quite-alignment that gets you there. So probably, if I cared about this a lot, I’d focus more on (4) than on (1) and (2).
Which of course gets me to why I’m writing this up, since that seems like a good message for people to pick up. Though I expect it to be quite difficult to effectively communicate this very broadly.
Disagree. I’m in favor of (2) because I think that what you call a “tyranny of the present” makes perfect sense. Why would the people of the present not maximize their utility functions, given that it’s the rational thing for them to do by definition of “utility function”? “Because utilitarianism” is a nonsensical answer IMO. I’m not a utilitarian. If you’re a utilitarian, you should pay for your utilitarianism out of your own resource share. For you to demand that I pay for your utilitarianism is essentially a defection in the decision-theoretic sense, and would incentivize people like me to defect back.
As to problem (2.b) (the incentive to spread your ideals and create as many people as possible who share them), I don’t think it’s a serious issue in practice, because the time until the singularity is too short for it to matter much. If it were, we could still agree on a cooperative strategy that avoids a wasteful race between present people.
Even if you don’t personally value other people, if you’re willing to step behind the veil of ignorance with respect to whether you’ll be an early person or a late person, it’s clearly advantageous, before you know which one you’ll be, not to allocate all the resources to the early people.
First, I said I’m not a utilitarian, I didn’t say that I don’t value other people. There’s a big difference!
Second, I’m not willing to step behind that veil of ignorance. Why should I? Decision-theoretically, it can make sense to argue “you should help agent X because in some counterfactual, agent X would be deciding whether to help you using similar reasoning”. But, there might be important systematic differences between early people and late people (for example, because late people are modified in some ways compared to the human baseline) which break the symmetry. It might be a priori improbable for me to be born as a late person (and still be me in the relevant sense) or for a late person to be born in our generation[1].
Moreover, if there is a valid decision-theoretic argument to assign more weight to future people, then surely a superintelligent AI acting on my behalf would understand this argument and act on it. So, this doesn’t compel me to precommit to a symmetric agreement with future people in advance.
There is a stronger case for intentionally creating and giving resources to people who are early in counterfactual worlds. At least, assuming people have meaningful preferences about the state of never-being-born.
If a future decision is to shape the present, we need to predict it.
The decision-theoretic strategy “Figure out where you are, then act accordingly” is merely an approximation to “Use the policy that leads to the multiverse you prefer”. You *can* bring your present loyalties with you behind the veil; it might just start to feel farcically Goodhartish at some point.
There are of course no probabilities of being born into one position or another; there are only various avatars through which your decisions affect the multiverse. The closest thing to probabilities you’ll find is how much leverage each avatar offers: the least wrong probabilistic anthropics translates “the effect of your decisions through avatar A is twice as important as through avatar B” into “you are twice as likely to be A as B”.
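One way to cash out that translation (my gloss, not the commenter’s formalism): treat “the probability that you are avatar $A_i$” as that avatar’s normalized leverage,

$$P(\text{you are } A_i) = \frac{\ell(A_i)}{\sum_j \ell(A_j)},$$

where $\ell(A_i)$ measures how much the parts of the multiverse you care about depend on the decisions made through $A_i$. “Twice the leverage” then literally becomes “twice as likely”.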
So if we need probabilities of being born early vs. late, we can compare their leverage. We find:
Quantum physics shows that the timeline splits a bazillion times a second. So each second, you become a bazillion yous, but the portions of the multiverse you could first-order impact are divided among them. Therefore, you aren’t significantly more or less likely to find yourself a second earlier or later.
Astronomy shows that there’s a mazillion stars up there. So we build a Dyson sphere and huge artificial womb clusters, and one generation later we launch one colony ship at each star. But in that generation, the fate of the universe becomes a lot more certain, so we should expect to find ourselves before that point, not after.
Physics shows that several constants are finely tuned to support organized matter. We can infer that elsewhere, they aren’t. Since you’d think that there are other, less precarious arrangements of physical law with complex consequences, we can also moderately update towards that very precariousness granting us unusual leverage about something valuable in the acausal marketplace.
History shows that we got lucky during the Cold War. We can slightly update towards:
Current events are important.
Current events are more likely after a Cold War.
Nuclear winter would settle the universe’s fate.
The news shows that ours is the era of inadequate AI alignment theory. We can moderately update towards being in a position to affect that.
When you start diverting significant resources away from #1, you’ll probably discover that the definition of “aligned” is somewhat in contention.
i feel like (2)/(3) is about “what does (the altruistic part of) my utility function want?” and (4) is “how do i decision-theoretically maximize said utility function?”. they’re different layers, and ultimately it’s (2)/(3) we want to maximize, but maximizing (2)/(3) entails allocating some of the future lightcone to (4).
A couple of thoughts:
I think that (3) does create strong incentives right now—at least for anyone who assumes [without any special prejudice given to existing people] amounts to [and it’s fine to disassemble everyone who currently exists if it’s the u/v/h/g/etc maximising policy]. This seems probable to me, though not entirely clear (I’m not an optimal configuration, and smoothly, consciousness-preservingly transitioning me to something optimal seems likely to take more resources than unceremoniously recycling me).
Incentives now include:
Prevent (3) happening.
To the extent that you expect (3) and are selfish, live for the pre-(3) time interval, for (3) will bring your doom.
On (4), “This solution obviously creates the best incentives for current agents” seems badly mistaken unless I’m misunderstanding you.
Something in this spirit would need to be based on a notion of [expected social value], not on actual contributions, since in the cases where we die we don’t get to award negative points.
For example, suppose my choice is between:
A: {90% chance doom for everyone; 10% I save the world}
B: {85% chance doom for everyone; 15% someone else saves the world}
To the extent that I’m selfish, and willing to risk some chance of death for greater control over the future, I’m going to pick A under (4).
The more selfish, reckless, and power-hungry I am, and the more what I want deviates from what most people want, the more likely I am to actively put myself in a position to take an A-like action.
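To make that concrete, here is a toy expected-payoff calculation for the selfish part of the agent under (4); all the payoff numbers are my own illustrative assumptions, and the only thing that matters is that the savior’s retroactive share dwarfs an ordinary share.

```python
p_i_save, p_other_saves = 0.10, 0.15  # success probabilities of options A and B above
share_if_i_save = 0.30                # assumed: large retroactive share for whoever saved the world
share_if_other_saves = 0.001          # assumed: ordinary person's share when someone else saves it

# Doom contributes zero to the selfish payoff in both options.
ev_A = p_i_save * share_if_i_save            # 0.03
ev_B = p_other_saves * share_if_other_saves  # 0.00015
# ev_A >> ev_B: the selfish part prefers A despite the extra 5% chance of doom for everyone.
```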
Moreover, if the aim is to get ideal incentives, it seems unavoidable to have symmetry and include punishments rather than only [you don’t get many resources]. Otherwise the incentive is to shoot for huge magnitude of impact, without worrying much about the sign, since no-one can do worse than zero resources.
If correct incentives were the only desideratum, I don’t see how we’d avoid [post-singularity ‘hell’ (with some probability) for those who’re reckless with AGI].
For any nicer approach I think we’d either be incenting huge impact with uncertain sign, or failing to incent large sacrifice in order to save the world.
Perhaps the latter is best??
I.e. cap the max resources for any individual at a fairly low level, so that e.g. [this person was in the top percentile of helpfulness] and [this person saved the world] might get you about the same resource allocation.
It has the upsides both of making ‘hell’ less necessary, and of giving a lower incentive to overconfident people with high-impact schemes. (but still probably incents particularly selfish people to pick A over B)
(some very mild spoilers for yudkowsky’s planecrash glowfic (very mild as in this mostly does not talk about the story, but you could deduce things about where the story goes by the fact that characters in it are discussing this))
“The Negative stance is that everyone just needs to stop calculating how to pessimize anybody else’s utility function, ever, period. That’s a simple guideline for how realness can end up mostly concentrated inside of events that agents want, instead of mostly events that agents hate.”
“If at any point you’re calculating how to pessimize a utility function, you’re doing it wrong. If at any point you’re thinking about how much somebody might get hurt by something, for a purpose other than avoiding doing that, you’re doing it wrong.)”
i think this is a pretty solid principle. i’m very much not a fan of anyone’s utility function getting pessimized.
so pessimising a utility function is a bad idea. but we can still produce correct incentive gradients in other ways! for example, we could say that every moral patient starts with 1 unit of utility function handshake, but if you destroy the world you lose some of your share. maybe if you take actions that cause ⅔ of timelines to die, you only get ⅓ units of utility function handshake, and the more damage you do the less handshake you get.
it never gets into the negative; that way, we never go out of our way to pessimize someone’s utility function, but it does get increasingly close to 0.
(this isn’t necessarily a scheme i’m committed to, it’s just an idea i’ve had for a scheme that provides the correct incentives for not destroying the world, without having to create hells / pessimize utility functions)
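A minimal sketch of the handshake scheme as described above, matching the “⅔ of timelines die → ⅓ of a unit” example (purely illustrative, per the caveat that this isn’t a committed proposal):

```python
def handshake_share(fraction_of_timelines_destroyed: float) -> float:
    """Everyone starts with 1 unit of utility-function handshake; causing timelines
    to die shrinks your share toward zero but never below it, so nobody's utility
    function ever gets actively pessimized."""
    return max(0.0, 1.0 - fraction_of_timelines_destroyed)


assert abs(handshake_share(2 / 3) - 1 / 3) < 1e-9  # cause 2/3 of timelines to die -> keep 1/3 unit
```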
Hmmm, I don’t think that kind of thing is going to give correct world-saving incentives for the selfish part of people (unless failing to save the world counts as destroying it—in which case almost everyone is going to get approximately no influence).
More fundamentally, I don’t think it works out in this kind of case due to logical uncertainty.
If I’m uncertain about a particular plan, and my estimate is {80% everyone dies; 20% I save the world}, that’s not {in 80% of timelines everyone dies; in 20% of timelines I save the world}.
It’s closer to [there’s an 80% chance that {in ~99% of timelines everyone dies}; there’s a 20% chance that {in ~99% of timelines I save the world}].
So, conditional on my saving the world in some timeline by taking some action, I saved the world in most timelines where I took that action and would get a load of influence. This won’t disincentivize risky gambles for selfish/power-hungry people. (at least of the form [let’s train this model and see what happens] - most of the danger there being a logical uncertainty thing)
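A toy contrast of the two readings (numbers mine), showing why a timeline-counting handshake scheme barely bites on this kind of gamble:

```python
# Reading 1: "80% / 20%" as frequencies across timelines.
# The gamble kills ~80% of timelines no matter what, so the gambler's share is ~0.2 everywhere.
share_frequency_reading = max(0.0, 1.0 - 0.80)  # 0.2

# Reading 2: "80% / 20%" as logical uncertainty about whether the plan works.
# If the plan is bad, ~99% of timelines die (share ~0.01, but everyone is dead anyway);
# if the plan is good, ~99% of timelines survive, so the gambler keeps ~a full share,
# plus the enormous influence of having saved the world in most timelines.
share_if_plan_is_bad = max(0.0, 1.0 - 0.99)   # ~0.01
share_if_plan_is_good = max(0.0, 1.0 - 0.01)  # ~0.99
```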
I think influence would need to be based on expected social value given the ‘correct’ level of logical uncertainty—probably something like [what (expected value | your action) is justified by your beliefs, and valid arguments you’d make for them based on information you have].
Or at least some subjective perspective seems to be necessary—and something that doesn’t give more points for overconfident people.