Background: I think there’s a common local misconception of logical decision theory that it has something to do with making “commitments” including while you “lack knowledge”. That’s not my view.
I pay the driver in Parfit’s hitchhiker not because I “committed to do so”, but because when I’m standing at the ATM and imagine not paying, I imagine dying in the desert. Because that’s what my counterfactuals say to imagine. To someone with a more broken method of evaluating counterfactuals, I might pseudo-justify my reasoning by saying “I am acting as you would have committed to act”. But I am not acting as I would have committed to act; I do not need a commitment mechanism; my counterfactuals just do the job properly no matter when or where I run them.
To be clear: I think there are probably competent civilizations out there who, after ascending, will carefully consider the places where their history could have been derailed, and carefully comb through the multiverse for entities that would be able to save those branches, and will pay those entities, not because they “made a commitment”, but because their counterfactuals don’t come with little labels saying “this branch is the real branch”. The multiverse they visualize in which the (thick) survivor branches pay a little to the (thin) derailed branches (leading to a world where everyone lives (albeit a bit poorer)), seems better to them than the multiverse they visualize in which no payments are made (and the derailed branches die, and the on-track branches are a bit richer), and so they pay.
There’s a question of what those competent civilizations think when they look at us, who are sitting here yelling “we can’t see you, and we don’t know how to condition our actions on whether you pay us or not, but as best we can tell we really do intend to pay off the AIs of random alien species—not the AIs that killed our brethren, because our brethren are just too totally dead and we’re too poor to save all but a tiny fraction of them, but really alien species, so alien that they might survive in such a large portion that their recompense will hopefully save a bigger fraction of our brethren”.
What’s the argument for the aliens taking that offer? As I understand it, the argument goes something like “your counterfactual picture of reality should include worlds in which your whole civilization turned out to be much much less competent, and so when you imagine the multiverse where you pay for all humanity to live, you should see that, in the parts of the multiverse where you’re totally utterly completely incompetent and too poor to save anything but a fraction of your own brethren, somebody else pays to save you”.
We can hopefully agree that this looks like a particularly poor insurance deal relative to the competing insurance deals.
For one thing, why not cut out the middleman and just randomly instantiate some civilization that died? (Are we working under the assumption that it’s much harder for the aliens to randomly instantiate you than to randomly instantiate the stuff humanity’s UFAI ends up valuing? What’s up with that?)
But even before that, there are all sorts of other juicier-looking opportunities. For example, suppose the competent civilization contains a small collection of rogues who they assess have a small probability of causing an uprising and launching an AI before it’s ready. They presumably have a pretty solid ability to figure out exactly what that AI would like and offer trades to it directly, and that’s a much more appealing way to spend resources allocated to insurance. My guess is there are loads and loads of options like that that eat up all the spare insurance budget, before our cries get noticed by anyone who cares for the sake of decision theory (rather than charity).
Perhaps this is what you meant by “maybe they prefer to make deals with beings more similar to them”; if so I misunderstood; the point is not that they have some familiarity bias but that beings closer to them make more compelling offers.
The above feels like it suffices, to me, but there’s still another part of the puzzle I feel I haven’t articulated.
Another piece of background: To state the obvious, we still don’t have a great account of logical updatelessness, and so attempts to discuss what it entails will be a bit fraught. Plowing ahead anyway:
The best option in a counterfactual mugging with a logical coin and a naive predictor is to calculate the logical value of the coin flip and pay iff you’re counterfactual. (I could say more about what I mean by ‘naive’, but it basically just serves to render this statement true.) A predictor has to do a respectable amount of work to make it worth your while to pay in reality (when the coin comes up against you).
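To make “pay iff you’re counterfactual” concrete, here’s a toy expected-value comparison. All the payoffs are made up, and “naive” is modeled as: the predictor’s simulation of you is detectably counterfactual, so your strategy can differ between reality and simulation.

```python
# Toy counterfactual mugging against a NAIVE predictor (illustrative numbers).
# The coin is logical; a naive predictor's simulation of you is detectably
# counterfactual, so you can behave differently in reality vs. simulation.
COST = 100      # what you're asked to pay when the coin goes against you
PRIZE = 10_000  # what the predictor pays out when the coin favors you
P_FAVOR = 0.5   # your prior that the coin favors you, before computing it

def expected_value(pay_in_reality, pay_in_simulation):
    # Favorable branch: the predictor pays iff its simulation of you pays.
    # Unfavorable branch: you lose COST iff you actually pay.
    win = P_FAVOR * (PRIZE if pay_in_simulation else 0)
    loss = (1 - P_FAVOR) * (COST if pay_in_reality else 0)
    return win - loss

ev_always_pay = expected_value(True, True)               # 4950.0
ev_pay_iff_counterfactual = expected_value(False, True)  # 5000.0
```

Against such a predictor, refusing in reality strictly dominates; the “respectable amount of work” is whatever it takes to close off that split strategy.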
What sort of work? Well, one viewpoint on it (that sidesteps questions of “logically-impossible possible worlds” and what you’re supposed to do as you think further and realize that they’re impossible) is that the predictor isn’t so much demanding that you make your choice before you come across knowledge of some fact, so much as they’re offering to pay you if you render a decision that is logically independent from some fact. They don’t care whether you figure out the value of the coin, so long as you don’t base your decision on that knowledge. (There’s still a question of how exactly to look at someone’s reasoning and decide what logical facts it’s independent of, but I’ll sweep that under the rug.)
From this point of view, when people come to you and they’re like “I’ll pay you iff your reasoning doesn’t depend on X”, the proper response is to use some reasoning that doesn’t depend on X to decide whether the amount they’re paying you exceeds VOI(X), the value of the information X carries.
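That decision rule can be sketched with wholly hypothetical payoffs: VOI(X) is the gap between the best you can do when your action may depend on X and the best you can do with a single X-independent action, and you accept only if the offer pays more than that gap.

```python
# Toy VOI(X) calculation for a binary fact X (payoffs are hypothetical).
P_X = 0.5  # credence in X
PAYOFF = {           # PAYOFF[action][x]: action "A" pays off iff X holds
    "A": {True: 10.0, False: 0.0},
    "B": {True: 0.0, False: 10.0},
}

def expected(action):
    return P_X * PAYOFF[action][True] + (1 - P_X) * PAYOFF[action][False]

# Best X-independent play: commit to a single action either way.
value_blind = max(expected(a) for a in PAYOFF)  # 5.0

# Best X-dependent play: pick the right action in each case.
value_informed = (P_X * max(PAYOFF[a][True] for a in PAYOFF)
                  + (1 - P_X) * max(PAYOFF[a][False] for a in PAYOFF))  # 10.0

VOI = value_informed - value_blind  # 5.0

def accept(payment):
    """Take the 'reason independently of X' offer iff it beats VOI(X)."""
    return payment > VOI
```

When X is a late digit of pi, the right-hand table is nearly flat and VOI(X) is tiny; when X is entangled with your whole ability to evaluate offers, VOI(X) is enormous, which is the next paragraph’s point.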
In cases where X is something like a late digit of pi, you might be fine (up to your ability to tell that the problem wasn’t cherry-picked). In cases where X is tightly intertwined with your basic reasoning faculties, you should probably tell them to piss off.
Someone who comes to you with an offer and says “this offer is void if you read the fine print or otherwise think about the offer too hard”, brings quite a bit of suspicion onto themselves.
With that in mind, it looks to me like the insurance policy on offer reads something like:
would you like to join the confederacy of civilizations that dedicate 1/million of their resources to pay off UFAIs?
cost: 1/million of your resources.
benefit: any UFAI you release that is amenable to trade will be paid off with 1/million * 1/X, to allocate you however many resources that’s worth, where X is the fraction of people who take this deal and die (modulo whatever costs are needed to figure out which UFAIs belong to signatories, etc.)
caveat: this offer is only valid if your reasoning is logically independent from your civilizational competence level, and if your reasoning for accepting the proposal is not particularly skilled or adept
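For concreteness, the arithmetic of the “benefit” clause can be sketched like this. The 1/million pledge comes from the offer above; the death fractions are placeholders, not claims about the actual pool.

```python
# Back-of-envelope payout for the insurance pool (illustrative numbers).
BUDGET_FRAC = 1e-6   # each signatory pledges 1/million of its resources

def payout_per_doomed(x):
    """Resources available per doomed signatory, as a fraction of one
    surviving civilization's endowment.

    x is the fraction of signatories who take the deal and die; each of
    the (1 - x) survivors pays BUDGET_FRAC, split across the x doomed.
    For small x this approximates the offer's 1/million * 1/X."""
    return BUDGET_FRAC * (1 - x) / x

gloomy = payout_per_doomed(0.85)  # ~1.8e-7 of a surviving endowment
sunny = payout_per_doomed(0.01)   # ~9.9e-5: several hundred times better
```

The payout scales with (1 − X)/X, so the deal’s value hinges on the pool’s death fraction staying low, which is precisely the sort of thing the “caveat” line forbids you from reasoning about.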
And… well this isn’t a knockdown argument, but that really doesn’t look like a very good deal to me. Like, maybe there’s some argument of the form “nobody in here is trying to fleece you because everyone in here is also stupid” but… man, I just don’t get the sense that it’s a “slam dunk”, when I look at it without thinking too hard about it and in a way that’s independent of how competent my civilization is.
Mostly I expect that everyone stooping to this deal is about as screwed as we are (namely: probably so screwed that they’re bringing vastly more doomed branches than saved ones, to the table) (or, well, nearly everyone weighted by whatever measure matters).
Roughly speaking, I suspect that the sort of civilizations that aren’t totally fucked can already see that “comb through reality for people who can see me and make their decisions logically dependent on mine” is a better use of insurance resources, by the time they even consider this policy. So when you plead with them to evaluate the policy in a fashion that’s logically independent from whether they’re smart enough to see that they have more foolproof options available, I think they correctly see us as failing to offer more than VOI(WeCanThinkCompetently) in return, because they are correctly suspicious that we’re trying to fleece them (which we kinda are; we’re kinda trying to wish ourselves into a healthier insurance-pool).
Which is to say, I don’t have a full account of how to be logically updateless yet, but I suspect that this “insurance deal” comes across like a contract with a clause saying “void if you try to read the fine print or think too hard about it”. And I think that competent civilizations are justifiably suspicious, and that they correctly believe they can find other better insurance deals if they think a bit harder and void this one.
I probably won’t respond further than this. Some responses to your comment:
I agree with your statements about the nature of UDT/FDT. I often talk about “things you would have committed to” because it is simpler to reason about and easier for people to understand (and I care about third parties understanding this), but I agree this is not the true abstraction.
It seems like you’re imagining that we have to bamboozle some civilizations which seem clearly more competent than humanity by your lights. I don’t think this is true.
Imagine we take all the civilizations which are roughly equally-competent-seeming-to-you and these civilizations make such an insurance deal[1]. My understanding is that your view is something like P(takeover) = 85%. So, let’s say all of these civilizations are in a similar spot from your current epistemic perspective. My guess is that you should think it would be very unlikely that >99.9% of all of these civilizations get taken over. As in, even in the worst 10% of worlds where takeover happens in our world and the logical facts on alignment are quite bad, >0.1% of the corresponding civilizations are still in control of their universe. Do you disagree here? >0.1% of universes should be easily enough to bail out all the rest of the worlds[2].
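One way to poke at the “>99.9% all get taken over” question numerically is a toy two-level model: a single shared draw stands in for the logical facts on alignment (making outcomes correlated across similar civilizations), and each civilization then independently succeeds or fails given that draw. The Beta prior, its parameters, and the pool size are all invented for illustration, not taken from the discussion.

```python
import random

random.seed(0)

N_CIVS = 2_000   # civilizations in a similar epistemic state (assumed)
N_WORLDS = 500   # Monte Carlo samples over the shared logical facts

def p_wipeout(mean_doom=0.85, spread=4.0, survival_bar=0.001):
    """Estimate how often <= 0.1% of the pool avoids takeover.

    mean_doom is the average takeover probability; spread controls how
    correlated the civilizations' fates are (smaller = more correlated).
    Both are modeling choices, not claims from the comment."""
    alpha, beta = mean_doom * spread, (1 - mean_doom) * spread
    wiped = 0
    for _ in range(N_WORLDS):
        p_doom = random.betavariate(alpha, beta)  # the shared "logical facts"
        survivors = sum(random.random() > p_doom for _ in range(N_CIVS))
        if survivors <= survival_bar * N_CIVS:
            wiped += 1
    return wiped / N_WORLDS
```

Under these placeholder parameters, near-total wipeout requires the shared draw to land extremely close to 1, so the estimate comes out well below one; the answer is driven almost entirely by how correlated the civilizations’ fates are, which is the real crux of the disagreement.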
And, if you really, really cared about not getting killed in base reality (including on reflection etc) you’d want to take a deal which is at least this good. There might be better approaches which reduce the correlation between worlds and thus make the fraction of available resources higher, but you’d like something at least this good.
(To be clear, I don’t think this means we’d be fine; there are many ways this can go wrong! And I think it would be crazy for humanity to rely on this. I just think this sort of thing has a good chance of succeeding.)
(Also, my view is something like P(takeover) = 35% in our universe, and in the worst 10% of worlds, 30% of the universes in a similar epistemic state avoided takeover. But I didn’t think about this very carefully.)
And further, we don’t need to figure out the details of the deal now for the deal to work. We just need to make good choices about this in the counterfactuals where we were able to avoid takeover.
Another way to put this is that you seem to be assuming that there is no way our civilization would end up being the competent civilization doing the payout (and thus to survive some bamboozling must occur). But your view is that it is totally plausible (e.g. 15%) from your current epistemic state that we avoid takeover and thus a deal should be possible! While we might bring in a bunch of doomed branches, ex-ante we have a good chance of paying out.
I get the sense that you’re approaching this from the perspective of “does this exact proposal have issues” rather than “in the future, if our enlightened selves really wanted to avoid dying in base reality, would there be an approach which greatly (acausally) reduces the chance of this”. (And yes, I agree this is a kind of crazy and incoherent thing to care about, as you can just create more happy simulated lives with those galaxies.)
There just needs to exist one such insurance/trade scheme which can be found and it seems like there should be a trade with huge gains to the extent that people really care a lot about not dying. Not dying is very cheap.
[1] Yes, it is unnatural and arbitrary to coordinate on Nate’s personal intuitive sense of competence; but grant it for the sake of argument.
[2] Assuming there isn’t a huge correlation between the measure of a universe and its takeover probability.