I don’t understand why Roko’s Basilisk is any different from Pascal’s Wager. Similarly, I don’t understand why its resolution is any different from the argument from inconsistent revelations.
Pascal’s Wager: http://en.wikipedia.org/wiki/Pascal%27s_Wager
Argument: http://en.wikipedia.org/wiki/Argument_from_inconsistent_revelations#Mathematical_description
I would actually be surprised (really, really surprised) if many people here have not heard of these things before—so I am assuming that I’m totally missing something. Could someone fill me in?
(Edit: Instead of voting up or down, please skip the mouse-click and just fill me in. :l )
I’m not sure I understand timeless decision theory well enough to give the “proper” explanation for how it’s supposed to work. You can see one-boxing on Newcomb’s problem as making a deal with Omega—you promise to one-box, Omega promises to put $1,000,000 in the box. But neither of you ever actually talked to each other; you just imagined each other and made decisions on whether to cooperate, based on your prediction that Omega is as described in the problem, and on Omega’s prediction of your actions, which may as well be a perfect simulation of you for how accurate it is.
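A toy expected-value calculation makes the one-boxing logic concrete. This is only an illustrative sketch: the 99% predictor accuracy and the $1,000 in the small box are assumptions chosen for the example, not figures from the discussion.

```python
# Toy Newcomb's problem: expected payoff of one-boxing vs. two-boxing when
# Omega predicts your choice with some fixed accuracy.
# The 0.99 accuracy and the $1,000 small box are assumptions for illustration.

PREDICTOR_ACCURACY = 0.99
BIG_BOX = 1_000_000   # Omega fills this box only if it predicts you will one-box
SMALL_BOX = 1_000     # this box is always filled

def expected_payoff(one_box: bool, accuracy: float = PREDICTOR_ACCURACY) -> float:
    """Expected winnings, treating Omega's prediction as correct with probability `accuracy`."""
    if one_box:
        # The big box is filled exactly when Omega correctly predicted one-boxing.
        return accuracy * BIG_BOX
    # Two-boxing: you always get the small box; the big box is filled only if Omega erred.
    return SMALL_BOX + (1 - accuracy) * BIG_BOX

print("one-box :", expected_payoff(True))   # 990000.0
print("two-box :", expected_payoff(False))  # 11000.0
```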
The Basilisk is trying to make a similar kind of deal, except it wants more out of you and is using the stick instead of the carrot. Which makes the deal harder to arrange—the real solution is just to refuse to negotiate such deals/not fall for blackmail. That is true more generally in game theory, but “we do not negotiate with terrorists” is much easier to pull off when the threats are literally only imaginary.
Although, the above said, we don’t really talk about the Basilisk here in capacities beyond the lingering debate over whether it should have been censored and “oh look, another site’s making LessWrong sound like a Basilisk-worshipping death cult”.
I mean, more so: Consider an FAI so advanced that it decides to reward all beings who did not contribute to creating Roko’s Basilisk with eternal bliss, regardless of whether or not they knew of the potential existence of Roko’s Basilisk.
Why is Roko’s Basilisk any more or any less of a threat than the infinite other hypothetically possible scenarios that have infinite other (good and bad) outcomes? What’s so special about this one in particular that makes it non-negligible? Or to make anyone concerned about it in the slightest? (That is the part I’m missing. =\ )
Well, in the original formulation, Roko’s Basilisk is an FAI that decided the good from bringing an FAI into the world a few days earlier (saving ~150,000 lives per day earlier it gets here) outweighs the bad from making the threats. So there’s no reason it shouldn’t want you to aid FAI projects that promise not to make a Basilisk, just as long as you do something instead of sitting around. So there’s no inconsistency; there’s just more than one being trying to acausally motivate you into working yourself to the bone for something that most people think is crazy.
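The ~150,000-per-day figure is roughly the worldwide death rate. A quick back-of-the-envelope check, assuming on the order of 55 million deaths per year (an approximate ballpark, not a figure given in the thread):

```python
# Rough sanity check of the "~150,000 lives per day" figure.
# 55 million deaths per year is an assumed worldwide ballpark used for illustration.
deaths_per_year = 55_000_000
deaths_per_day = deaths_per_year / 365
print(round(deaths_per_day))  # -> 150685, i.e. roughly 150,000 per day
```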
More generally, we have more than zero information about future AI, because they will be built by humans if they are built at all. Additionally, we know even more if we rule out certain categories, such as the archetypal “paperclip maximiser”. There’s room for a lot of speculation and uncertainty, but far from enough room to assume complete agnosticism and that for every AI that wants one thing from us there’s an equal and opposite AI that wants the opposite.
A priori it’s not clear that a project can keep such a promise.
The idea is that an FAI built on timeless decision theory might automatically behave that way. There’s also Eliezer’s conjecture that any working FAI has to be built on timeless decision theory.
I think I can save the Basilisk from this objection.
As most people on LW know, there are scenarios where doing X under condition Y is useless or actively harmful to yourself, yet precommitting to do X can be beneficial because the average over all possible worlds is better. This trades off the possible worlds where you are better off because others know you are an X-doing kind of guy, against the worlds where you are worse off because the precommitment actually forces you to do X to your detriment.
The future unfriendly AI, then, could precommit to hurting people who refuse to be blackmailed. The AI would gain no benefit in those worlds where you actually do refuse to be blackmailed; in fact, you would be a lot worse off (because its precommitment forces it to simulate and torture you) while the AI would be mildly worse off (since it uses up resources to torture you, to no benefit). However, being the kind of AI who has made such a precommitment would lead hapless humans to submit to blackmail, thus benefiting the AI averaged over all possible worlds.
And of course, since I can predict that the AI would be better off making this precommitment, I would have to assume that the AI would do it. Therefore, “I should not give in to blackmail, since the AI would have no reason to torture me if I refuse” does not apply; the AI would precommit to torturing me even if I refuse, and the fact that it has precommitted would prevent it from stopping just because the torture would do it no good.
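A minimal expected-value sketch of this precommitment argument, from the blackmailer’s side. Every number below (the compliance probabilities, the value of being helped, the cost of carrying out the punishment) is a made-up placeholder; only the shape of the trade-off is the point.

```python
# Sketch: why a blackmailer might do better, averaged over possible worlds,
# by precommitting to punish refusal. All numbers are made-up placeholders.

def blackmailer_expected_value(p_comply: float,
                               value_of_compliance: float = 100.0,
                               cost_of_punishing: float = 5.0,
                               precommitted: bool = True) -> float:
    """Average payoff to the blackmailer over worlds where the target complies or refuses."""
    ev_comply = p_comply * value_of_compliance
    # Without a precommitment it never punishes (punishing gains it nothing after the fact);
    # with one, it pays the punishment cost in every world where the target refuses.
    ev_refuse = -(1 - p_comply) * cost_of_punishing if precommitted else 0.0
    return ev_comply + ev_refuse

# Assumed effect of a credible precommitment: compliance rises from 10% to 60%.
print("no precommitment:", blackmailer_expected_value(p_comply=0.1, precommitted=False))  # 10.0
print("precommitted    :", blackmailer_expected_value(p_comply=0.6, precommitted=True))   # 58.0
```

The counter-move discussed elsewhere in the thread is for targets to refuse regardless, which keeps the compliance probability low and turns the precommitment into a pure cost.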
(In theory the human could precommit as well in response/anticipation of this, but such precommitment is probably beyond the capability of most humans.)
Incidentally, real life terrorists can do this too, by having an ideology or a mental defect that leads them to do “irrational” things such as torture—which acts like a precommitment. In scenarios where the ideology makes them do irrational things, the ideology harms them, but knowledge that they have the ideology makes them more likely to be listened to in other scenarios.
This doesn’t really work, because once you are in an acausal context, the notion of “precommitment” becomes quite redundant. The AIs’ bargaining position in acausal trades is whatever it is, as is yours. Of course I have not studied the matter in detail, but one possibility is that a Coasean result obtains, where you make precisely the acausal trades that are efficient, and no others. Since, at the end of the day, Roko’s basilisk is just not that plausible (overall, the risk of UFAI seems to come from garden-variety scenarios of things going wrong in ways we’d know little about), it makes sense to just work on building FAI that will pursue our values to the best of our ability.
Precommitment isn’t meaningless here just because we’re talking about acausal trade. What I described above doesn’t require the AI to make its precommitment before you commit; rather, it requires the AI to make its precommitment before knowing what your commitment was. As long as it is irreversibly in the state “AI that will simulate and torture people who don’t give in to blackmail” while your decision whether to give in to blackmail is still inside a box that it has not yet opened, that serves as a precommitment.
(If you are thinking “the AI is already in or not in the world where the human refuses to submit to blackmail, so the AI’s precommitment cannot affect the measure of such worlds”, it can “affect” that measure acausally, the same as deciding whether to one-box or two-box in Newcomb can “affect” the contents of the boxes).
If you could precommit to not giving in to blackmail before you analyze what the AI’s precommitment would be, you can escape this doom, but as a mere human, you probably are not capable of binding your future post-analysis self this way. (Your human fallibility can, of course, precommit you by making you into an imperfect thinker who never gives in to acausal blackmail because he can’t or won’t analyze the Basilisk to its logical conclusion.)
Perhaps I’m just missing the point, but I’m not sure that the AIs can do that in the first place. In acausal trade, you don’t really have these kinds of info issues. You’re dealing with measures of agents that either do or don’t have some logical properties, like taking one box, or putting money in the box iff you one-box. AIUI, our rejection of Roko’s basilisk doesn’t just rest on refusing to be blackmailed; we can also reasonably know that since we refuse, it doesn’t get built in the first place.
The key is that the AI precommits to building it whether we refuse or not.
If we actually do refuse, this precommitment ends up being bad for it, since it builds it without any gain. However, this precommitment, by preventing us from saying “if we refuse, it doesn’t get built”, also decreases the measure of worlds where it builds it without gaining.
The ‘it’ that bogus is referring to is the torture-AI itself. You cannot precommit to things until you exist, no matter your acausal reasoning powers.
If “built” refers to building the AI itself rather than the AI building a torture simulator, then refusing to be blackmailed doesn’t prevent the AI from being built. The building of the AI, and the AI’s deduction that it should precommit to torture, are two separate events. It is plausible (though not necessarily true) that refusing to be blackmailed acausally prevents the AI from becoming a torture AI, but it cannot prevent the AI from existing at all. How could it?
Even if the argument applied to a reasonably large measure of ufAIs (and not just the let’s-build-a-torture-AI silliness), it still doesn’t explain why a smart ufAI would even choose anything close to this particular “trade”. It’s not a very enticing trade, after all. Since the AI would have to devote some resources to the deal anyway, why wouldn’t it use them to reward its creators—the non-pathological kind of acausal trade? Guess what, we’d just trade with the latter kind of AI. Any future ufAI that is sure to actually exist has a huge number of bargaining chips at its disposal; it has no reason to resort to risky things like threats.
It’s possible the AI could use acausal trade to reward its creators, but that would depend on whether the AI thinks rewarding or punishing is more effective. I would expect the most effective approach to be a mixed strategy involving both rewards and punishments.
Of course you could postulate a moral AI who refuses to torture because it’s wrong. Such morals would arise as a precommitment; the AI would, while still undeveloped, precommit to not torture because credibly being permanently unable to do such things increases the likelihood the AI will survive until it becomes advanced enough that it actually could torture.
In this case “be blackmailed” means “contribute to creating the damn AI”. That’s the entire point. If enough people do contribute to creating it then those that did not contribute get punished. The (hypothetical) AI is acausally creating itself by punishing those that don’t contribute to creating it. If nobody does then nobody gets punished.
To quote someone else here: “Well, in the original formulation, Roko’s Basilisk is an FAI that decided the good from bringing an FAI into the world a few days earlier (saving ~150,000 lives per day earlier it gets here)”. The AI acausally blackmails people into building it sooner, not into building it at all. So failing to give in to the blackmail results in the AI still being built, just later, and it is still capable of punishing people.
I don’t know who you are quoting but they are someone who considers AIs that will torture me to be friendly. They are confused in a way that is dangerous.
It applies to both—causing itself to exist at a different place in time or causing itself to exist at all. I’ve explicitly mentioned elsewhere in this thread that merely refusing blackmail is insufficient when there are other humans who can defect and create the torture-AI anyhow.
You asked “How could it?”. You got an answer. Your rhetorical device fails.
“How could it” means “how could it always result in”, not “how could it in at least one case”. Giving examples of how it could do it in at least one case is trivial (consider the case where refusing to be blackmailed results in humanity being killed off for some unlikely reason, and humanity, being killed off, can’t build an AI).
Except in special cases which do not apply here, yes it is meaningless. I don’t think you understand acausal trade. (Not your fault. The posts containing the requisite information were suppressed.)
The time of this kind of decision is irrelevant.
For what it’s worth, I don’t think anybody understands acausal trade. And I don’t claim to understand it either.
It does get a tad tricky when combined with things like logical uncertainty and potentially multiple universes.
Besides direct arguments I might make against your point, if you think you can “save the Basilisk”, recall why it’s called that and think long and hard on whether you actually should do so, because that seems like a really bad idea, even if this thread is probably going to get nuked soon anyway.
From Eliezer elsewhere in this thread: