(Why) Does the Basilisk Argument fail?
As far as I can tell, the standard rebuttal to Roko’s idea is based on the strategy of simply ignoring acausal blackmail. But what if you consider the hypothetical situation in which no blackmail or acausal deals are involved, and the Basilisk scenario is simply how things are? Put simply, if we accept the Computational Theory of Mind, we need to accept that our experiences could in principle be a simulation run by another actor. Then, if you ever find yourself entertaining the question:
Q1: Should I do everything I can to ensure the creation/simulation of an observer that is asking itself the same question as I am now, and will suffer a horrible fate if it answers no?
Would the answer that maximizes your utility not be yes? The idea is that answering no opens up the possibility that you are an actor created by someone who answered yes, and that you might thus end up suffering whatever horrible fate they implemented. This seems to be the argument that Roko put forth in his original post, and I have not seen it soundly refuted.
Of course, one could argue that there is no evidence for being in such a simulation rather than in the real world, and thus that the commitment and ethical dilemma of answering yes are not worth it. If we assign equal probability to all worlds we might inhabit, this might indeed be convincing. However, I think a kind of anthropic argument, based on Bostrom’s Self-Sampling Assumption (SSA), could be constructed to show that we may indeed be more likely to be in such a simulation. Assume your hypotheses to be
H0: “I am living in the real world, which is as it appears to be, i.e. the possible observers are the ≈ 7 billion humans on planet Earth”.
H1: “The real world is dominated by actors who answered yes to Q1, and (possibly with the help of AGI) filled the universe with simulations of observers pondering Q1”.
Then, given the data
D: “I am an observer thinking about Q1”
And assuming that in the world as it appears to be under H0, only very few people, say one in ten thousand, have ever asked themselves Q1, while under H1 the universe is literally filled with observers thinking about Q1, we get
Pr[D|H0] = 0.0001
Pr[D|H1] ≈ 1
So the Bayes factor would show very clear evidence in favor of H1. Of course, this would apply to any hypothetical model in which most observers have exactly the same thoughts as we do (and may indeed lead to solipsism), but H1 at least gives a convincing reason why this should be the case.
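To make the update explicit, here is a minimal Python sketch of the calculation; the equal prior odds and the likelihoods (one in ten thousand under H0, essentially one under H1) are just the illustrative assumptions from above, not estimates I am defending.

```python
# Minimal sketch of the SSA-style update above. The likelihoods are the
# illustrative numbers from the text, not measurements.

def posterior_odds(prior_odds_h1_vs_h0, p_d_given_h0, p_d_given_h1):
    """Posterior odds of H1 vs H0 after observing D ('I am thinking about Q1')."""
    bayes_factor = p_d_given_h1 / p_d_given_h0
    return prior_odds_h1_vs_h0 * bayes_factor

# Equal prior credence in H0 and H1 (prior odds = 1), as assumed above.
odds = posterior_odds(prior_odds_h1_vs_h0=1.0,
                      p_d_given_h0=0.0001,  # one in ten thousand humans ponders Q1
                      p_d_given_h1=1.0)     # under H1, virtually every observer ponders Q1
print(odds)  # 10000.0, i.e. a Bayes factor of 10^4 in favor of H1
```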
So how can we avoid biting the bullet of answering yes to Q1, which seems like a very unattractive option, yet possibly still better than being the only actor to answer no? Admittedly, I am quite new to anthropic reasoning, so my logic could be flawed. I would like to hear thoughts on this.
I think that RB fails because of human laziness, or, more generally speaking, because human psychology can’t process acausal blackmail. Thus nobody changes his or her investment in beneficial AI creation based on RB.
However, I have met two people (independent of each other and of the RB idea) who hoped that a future AI would reward them for their projects which increase the probability of the AI’s creation, which is basically the same idea presented in more human language.
Thank you for your answer. I agree that human nature is a reason to believe that an RB-like scenario (especially one based on acausal blackmail) is less likely to happen. However, I was thinking more of a degenerate scenario similar to the one proposed in this comment. Just replace the message coming from a text terminal with the fact that you are thinking about a Basilisk situation: a future superintelligence might have created many observers, some of whom think very much like you but are less prone to believing in human laziness and more likely to support RB. Thus, if you consider answering no to Q1 (in other words, dismissing the Basilisk), you could see this very fact as evidence that H1 might still be true and that you are just (unluckily) one of the simulations that will be punished. By this logic, it would be very advisable to actually answer yes (assuming you care more about your own utility than about that of a copy of you).
Actually, though, my anthropic argument might be flawed. If we think about it as in this post by Stuart Armstrong, we see that in both H0 and H1 there is exactly one observer that is me, personally (i.e. having the experiences that I identify with); thus, the probability of my being in an RB scenario should not be higher than that of being in the real world (or any other simulation) after all. But which way of thinking about anthropic probability is correct in this case?
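To illustrate the difference between the two counting rules, here is a toy comparison; all population numbers are made up purely for illustration.

```python
# Toy comparison of the two counting rules contrasted above. All population
# numbers are made up purely for illustration.

def likelihood_ssa(pondering_q1, total_observers):
    """SSA-style: P(D|H) is the fraction of the reference class pondering Q1."""
    return pondering_q1 / total_observers

def likelihood_just_me(copies_of_me):
    """'Exactly one observer is me' rule: D is certain under any hypothesis that
    contains an observer with exactly my experiences."""
    return 1.0 if copies_of_me >= 1 else 0.0

# H0: ~7e9 humans, about 1 in 10,000 of them pondering Q1; exactly one is me.
# H1: a universe packed with 1e15 simulated ponderers; still exactly one is me.
ssa_factor = likelihood_ssa(1e15, 1e15) / likelihood_ssa(7e5, 7e9)  # ~1e4 in favor of H1
just_me_factor = likelihood_just_me(1) / likelihood_just_me(1)      # 1.0, i.e. no update
print(ssa_factor, just_me_factor)
```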
In real life, you can reverse blackmail by saying: “Blackmail is a serious felony, and you could get a year in jail in the US for it, so now you have to pay me for not reporting the blackmail to the police.” (I don’t recommend this in real life, as you will both be arrested, but such an aggressive posture may stop the blackmail.)
Acausal blackmail by an AI could be reversed in the same way: you can threaten the AI that you have precommitted to create thousands of other AIs which will simulate this whole setup and will punish the AI if it tries to torture any simulated being. This could be used to make a random paperclipper behave as a Benevolent AI; the idea was suggested by Rolf Nelson, and I analysed it in detail in the text.
That strategy might work as deterrence, although actually implementing it would still be ethically...suboptimal, as you would still need to harm simulated observers. Sure, they would be Rogue AIs instead of poor innocent humans, but in the end, you would be doing something rather similar to what you blame them for in the first place: creating intelligent observers with the explicit purpose of punishing them if they act the wrong way.
I haven’t seen anyone explore the simplest answer: ignore acausal blackmail for the same reason you ignore plain old causal blackmail. “fuck you, that’s why”.
If blackmail doesn’t work, then no rational agent will attempt it. Further, no rational agent will follow through on a threat if there’s no causal chain to benefit the agent.
This approach can be made a little more formal with FDT/LDT/TDT: being the sort of agent who robustly does not respond to blackmail maximises utility more than being the sort of agent who sometimes gives in to blackmail, because you will not wind up in situations where you’re being blackmailed.
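A toy calculation makes the point; the payoffs and the chance of the blackmailer mispredicting your policy are arbitrary illustrative assumptions, not anything formal.

```python
# Toy expected-utility comparison for the point above. The payoffs and the
# misprediction probability are arbitrary illustrative assumptions.

def expected_utility(gives_in: bool, p_mispredict: float = 0.0) -> float:
    """Expected utility of an agent with a fixed blackmail policy, facing a
    blackmailer that issues a threat only when it predicts the threat pays off."""
    pay_cost, punish_cost, baseline = -10.0, -100.0, 0.0
    # The threat is issued if the blackmailer predicts compliance (possibly in error).
    p_threat = (1 - p_mispredict) if gives_in else p_mispredict
    outcome_if_threatened = pay_cost if gives_in else punish_cost
    return p_threat * outcome_if_threatened + (1 - p_threat) * baseline

print(expected_utility(gives_in=True))                      # -10.0: compliers get threatened
print(expected_utility(gives_in=False))                     #   0.0: robust refusers are never targeted
print(expected_utility(gives_in=False, p_mispredict=0.05))  #  -5.0: still better unless mispredictions are common
```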
As I said, that argument is actually the most commonly presented one. However, there is in fact a causal chain through which following through would benefit the agent, causing it to adopt a Basilisk strategy: namely, if it thinks it is itself a simulation and will be punished otherwise.
If Omega offers you a blackmail deal, it has calculated that there is some probability that you will accept it.
Knowing that you might accept it is different from following through if you don’t. Omega’s tricky that way.
Precisely. My argument was just that, depending on one’s stance on anthropic reasoning, the fact that an actor is contemplating RB in the first place might already be an indication that they are in a simulation, i.e. being blackmailed in this way.
A few more reasons that the basilisk is not compelling:
1) Every decision you make has a “compared to what” implicit in it. A rational agent does NOT face Q1 as phrased. It’s _not_ “Should I do everything I can to …”, but “What resources should I divert from my goals toward …”. And here, the answer is very plausibly “none” for all rational agents.
2) Do you know who worries most about being blackmailed in the future? Blackmailers. Any agent that believes this works is going to be driven to placate its own future simulation-enforcer, which probably doesn’t include wasting time on long-dead agents with no remaining causal power. Note: there is one path where this works with #1 to argue for the basilisk: if the agent is afraid that it will be punished for not punishing you. But this could only be part of a very long chain that ends with the heat death of the universe, rendering the fear ultimately unfounded.
3) All the arguments against Pascal’s wager apply just as well. Notably, it’s easy to imagine many possible goals that you might be resurrected and punished for not pursuing. There’s no particular way to know which god to bet on; you’re going to lose no matter what.