Also, in Newcomb’s problem, the goal is to walk away with as much money as possible. So it’s obvious what to optimize for.
What exactly is the goal with the Basilisk? To give away as much money as possible, just to build an evil machine that would torture you unless you gave it as much money as possible, but luckily you did, so you kinda… “win”? You and your five friends are the chosen ones who get to enjoy watching the rest of humanity tortured forever? (Sounds like how some early Christians imagined Heaven: only the few most virtuous get saved, and watching the suffering of the damned in Hell increases the joy of their own salvation.)
This completely ignores the fact that just throwing a lot of money around doesn’t solve the problem of creating a safe, recursively self-improving, superhuman AI. (Quoting the Sequences: “There’s a fellow currently on the AI list who goes around saying that AI will cost a quadrillion dollars—we can’t get AI without spending a quadrillion dollars, but we could get AI at any time by spending a quadrillion dollars.”) So these guys working on this evil machine… hungry, living in horrible conditions, never taking a vacation or going on a date, never seeing a doctor, probably having mental breakdowns all the time, because they are writing the code that would torture them if they did any of that… is this the team we could trust to make sane, good decisions and get all the math right? If not, then we are pretty much fucked regardless of whether we donated to the Basilisk, because soon we are all getting turned into paperclips anyway; the only difference is that 99.9999999% of us will get tortured first.
How about, you know, just not building the whole monster in the first place? Uhm… could the solution to this horrible problem really be so easy?
Yes.
No. All the people who never heard of the Basilisk argument would also live in heaven. Even the people who heard of it in a way that made it clear they wouldn’t take it seriously would live in heaven.
That isn’t necessarily true. The kind of reasoning attributed to the Basilisk uFAI would also lead it to use the ‘innocents’ as hostages if that would help extort compliance from the believers. It depends entirely on the (economic-power-weighted aggregate) insanity of the ‘suckers’ the uFAI is exploiting.
The basilisk gets more compliance from the believers when it puts the innocents into heaven than when it puts them into hell. Also, the debate is not about a uFAI but an FAI that optimizes a utility function of general welfare using TDT.
This is also the point where you might think about how Eliezer’s censorship had an effect. His censoring led you and Viliam_Bur to an understanding of the issue where you think it’s about a uFAI.
This is at best unclear. It depends on the specific nature of the insanity of those who comply. Note that brutally disincentivizing evangelism has… instrumental downsides.
On “the believers”: don’t be misled by the loose relationship with Pascal’s Wager. This isn’t about belief; it is about decisions (and counterfactual decisions).
The use of the term uFAI is deliberate, and correct. We don’t need to define a torture-terrorist as Friendly just because of some sloppy utilitarian reasoning. Moreover, any actual risk from the scenario comes from AGI creators (or influencers) who make this assumption. That’s the only thing that can cause the torture to happen.
You are overconfident in your mind-reading skills. I was one of the few people who were familiar enough with the subject matter at the time Roko was writing his (typically fascinating) posts that I immediately categorised the agent as a plausible not-friendly AGI and the scenario as an interesting twist on acausal extortion, then went straight to thinking about the actual content of the post, which was about a new means of cooperation.
Roko’s post explicitly mentioned trading with unfriendly AIs.
Yeah, the horror lies in the idea that it might be morally CORRECT for an FAI to engage in eternal torture of some people.
There is a problem with human psychology: threatening someone with torture doesn’t improve their judgement.
If threatening someone with eternal torture would magically raise their intelligence over 9000, give them the ability to develop a correct theory of Friendliness, and reliably make them build a Friendly AI in five years… then yes, under those assumptions, threatening people with eternal torture could be the morally correct thing to do.
But human psychology doesn’t work this way. If you start threatening people with torture, they are more likely to make mistakes in their reasoning. See: motivated reasoning, “ugh” fields, etc.
Therefore, the hypothetical AI threatening people with torture for… well, pretty much for not being perfectly epistemically and instrumentally rational… would decrease the probability of a Friendly AI being built correctly. That is why I don’t consider this hypothetical AI to be Friendly.
[removed]
This question is equivalent to: “How about, you know, just building a Friendly AI? Uhm… could the solution to the safe AI problem really be so easy?”
These questions are equivalent in the same sense as “how about just not setting X equal to pi” and “how about just setting X equal to e” are equivalent. Assuming you can do the latter is a prediction; assuming you can do the former is an antiprediction.
To the contrary, “just building the [very specific sort of] whole monster” is what’s more equivalent to “just building a [very specific definition of] Friendly AI”, an a priori improbable task.
Worse for the basilisk: at least in the case of Friendly AI you might end up stuck with nothing better to do but throw a dart and hope for a bull’s-eye. But in the case of the basilisk, the acausal trade is only rational if you expect a high likelihood of the trade being carried out. And if that likelihood is low, then you’re just being nutty, which means it’s unlikely that the other side of the trade will be upheld in any case (acausally trying to influence Omega’s prediction of you may work if Omega is omniscient, but not so well if Omega is irrational). This lowers the likelihood still further… until the only remaining question is simply “what’s the fixed point of x_{n+1} = x_n/2?”
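A minimal sketch of that recurrence, taking x_n to be the credence at step n that the trade will actually be honored (the halving factor is just the one from the formula above; any factor strictly below 1 gives the same limit):

```latex
% Fixed point of the halving recurrence x_{n+1} = x_n / 2.
% A fixed point x* must satisfy x* = x* / 2, which forces x* = 0.
% Starting from any initial credence x_0 in [0, 1], the iterates are x_n = x_0 / 2^n, so:
\[
  x_{n+1} = \frac{x_n}{2}
  \quad\Longrightarrow\quad
  \lim_{n \to \infty} x_n \;=\; \lim_{n \to \infty} \frac{x_0}{2^n} \;=\; 0 \;=\; x^{*} .
\]
```

In other words, the “likelihood lowers itself” loop has exactly one stable answer: a credence of zero that the basilisk’s side of the trade gets carried out.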
Consider my parallel changed to “How about, you know, just not building an Unfriendly AI? Uhm… could the solution to the safe AI problem really be so easy?”
There are many possible Unfriendly AIs, and most of them don’t base their decision to torture you on whether you gave them all your money.
Therefore, you can use your reason to try building a Friendly AI… and either succeed or fail, depending on the complexity of the problem and your ability to solve it.
But not depending on blackmail.
This is the difference between “you should be very careful to avoid building any Unfriendly AI, which may be a task beyond your skills” and “you should build this specific Unfriendly AI, because if you don’t, but someone else does, then it will torture you for an eternity”. In the former case, your intelligence is used to generate a good outcome, and yes, you may fail. In the latter case, your intelligence is used to fight against itself; you are forcing yourself to work towards an outcome that you actually don’t want.
That’s not the same thing. Building a Friendly AI is insanely difficult. Building a Torture AI is insane and difficult.