The problem with a thorough public discussion in these cases is that once you understand the reasons why the idea is dangerous, you already know it, and you no longer have the opportunity to choose whether to learn about it.
If you trust Eliezer’s honesty, then though he may make mistakes, you should not expect him to use this policy as a cover for banning posts as part of some hidden agenda.
That’s definitely the root of the problem. In general, though, if we are talking about FAI, then there shouldn’t be a dangerous idea. If there is, then it means we are doing something wrong.
I don’t think he’s got a hidden agenda; I’m concerned about his mistakes. Though I’m not astute enough to point them out, I think the LW community as a whole is.
I have a response to this that I don’t actually want to say, because it could make the idea more dangerous to those who have heard about it but are currently safe due to not fully understanding it. I find that predicting that this sort of thing will happen makes me reluctant to discuss this issue, which may explain why, of those who are talking about it, most seem to think the banning was wrong.
Given that there has been one banned post, I think that his mistakes are much less of a problem than overwrought concern about his mistakes.
If you have a reply, please PM me. I’m interested in hearing it.
Are you interested in hearing it if it does give you a better understanding of the dangerous idea that you then realize is in fact dangerous?
It may not matter anymore, but yes, I would still like to hear it.
In this case, the same point has been made by others in this thread.
Why do you believe that? FAI is full of potential for dangerous ideas. In its full development, it’s an idea with the power to rewrite 100 billion galaxies. That’s gotta be dangerous.
Let me try to rephrase: correct FAI theory shouldn’t have dangerous ideas. If we find that the current version does have dangerous ideas, then this suggests that we are on the wrong track. The “Friendly” in “Friendly AI” should mean friendly.
Pretty much correct in this case. Roko’s original post was, in fact, wrong; correctly programmed FAIs should not be a threat.
(FAIs shouldn’t be a threat, but a theory to create a FAI will obviously have at least the potential to be used to create uFAIs. FAI theory will have plenty of dangerous ideas.)
I want to highlight at this point how you think about similar scenarios:
I do think that TORTURE is the obvious option, and I think the main instinct behind SPECKS is scope insensitivity.
That isn’t very reassuring. I believe that if you had the choice between letting a paperclip maximizer burn the cosmic commons and torturing 100 people, you’d choose to torture the 100 people. Wouldn’t you?
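(A minimal arithmetic sketch of the “scope insensitivity” point in the quoted comment, with entirely made-up disutility numbers; it only illustrates the aggregation step that intuition tends to skip, not any particular position in that debate.)

```python
# Made-up, purely illustrative numbers: a tiny per-person harm, multiplied by a
# large enough number of people, can sum to more than one very large harm.
# Scope insensitivity is the failure to actually do this multiplication.
torture_disutility = 1e7      # one person tortured (arbitrary units)
speck_disutility = 1e-6       # one person gets a dust speck (arbitrary units)
people_with_specks = 1e15     # made-up head count

total_speck_disutility = speck_disutility * people_with_specks
print(total_speck_disutility > torture_disutility)  # True for these numbers
```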
They are always a threat to some beings, for example beings who oppose CEV or other AIs. Any FAI that ran a human version of CEV would be a potential existential risk to any alien civilisation. If you accept all this possible oppression in the name of what is subjectively friendly, how can I be sure that you don’t favor torture for some humans that support CEV, in order to ensure it? After all, you already allow for the possibility that many beings are being oppressed or possibly killed.
This seems to be true and obviously so.
Narrowness. You can parry almost any statement like this, by posing a context outside its domain of applicability.
Another pointless flamewar. This part makes me curious though:
There are two ways I can interpret your statement:
a) you know a lot more about decision theory than you’ve disclosed so far (here, in the workshop and elsewhere);
b) you don’t have that advanced knowledge, but won’t accept as “correct” any decision theory that leads to unpalatable consequences like Roko’s scenario.
Which is it?
From my point of view, and as I discussed in the post (this discussion got banned with the rest, although it’s not exactly on that topic), the problem here is the notion of “blackmail”. I don’t know how to formally distinguish it from any other kind of bargaining, and the way in which Roko’s post could be wrong that I remember required this distinction to be made (it could be wrong in other ways, but those I didn’t notice at the time and don’t care to revisit).
(The actual content edited out and posted as a top-level post.)
(I seem to have a talent for writing stuff, then deleting it, and then getting interesting replies. Okay. Let it stay as a little inference exercise for onlookers! And please nobody think that my comment contained interesting secret stuff; it was just a dumb question to Eliezer that I deleted myself, because I figured out on my own what his answer would be.)
Thanks for verbalizing the problems with “blackmail”. I’ve been thinking about these issues in the exact same way, but made no progress and never cared enough to write it up.
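(A small sketch of one of those problems, with made-up payoffs: a demand we would call blackmail and a demand we would call ordinary bargaining can present the recipient with numerically identical choices, so the difference has to come from something other than the immediate payoffs, such as the baseline discussed just below.)

```python
# The recipient's payoffs in two situations, in arbitrary made-up units.
# In both cases she can pay 10 now, or refuse and end up 50 units worse off.
# Looking only at this immediate decision, nothing formally marks one case
# as "blackmail" and the other as ordinary bargaining.

blackmail = {"pay": -10, "refuse": -60}  # "pay me or I release the photos"
bargain = {"pay": -10, "refuse": -60}    # "pay me or I won't repair your roof"

print(blackmail == bargain)  # True: the two decision problems coincide
```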
Perhaps the reason you are having trouble coming up with a satisfactory characterization of blackmail is that you want a definition with the consequence that it is rational to resist blackmail and therefore not rational to engage in blackmail.
Pleasant though this might be, I fear the universe is not so accommodating.
Elsewhere VN asks how to unpack the notion of a status quo, and tries to characterize blackmail as a threat which forces the recipient to accept less utility than she would have received in the status quo. I don’t see any reason in game theory why such threats should be treated any differently than other threats. But it is easy enough to define the ‘status quo’.
The status quo is the solution to a modified game—modified in such a way that the time between moves increases toward infinity and the current significance of those future moves (be they retaliations or compensations) is discounted toward zero. A player who lives in the present and doesn’t respond to delayed gratification or delayed punishment is pretty much immune to threats (and to promises).
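(To make the last point concrete: a minimal sketch of a toy one-shot threat game, assuming invented payoff numbers and function names that are not from the comment above. It only shows that as the recipient’s discounting of future moves goes toward zero, a threat stops changing her decision.)

```python
# Toy threat game: the threatener demands a payment now and threatens a
# punishment on the next "move". The recipient weighs the demand against the
# punishment, discounted by how much she cares about that future move.
# All names and numbers here are hypothetical illustrations.

def complies_with_threat(demand: float, punishment: float, discount: float) -> bool:
    """Comply only if paying now costs less than the discounted future punishment."""
    return demand < discount * punishment

for discount in (1.0, 0.5, 0.1, 0.0):
    complies = complies_with_threat(demand=10.0, punishment=50.0, discount=discount)
    print(f"discount={discount:.1f} -> complies={complies}")

# As the discount factor falls toward zero (future moves pushed toward infinity,
# as in the modified game described above), the recipient never complies, so she
# is effectively immune to threats (and, for the same reason, to promises).
```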
On RW it’s called Headless Chicken Mode: the community appears to go nuts for a time. It generally resolves itself once people have gotten the yelling out of their system.
The trick is not to make any decisions based on the fact that things have gone into headless chicken mode. It’ll pass.
[The comment this is in reply to was innocently deleted by the poster, but not before I made this comment. However, I think I’m making a useful point here, so would prefer to keep this comment.]
This is certainly the case with regard to the kind of decision theoretic thing in Roko’s deleted post. I’m not sure if it is the case with all ideas that might come up while discussing FAI.
Wrong and stupid.
FYI, this is an excellent example of contempt.
And so it was, but that is not an example for the other times, when it wasn’t. A rare occurrence. I’m pretty sure it didn’t lead to any errors in this simple case, though.
(I wonder why Eliezer pitched in the way he did, with only weak disambiguation between the content of Tetronian’s comment and commentary on correctness of Roko’s post.)
I got the impression that you responded to “FAI Theory” as our theorizing and Eliezer responded to it as the theory making its way to the eventual FAI.
Ok...but why?
Edit: If you don’t want to say why publicly, feel free to PM me.
here