I’m actually not sure if I understand your point. Either it is a roundabout way of making it, or I’m totally dense and the idea really is dangerous (or some third option).
It’s not that the idea is wrong and no one would believe it; it’s that the idea is wrong and, when presented with the explanation for why it’s wrong, no one should believe it. In addition, it’s kind of important that people understand why it’s wrong. I’m sympathetic to people with different minds that might have adverse reactions to things I don’t, but the solution to that is to warn them off, not censor the topics entirely.
This is a politically reinforced heuristic that does not work for this problem.
Transparency is very important regarding people and organisations in powerful and unique positions. The way they act and what they claim in public is weak evidence in support of their honesty. Claiming that they have to censor certain information in the name of the greater public good, and fortifying that decision with their public reputation, bears no evidence about their true objectives. The only way to solve this issue is by means of transparency.
Surely transparency might have negative consequences, but those consequences do not necessarily outweigh the risks of simply believing that certain people are telling the truth and are not engaging in deception to follow through on their true objectives.
There is also nothing that Yudkowsky has ever achieved that would sufficiently prove a superior intellect, which would in turn justify people in simply believing him about some extraordinary claim.
When I say something is a misapplied politically reinforced heuristic, you only reinforce my point by making fully general political arguments that it is always right.
Censorship is not the most evil thing in the universe. The consequences of transparency are allowed to be worse than censorship. Deal with it.
When I say something is a misapplied politically reinforced heuristic, you only reinforce my point by making fully general political arguments that it is always right.
I already had Anna Salamon telling me something about politics. You sound just as incomprehensible to me. Sorry, that’s not meant as an attack.
Censorship is not the most evil thing in the universe. The consequences of transparency are allowed to be worse than censorship. Deal with it.
I stated several times in the past that I am completely in favor of censorship; I have no idea why you are telling me this.
Our rules and intuitions about free speech and censorship are based on the types of censorship we usually see in practice. Ordinarily, if someone is trying to censor a piece of information, then that information falls into one of two categories: either it’s information that would weaken them politically, by making others less likely to support them and more likely to support their opponents, or it’s information that would enable people to do something that they don’t want done.
People often try to censor information that makes people less likely to support them, and more likely to support their opponents. For example, many governments try to censor embarrassing facts (“the Purple Party takes bribes and kicks puppies!”), the fact that opposition exists (“the Pink Party will stop the puppy-kicking!”) and its strength (“you can join the Pink Party, there are 10^4 of us already!”), and organization of opposition (“the Pink Party rally is tomorrow!”). This is most obvious with political parties, but it happens anywhere people feel like there are “sides”—with religions (censorship of “blasphemy”) and with public policies (censoring climate change studies, reports from the Iraq and Afghan wars). Allowing censorship in this category is bad because it enables corruption, and leaves less-worthy groups in charge.
The second common instance of censorship is encouragement and instructions for doing things that certain people don’t want done. Examples include cryptography, how to break DRM, pornography, and bomb-making recipes. Banning these is bad if the capability is suppressed for a bad reason (cryptography enables dissent), if it’s entangled with other things (general-purpose chemistry applies to explosives), or if it requires infrastructure that can also be used for the first type of censorship (porn filters have been caught blocking politicians’ campaign sites).
These two cases cover 99.99% of the things we call “censorship”, and within these two categories, censorship is definitely bad, and usually worth opposing. It is normally safe to assume that if something is being censored, it is for one of these two reasons. There are gray areas—slander (when the speaker knows he’s lying and has malicious intent), and bomb-making recipes (when they’re advertised as such and not general-purpose chemistry), for example—but the law has the exceptions mapped out pretty accurately. (Slander gets you sued, bomb-making recipes get you surveilled.) This makes a solid foundation for the principle that censorship should be opposed.
However, that principle and the analysis supporting it apply only to censorship that falls within these two domains. When things fall outside these categories, we usually don’t call them censorship; for example, there is a widespread conspiracy among email and web site administrators to suppress ads for Viagra, but we don’t call that censorship, even though it meets every aspect of the definition except motive. If you happen to find a weird instance of censorship which doesn’t fall into either category, then you have to start over and derive an answer to whether censorship in that particular case is good or bad, from scratch, without resorting to generalities about censorship-in-general. Some of the arguments may still apply—for example, building a censorship-technology infrastructure is bad even if it’s only meant to be used on spam—but not all of them, and not with the same force.
If the usual arguments against censorship don’t apply, and we’re trying to figure out whether to censor it, the next two things to test are whether it’s true, and whether an informed reader would want to see it. If both of these conditions hold, then it should not be censored. However, if either condition fails to hold, then it’s okay to censor.
Either the forbidden post is false, in which case it does not deserve protection because it’s false, or it’s true, in which case it should be censored because no informed person should want to see it. In either case, people spreading it are doing a bad thing.
Either the forbidden post is false, in which case it does not deserve protection because it’s false,
Even if this is right, the censorship extends to conversations, possibly true ones, about why the post is false. Moreover, I don’t see what truth has to do with it. There are plenty of false claims made on this site that nonetheless should be public, because understanding why they’re false and how someone might come to think they are true are worthwhile endeavors.
The question here is rather straightforward: does the harm of the censorship outweigh the harm of letting people talk about the post? I can understand how you might initially think those who disagree with you are just responding to knee-jerk anti-censorship instincts that aren’t necessarily valid here. But from where I stand, the arguments made by those who disagree with you do not fit this pattern. I think XiXi has been clear in the past about why the transparency concern does apply to SIAI. We’ve also seen arguments for why censorship in this particular case is a bad idea.
Either the forbidden post is false, in which case it does not deserve protection because it’s false, or it’s true, in which case it should be censored because no informed person should want to see it. In either case, people spreading it are doing a bad thing.
There are clearly more than two options here. There seem to be two points under contention:
It is/is not (1/2) reasonable to agree with the forbidden post.
It is/is not (3/4) desirable to know the contents of the forbidden post.
You seem to be restricting us to either 2+3 or 1+4. It seems that 1+3 is plausible (should we keep children from ever knowing about death because it’ll upset them?), and 2+4 seems like a good argument for restriction of knowledge (the idea is costly until you work through it, and the benefits gained from reaching the other side are lower than the costs).
But I personally suspect 2+3 is the best description, and that doesn’t explain why people trying to spread it are doing a bad thing. Should we delete posts on Pascal’s Wager because someone might believe it?
Either the forbidden post is false, in which case it does not deserve protection because it’s false, or it’s true, in which case it should be censored because no informed person should want to see it.
Excluded middle, of course: incorrect criterion. (Was this intended as a test?) It would not deserve protection if it were useless (like spam), not “if it were false.”
The reason I consider sufficient to keep it off LessWrong is that it actually hurt actual people. That’s pretty convincing to me. I wouldn’t expunge it from the Internet (though I might put a warning label on it), but from LW? Appropriate. Reposting it here? Rude.
Unfortunately, that’s also an argument as to why it needs serious thought applied to it, because if the results of decompartmentalised thinking can lead there, humans need to be able to handle them. As Vaniver pointed out, there are previous historical texts that have had similar effects. Rationalists need to be able to cope with such things, as they have learnt to cope with previous conceptual basilisks. So it’s legitimate LessWrong material at the same time as being inappropriate for here. Tricky one.
(To the ends of that “compartmentalisation” link, by the way, I’m interested in past examples of basilisks and other motifs of harmful sensation in idea form. Yes, I have the deleted Wikipedia article.)
Note that I personally found the idea itself silly at best.
The assertion that if a statement is not true, fails to alter political support, fails to provide instruction, and an informed reader wants to see that statement, it is therefore a bad thing to spread that statement and an OK thing to censor, is, um, far from uncontroversial.
To begin with, most fiction falls into this category. For that matter, so does most nonfiction, though at least in that case the authors generally don’t intend for it to be non-true.
The assertion that if a statement is not true, fails to alter political support, fails to provide instruction, and an informed reader wants to see that statement, it is therefore a bad thing to spread that statement and an OK thing to censor, is, um, far from uncontroversial.
No, you reversed a sign bit: it is okay to censor if an informed reader wouldn’t want to see it (and the rest of those conditions).
No, I don’t think so. You said “if either condition fails to hold, then it’s okay to censor.” If it isn’t true, and an informed reader wants to see it, then one of the two conditions failed to hold, and therefore it’s OK to censor.
Oops, you’re right—one more condition is required. The condition I gave is only sufficient to show that it fails to fall into a protected class, not that it falls in the class of things that should be censored; there are things which fall in neither class (which aren’t normally censored because that requires someone with a motive to censor it, which usually puts it into one of the protected classes). To make it worthy of censorship, there must additionally be a reason outside the list of excluded reasons to censor it.
I just have trouble understanding what you are saying. That might very well be my fault. I do not intend any hostile attack against you or the SIAI. I’m just curious, not worried at all. I do not demand anything. I’d like to learn more about you people, what you believe and how you arrived at your beliefs.
There is this particular case of the forbidden topic, and I am throwing everything I’ve got at it to see if the beliefs about it are consistent and hold water. That doesn’t mean that I am against censorship or that I believe it is wrong. I believe it is right but too unlikely (...). I believe that Yudkowsky and the SIAI are probably honest (although my gut feeling is to be very skeptical), but that there are good arguments for more transparency regarding the SIAI (if you believe it is as important as it is portrayed to be). I believe that Yudkowsky is wrong about his risk estimation regarding the idea.
I just don’t understand your criticism of my past comments, which included telling me something about how I use politics (I don’t get it) and that I should accept that censorship is sometimes necessary (which I haven’t argued against).
There is this particular case of the forbidden topic, and I am throwing everything I’ve got at it to see if the beliefs about it are consistent and hold water.
The problem with that is that Eliezer and those who agree with him, including me, cannot speak freely about our reasoning on the issue, because we don’t want to spread the idea, so we don’t want to describe it and point to details about it as we describe our reasoning. If you imagine yourself in our position, believing the idea is dangerous, you could tell that you wouldn’t want to spread the idea in the process of explaining its danger either.
Under more normal circumstances, where the ideas we disagree about are not thought by anyone to be dangerous, we can have effective discussion by laying out our true reasons for our beliefs, and considering counter arguments that refer to the details of our arguments. Being cut off from our normal effective methods of discussion is stressful, at least for me.
I have been trying to persuade people who don’t know the details of the idea or don’t agree that it is dangerous that we do in fact have good reasons for believing it to be dangerous, or at least that this is likely enough that they should let it go. This is a slow process, as I think of ways to express my thoughts without revealing details of the dangerous idea, or explaining them to people who know but don’t understand those details. And this ends up involving talking to people who, because they don’t think the idea is dangerous and don’t take it seriously, express themselves faster and less carefully, and who have conflicting goals like learning or spreading the idea, or opposing censorship in general, or having judged for themselves the merits of censorship (from others just like them) in this case. This is also stressful.
I engage in this stressful topic, because I think it is important, both that people do not get hurt from learning about this idea, and that SIAI/Eliezer do not get dragged through mud for doing the right thing.
Sorry, but I am not here to help you get the full understanding you need to judge whether the beliefs are consistent and hold water. As I have been saying, this is not a normal discussion. And seriously, you would be better off dropping it and finding something else to worry about. And if you think it is important, you can remember to track whether SIAI/Eliezer/supporters like me engage in a pattern of making excuses to ban certain topics to protect some hidden agenda. But then please remember all the critical discussions that don’t get banned.
I have been trying to persuade people who don’t know the details of the idea or don’t agree that it is dangerous that we do in fact have good reasons for believing it to be dangerous, or at least that this is likely enough that they should let it go. This is a slow process, as I think of ways to express my thoughts without revealing details of the dangerous idea, or explaining them to people who know but don’t understand those details.
Note that this shouldn’t be possible other than through arguments from authority.
(I’ve just now formed a better intuitive picture of the reasons for the danger of the idea, and I now see some of the previously made comments as unnecessarily revealing: the additional detail didn’t actually serve the purpose of convincing the people I communicated with, who lacked some of the prerequisites for using that detail to understand the argument for danger, but who would potentially gain (better) understanding of the idea itself. It does still sound silly to me, but maybe the lack of inferential stability of this conclusion should actually feel this way—I expect that the idea will stop being dangerous in the following decades due to better understanding of decision theory.)
Does this theory of yours require that Eliezer Yudkowsky plus several other old-time Less Wrongians are holding the Idiot Ball and being really stupid about something that you can just see as obvious?
Now might be a good time to notice that you are confused.
Something to keep in mind when you reply to comments here is that you are the default leader of this community and its highest status member. This means comments that would be reasonably glib or slightly snarky from other posters can come off as threatening and condescending when made by you. They’re not really threatening but they can instill in their targets strong fight-or-flight responses. Perhaps this is because in the ancestral environment status challenges from group leaders were far more threatening to our ancestor’s livelihood than challenges from other group members. When you’re kicking out trolls it’s a sight to see, but when you’re rhetorically challenging honest interlocutors it’s probably counter-productive. I had to step away from the computer because I could tell that even if I was wrong the feelings this comment provoked weren’t going to let me admit it (and you weren’t even actually mean, just snobby).
As to your question, I don’t think my understanding of the idea requires anyone to be an idiot. In fact, from what you’ve said, I doubt we’re that far apart on the matter of how threatening the idea is. There may be implications I haven’t thought through that you have, and there may be general responses to implications I’ve thought of that you haven’t. I often have trouble telling how much intelligence I needed to get somewhere, but I think I’ve applied a fair amount in this case. Where I think we probably diverge significantly is in our estimation of the cost of the censorship, which I think is more than high enough to outweigh the risk of making Roko’s idea public. It is at least plausible that you are underestimating this cost due to biases resulting from your social position in this group and your organizational affiliation.
I’ll note that, as wedrifid suggested, your position also seems to assume that quite a few Less Wrongians are being really stupid and can’t see the obvious. Perhaps those who have expressed disagreement with your decision aren’t quite as old-time as those who have agreed with it. And perhaps this is because we have not internalized important concepts or accessed important evidence required to see the danger in Roko’s idea. But it is also noteworthy that the people who have expressed disagreement have mostly been outside the Yudkowsky/SIAI cluster relative to those who have agreed with you. This suggests that they might be less susceptible to the biases that may be affecting your estimation of the cost of the censorship.
I am a bit confused, as I’m not totally sure the explanations I’ve thought of or seen posted for your actions sufficiently explain them, but that’s just the kind of uncertainty one always expects in disagreements. Are you not confused? If I didn’t think there was a downside to the censorship I would let it go. But I think the downside is huge; in particular, I think the censorship makes it much harder to get people beyond the SIAI circle to take Friendliness seriously as a scholarly field. I’m not sure you’re humble enough to care about that (that isn’t meant as a character attack, btw). It makes the field look like a joke and makes its leading scholar look ridiculous. I’m not sure you have the political talents to recognize that. It also slightly increases the chances of someone not recognizing this failure mode (the one in Roko’s post) when it counts. I think you might be so sure that (or so focused on the possibility that) you’re going to be the one flipping the switch in that situation that you aren’t worried enough about that.
It seems to me that the natural effect of a group leader persistently arguing from his own authority is Evaporative Cooling of Group Beliefs. This is of course conducive to confirmation bias and corresponding epistemological skewing for the leader; things which seem undesirable for somebody in Eliezer’s position. I really wish that Eliezer was receptive to taking this consideration seriously.
It seems to me that the natural effect of a group leader persistently arguing from his own authority is Evaporative Cooling of Group Beliefs. This is of course conducive to confirmation bias and corresponding epistemological skewing for the leader; things which seem undesirable for somebody in Eliezer’s position. I really wish that Eliezer was receptive to taking this consideration seriously.
The thing is he usually does. That is one thing that has in the past set Eliezer apart from Robin and impressed me about Eliezer. Now it is almost as though he has embraced the evaporative cooling concept as an opportunity instead of a risk and gone and bought himself a blowtorch to force the issue!
Maybe, given the credibility he has accumulated on all these other topics, you should be willing to trust him on the one issue on which he is asserting this authority and on which it is clear that if he is right, it would be bad to discuss his reasoning.
Maybe, given the credibility he has accumulated on all these other topics, you should be willing to trust him on the one issue on which he is asserting this authority and on which it is clear that if he is right, it would be bad to discuss his reasoning.
The well known (and empirically verified) weakness in experts of the human variety is that they tend to be systematically overconfident when it comes to judgements that fall outside their area of exceptional performance—particularly when the topic is one just outside the fringes.
When it comes to blogging about theoretical issues of rationality, Eliezer is undeniably brilliant. Yet his credibility specifically when it comes to responding to risks is rather less outstanding. In my observation he reacts emotionally and starts making rookie mistakes of rational thought and action. To the point where I’ve very nearly responded ‘Go read the sequences!’ before remembering that he was the flipping author and so should already know better.
Also important is the fact that elements of the decision are about people, not game theory. Eliezer hopefully doesn’t claim to be an expert when it comes to predicting or eliciting optimal reactions in others.
Was it not clear that I do not assign particular credence to Eliezer when it comes to judging risks? I thought I expressed that with considerable emphasis.
I’m aware that you disagree with my conclusions—and perhaps even my premises—but I can assure you that I’m speaking directly to the topic.
Maybe, given the credibility he has accumulated on all these other topics, you should be willing to trust him on the one issue on which he is asserting this authority and on which it is clear that if he is right, it would be bad to discuss his reasoning.
I do not consider this strong evidence as there are many highly intelligent and productive people who hold crazy beliefs:
Francisco J. Ayala, who “has been called the ‘Renaissance Man of Evolutionary Biology’”, is a geneticist ordained as a Dominican priest. His “discoveries have opened up new approaches to the prevention and treatment of diseases that affect hundreds of millions of individuals worldwide…”
Francis Collins (geneticist), noted for his landmark discoveries of disease genes and his leadership of the Human Genome Project (HGP), and described by the Endocrine Society as “one of the most accomplished scientists of our time”, is an evangelical Christian.
Peter Duesberg (a professor of molecular and cell biology at the University of California, Berkeley) claimed that AIDS is not caused by HIV, which made him so unpopular that his colleagues and others have — until recently — been ignoring his potentially breakthrough work on the causes of cancer.
Georges Lemaître (a Belgian Roman Catholic priest) proposed what became known as the Big Bang theory of the origin of the Universe.
Kurt Gödel (logician, mathematician and philosopher) who suffered from paranoia and believed in ghosts. “Gödel, by contrast, had a tendency toward paranoia. He believed in ghosts; he had a morbid dread of being poisoned by refrigerator gases; he refused to go out when certain distinguished mathematicians were in town, apparently out of concern that they might try to kill him.”
Mark Chu-Carroll (PhD Computer Scientist, works for Google as a Software Engineer) “If you’re religious like me, you might believe that there is some deity that created the Universe.” He is running one of my favorite blogs, Good Math, Bad Math, and writes a lot on debunking creationism and other crackpottery.
Nassim Taleb (author of the 2007 book The Black Swan, completed in 2010) believes: you can’t track reality with science and equations; religion is not about belief; we were wiser before the Enlightenment, because we knew how to take knowledge from incomplete information, and now we live in a world of epistemic arrogance; religious people have a way of dealing with ignorance, by saying “God knows”.
Kevin Kelly (editor) is a devout Christian who writes pro-science and pro-technology essays.
I could continue this list with people like Ted Kaczynski or Roger Penrose. I just wanted to show that intelligence and rational conduct do not rule out the possibility of being wrong about some belief.
Taleb quote doesn’t qualify. (I won’t comment on others.)
I should have made it clearer that it is not my intention to indicate that I believe that those people, or crazy ideas in general, are wrong. But there are a lot of smart people out there who’ll advocate opposing ideas. Using their reputation for being highly intelligent as a reason to follow them is, in my opinion, not a very good idea in itself. I could just believe Freeman Dyson that existing simulation models of climate contain too much error to reliably predict future trends. I could believe Peter Duesberg that HIV does not cause AIDS; after all, he is a brilliant molecular biologist. But I just do not think that any amount of reputation is enough evidence to believe extraordinary claims uttered by such people. And in the case of Yudkowsky, there doesn’t even exist much reputation, and no great achievements at all, that would justify some strong belief in his infallibility. What there is in Yudkowsky’s case seems to be strong emotional commitment. I just can’t tell if he is honest. If he really believes that he’s working on a policy for some future superhuman intelligence that will rule the universe, then I’m going to be very careful. Not because it is wrong, but because such beliefs imply huge payoffs. Not that I believe he is the disguised Dr. Evil, but can we be sure enough to just trust him with it? Censorship of certain ideas bears more evidence against him than in favor of his honesty.
How extensively have you searched for experts who made correct predictions outside their fields of expertise? What would you expect to see if you just searched for experts making predictions outside their field of expertise and then determined if that prediction were correct? What if you limited your search to experts who had expressed the attitude Eliezer expressed in Outside the Laboratory?
I just wanted to show that intelligence and rational conduct do not rule out the possibility of being wrong about some belief.
“Rule out”? Seriously? What kind of evidence is it?
You extracted the “rule out” phrase from the sentence:
I just wanted to show that intelligence and rational conduct do not rule out the possibility of being wrong about some belief.
From within the common phrase ‘do not rule out the possibility’ no less!
None of this affects my point that ruling out the possibility is the wrong, (in fact impossible), standard.
You then make a reference to ‘0 and 1 are not probabilities’ with exaggerated incredulity.
Not exaggerated. XiXiDu’s post did seem to be saying: here are these examples of experts being wrong so it is possible that an expert is wrong in this case, without saying anything useful about how probable it is for this particular expert to be wrong on this particular issue.
To put it mildly this struck me as logically rude and in general poor form.
You have made an argument accusing me of logical rudeness that, quite frankly, does not stand up to scrutiny.
Better evidence than I’ve ever seen in support of the censored idea. I have these well-founded principles, free speech and transparency, and weigh them against the evidence I have in favor of censoring the idea. That evidence is merely 1.) Yudkowsky’s past achievements, 2.) his output and 3.) intelligence. That intelligent people have been and are wrong about certain ideas while still being productive and right about many other ideas is evidence to weaken #3. That people lie and deceive to get what they want is evidence against #1 and #2 and in favor of transparency and free speech, which are both already more likely to have a positive impact than the forbidden topic is to have a negative impact.
And what are you trying to tell me with this link? I haven’t seen anyone stating numeric probability estimates regarding the forbidden topic. And I won’t state one either; I’ll just say that it is subjectively improbable enough to ignore, because there are possibly too many very-very-low-probability events to take into account (for every being that will harm me if I don’t do X there is another being that will harm me if I do X, and they cancel each other out). But if you’d like to pull some number out of thin air, go ahead. I won’t, because I don’t have enough data to even calculate the probability of AI going FOOM versus a slow development.
You have failed to address my criticisms of your points: that you are seeking out only examples that support your desired conclusion, and that you are ignoring details that would allow you to construct a narrower, more relevant reference class for your outside view argument.
And what are you trying to tell me with this link?
I was telling you that “ruling out the possibility” is the wrong (in fact impossible) standard.
You have failed to address my criticisms of your points: that you are seeking out only examples that support your desired conclusion.
Only now do I understand your criticism. I do not seek out examples to support my conclusion but to weaken your argument that one should trust Yudkowsky because of his previous output. I’m aware that Yudkowsky can very well be right about the idea, but I do in fact believe that the risk is worth taking. Have I done extensive research on how often people in similar situations have been wrong? Nope. No excuses here, but do you think there are comparable cases of predictions that proved to be reliable? And how much research have you done in this case and about the idea in general?
I was telling you that “ruling out the possibility” is the wrong (in fact impossible) standard.
I don’t, I actually stated a few times that I do not think that the idea is wrong.
Seeking out just examples that weaken my argument, when I never predicted that no such examples would exist, is the problem I am talking about.
My reason for trying to weaken your argument is not that I want to be right but that I want feedback about my doubts. I said that 1.) people can be wrong, regardless of their previous reputation, 2.) that people can lie about their objectives and deceive by how they act in public (especially when the stakes are high), 3.) that Yudkowsky’s previous output and achievements are not remarkable enough to trust him about some extraordinary claim. You haven’t responded on why you tell people to believe Yudkowsky, in this case, regardless of my objections.
What made you think that supporting your conclusion and weakening my argument are different things?
I’m sorry if I made it appear as if I hold some particular belief. My epistemic state simply doesn’t allow me to arrive at your conclusion. To highlight this, I argued in favor of what it would mean not to accept your argument, namely to stand by well-established principles like free speech and transparency. Yes, you could say that there is no difference here, except that I do not care about who is right but about what is the right thing to do.
people can be wrong, regardless of their previous reputation
Still, it’s incorrect to argue from the existence of examples. You have to argue from likelihood. You’d expect more correctness from a person with a reputation for being right than from a person with a reputation for being wrong.
People can also go crazy, regardless of their previous reputation, but it’s improbable, and not an adequate argument for their craziness.
And you need to know what fact you are trying to convince people of, not just search for soldier-arguments pointing in the preferred direction. If you believe that the fact is that a person is crazy, you too have to recognize that “people can be crazy” is an inadequate argument for this fact you wish to communicate, and that you can’t offer this argument in good faith.
(Craziness is introduced as a less-likely condition than wrongness to stress the structure of my argument, not to suggest that wrongness is as unlikely.)
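To make the likelihood point concrete, here is a toy Bayesian sketch; the scenario and all the numbers are made up for illustration, not anything stated in this thread:

```python
# Toy Bayes sketch (made-up numbers): "some experts have been wrong before"
# is observed whether or not this particular expert is wrong here, so its
# likelihood ratio is ~1 and it barely moves the posterior.

prior_wrong = 0.2            # assumed prior that this expert is wrong on this issue
p_obs_if_wrong = 1.0         # P("examples of wrong experts exist" | wrong)
p_obs_if_right = 1.0         # P("examples of wrong experts exist" | right)

posterior_wrong = (p_obs_if_wrong * prior_wrong) / (
    p_obs_if_wrong * prior_wrong + p_obs_if_right * (1 - prior_wrong)
)
print(posterior_wrong)       # 0.2: unchanged, because the evidence doesn't discriminate

# Evidence that would matter is evidence that is more probable if this expert
# is wrong than if he is right, e.g. a track record of failed predictions in
# this specific domain.
```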
I said that 1.) people can be wrong, regardless of their previous reputation, 2.) that people can lie about their objectives and deceive by how they act in public (especially when the stakes are high), 3.) that Yudkowsky’s previous output and achievements are not remarkable enough to trust him about some extraordinary claim.
I notice that Yudkowsky wasn’t always self-professedly human-friendly. Consider this:
I must warn my reader that my first allegiance is to the Singularity, not humanity. I don’t know what the Singularity will do with us. I don’t know whether Singularities upgrade mortal races, or disassemble us for spare atoms. While possible, I will balance the interests of mortality and Singularity. But if it comes down to Us or Them, I’m with Them. You have been warned.
He’s changed his mind since. That makes it far, far less scary.
He has changed his mind about one technical point in meta-ethics. He now realizes that super-human intelligence does not automatically lead to super-human morality. He is now (IMHO) less wrong. But he retains a host of other (mis)conceptions about meta-ethics which make his intentions abhorrent to people with different (mis)conceptions. And he retains the arrogance that would make him dangerous to those he disagrees with, if he were powerful.
“… far, far less scary”? You are engaging in wishful thinking no less foolish than that for which Eliezer has now repented.
He is now (IMHO) less wrong. But he retains a host of other (mis)conceptions about meta-ethics which make his intentions abhorrent to people with different (mis)conceptions.
I’m not at all sure that I agree with Eliezer about most meta-ethics, and definitely disagree on some fairly important issues. But, that doesn’t make his views necessarily abhorrent. If Eliezer triggers a positive Singularity (positive in the sense that it reflects what he wants out of a Singularity, complete with CEV), I suspect that that will be a universe which I won’t mind living in. People can disagree about very basic issues and still not hate each others’ intentions. They can even disagree about long-term goals and not hate it if the other person’s goals are implemented.
If Eliezer triggers a positive Singularity (positive in the sense that it reflects what he wants out of a Singularity, complete with CEV), I suspect that that will be a universe which I won’t mind living in.
Have you ever had one of those arguments with your SO in which:
It is conceded that your intentions were good.
It is conceded that the results seem good.
The SO is still pissed because of the lack of consultation and/or presence of extrapolation?
I usually escape those confrontations by promising to consult and/or not extrapolate the next time. In your scenario, Eliezer won’t have that option.
When people point out that Eliezer’s math is broken because his undiscounted future utilities leads to unbounded utility, his response is something like “Find better math—discounted utility is morally wrong”.
When Eliezer suggests that there is no path to a positive singularity which allows for prior consultation with the bulk of mankind, my response is something like “Look harder. Find a path that allows people to feel that they have given their informed consent to both the project and the timetable—anything else is morally wrong.”
ETA: In fact, I would like to see it as a constraint on the meaning of the word “Friendly” that it must not only provide friendly consequences, but also, it must be brought into existence in a friendly way. I suspect that this is one of those problems in which the added constraint actually makes the solution easier to find.
Could you link to where Eliezer says that future utilities should not be discounted? I find that surprising, since uncertainty causes an effect roughly equivalent to discounting.
I would also like to point out that achieving public consensus about whether to launch an AI would take months or years, and that during that time, not only is there a high risk of unfriendly AIs, it is also guaranteed that millions of people will die. Making people feel like they were involved in the decision is emphatically not worth the cost.
Could you link to where Eliezer says that future utilities should not be discounted?
He makes the case in this posting. It is a pretty good posting, by the way, in which he also points out some kinds of discounting which he believes are justified. This posting does not purport to be a knock-down argument against discounting future utility—it merely states Eliezer’s reasons for remaining unconvinced that you should discount (and hence for remaining in disagreement with most economic thinkers).
ETA: One economic thinker who disagrees with Eliezer is Robin Hanson. His response to Eliezer’s posting is also well worth reading.
Examples of Eliezer conducting utilitarian reasoning about the future without discounting are legion.
I find that surprising, since uncertainty causes an effect roughly equivalent to discounting.
Tim Tyler makes the same assertion about the effects of uncertainty. He backs the assertion with metaphor, but I have yet to see a worked example of the math. Can you provide one?
Of course, one obvious related phenomenon—it is even mentioned with respect in Eliezer’s posting—is that the value of a promise must be discounted with time due to the increasing risk of non-performance: my promise to scratch your back tomorrow is more valuable to you than my promise to scratch next week—simply because there is a risk that you or I will die in the interim, rendering the promise worthless. But I don’t see how other forms of increased uncertainty about the future should have the same (exponential decay) response curve.
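For what it’s worth, the one case I do grant can be made into a worked example: a constant per-period risk of non-performance gives exactly an exponential discount curve, while a one-off risk does not. A minimal sketch with made-up numbers:

```python
# Minimal sketch (made-up numbers): expected value of a promise of u utils
# delivered t periods from now, when each period carries an independent
# probability h that the promise becomes worthless (death, default, etc.).

def expected_value(u, t, h):
    return u * (1 - h) ** t       # survival probability compounds per period

for t in range(6):
    print(t, round(expected_value(10, t, 0.05), 3))
# 10.0, 9.5, 9.025, ...  i.e. exactly exponential discounting with factor 0.95.

# A one-off risk, by contrast, does not give an exponential curve:
def one_off_risk(u, t):
    return u * (0.8 if t >= 1 else 1.0)   # single 20% chance of failure at t=1

print([one_off_risk(10, t) for t in range(6)])
# [10.0, 8.0, 8.0, 8.0, 8.0, 8.0] - the value flattens instead of decaying.
```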
achieving public consensus about whether to launch an AI would take months or years,
I find that surprising, since uncertainty causes an effect roughly equivalent to discounting.
Tim Tyler makes the same assertion about the effects of uncertainty. He backs the assertion with metaphor, but I have yet to see a worked example of the math. Can you provide one?
Most tree-pruning heuristics naturally cause an effect like temporal discounting. Resource limits mean that you can’t calculate the whole future tree—so you have to prune. Pruning normally means applying some kind of evaluation function early—to decide which branches to prune. The more you evaluate early, the more you are effectively valuing the near-present.
That is not maths—but hopefully it has a bit more detail than previously.
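Here is a toy version of the pruning point, a sketch of my own construction rather than real maths: a planner that evaluates only the first few steps of each branch behaves as if rewards beyond its horizon were discounted to zero.

```python
# Toy illustration (my own construction): depth-limited evaluation of two
# action sequences.  Rewards beyond the search horizon are never evaluated,
# so the resource-limited planner acts as if it discounted them to zero.

def truncated_value(rewards, horizon):
    return sum(rewards[:horizon])     # evaluate early, i.e. prune the rest

grab_now  = [5, 0, 0, 0, 0, 0]        # small immediate payoff
sacrifice = [0, 0, 0, 0, 0, 20]       # larger payoff after a long setup

for horizon in (2, 6):
    better = max((grab_now, sacrifice), key=lambda r: truncated_value(r, horizon))
    print(horizon, "grab_now" if better is grab_now else "sacrifice")
# horizon 2 -> grab_now  (the sacrifice line is cut off before its payoff)
# horizon 6 -> sacrifice (with more resources the delayed payoff is seen)
```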
It doesn’t really address the question. In the A* algorithm the heuristic estimates of the objective function are supposed to be upper bounds on utility, not lower bounds. Furthermore, they are supposed to actually estimate the result of the complete computation—not to represent a partial computation exactly.
Furthermore, they are supposed to actually estimate the result of the complete computation—not to represent a partial computation exactly.
Reality check: a tree of possible futures is pruned at points before the future is completely calculated. Of course it would be nice to apply an evaluation function which represents the results of considering all possible future branches from that point on. However, getting one of those that produces results in a reasonable time would be a major miracle.
If you look at things like chess algorithms, they do some things to get a more accurate utility valuation when pruning—such as checking for quiescence. However, they basically just employ a standard evaluation at that point—or sometimes a faster, cheaper approximation. If it is sufficiently bad, the tree gets pruned.
However, getting one of those would be a major miracle.
We are living in the same reality. But the heuristic evaluation function still needs to be an estimate of the complete computation, rather than being something else entirely. If you want to estimate your own accumulation of pleasure over a lifetime, you cannot get an estimate of that by simply calculating the accumulation of pleasure over a shorter period—otherwise no one would undertake the pain of schooling motivated by the anticipated pleasure of high future income.
The question which divides us is whether an extra 10 utils now is better or worse than an additional 11 utils 20 years from now. You claim that it is worse. Period. I claim that it may well be better, depending on the discount rate.
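For concreteness, here is the arithmetic behind that disagreement, with illustrative discount rates of my own choosing:

```python
# Worked arithmetic (illustrative discount rates, not anyone's actual values):
# 10 utils now versus 11 utils 20 years from now, under annual discount rate r.

def present_value(utils, years, r):
    return utils / (1 + r) ** years

for r in (0.0, 0.005, 0.02):
    pv = present_value(11, 20, r)
    verdict = "better" if pv > 10 else "worse"
    print(f"r={r:.3f}: 11 utils in 20 years ~ {pv:.2f} utils now ({verdict} than 10 now)")
# r=0.000 -> 11.00 (better); r=0.005 -> ~9.96 (worse); r=0.020 -> ~7.40 (worse)
```

Even a small annual rate is enough for the ranking to flip; with r = 0 it never does.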
I’m not sure I understand the question. What does it mean for a util to be ‘timeless’?
ETA: The question of the interaction of utility and time is a confusing one. In “Against Discount Rates”, Eliezer writes:
The idea that it is literally, fundamentally 5% more important that a poverty-stricken family have clean water in 2008, than that a similar family have clean water in 2009, seems like pure discrimination to me—just as much as if you were to discriminate between blacks and whites.
I think that Eliezer has expressed the issue in almost, but not quite, the right way. The right question is whether a decision maker in 2007 should be 5% more interested in doing something about the 2008 issue than about the 2009 issue. I believe that she should be. If only because she expects that she will have an entire year in the future to worry about the 2009 family without the need to even consider 2008 again. 2008’s water will already be under the bridge.
I’m sure someone else can explain this better than me, but: As I understand it, a util understood timelessly (rather than like money, which there are valid reasons to discount because it can be invested, lost, revalued, etc. over time) builds into how it’s counted all preferences, including preferences that interact with time. If you get 10 utils, you get 10 utils, full stop. These aren’t delivered to your door in a plain brown wrapper such that you can put them in an interest-bearing account. They’re improvements in the four-dimensional state of the entire universe over all time, that you value at 10 utils. If you get 11 utils, you get 11 utils, and it doesn’t really matter when you get them. Sure, if you get them 20 years from now, then they don’t cover specific events over the next 20 years that could stand improvement. But it’s still worth eleven utils, not ten. If you value things that happen in the next 20 years more highly than things that happen later, then utils according to your utility function will reflect that, that’s all.
That (timeless utils) is a perfectly sensible convention about what utility ought to mean. But, having adopted that convention, we are left with (at least) two questions:
Do I (in 2011) derive a few percent more utility from an African family having clean water in 2012 than I do from an equivalent family having clean water in 2013?
If I do derive more utility from the first alternative, am I making a moral error in having a utility function that acts that way?
I would answer yes to the first question. As I understand it, Eliezer would answer yes to the second question and would answer no to the first, were he in my shoes. I would claim that Eliezer is making a moral error in both judgments.
Do I (in 2011) derive a few percent more utility from an African family having clean water in 2012 than I do from an equivalent family having clean water in 2013?
Do you (in the years 2011, 2012, 2013, 2014) derive different relative utilities for these conditions? If so, it seems you have a problem.
I’m sorry. I don’t know what is meant by utility derived in 2014 from an event in 2012. I understand that the whole point of my assigning utilities in 2014 is to guide myself in making decisions in 2014. But no decision I make in 2014 can have an effect on events in 2012. So, from a decision-theoretic viewpoint, it doesn’t matter how I evaluate the utilities of past events. They are additive constants (same in all decision branches) in any computation of utility, and hence are irrelevant.
Or did you mean to ask about different relative utilities in the years before 2012? Yes, I understand that if I don’t use exponential discounting, then I risk inconsistencies.
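For reference, the standard illustration of that inconsistency risk, as a generic textbook-style example rather than anything specific to this exchange: under a non-exponential curve such as hyperbolic discounting, the ranking of two fixed rewards can flip merely because time has passed.

```python
# Generic example of dynamic inconsistency under hyperbolic discounting,
# where value = u / (1 + k * delay).

def hyperbolic(u, delay, k=1.0):
    return u / (1 + k * delay)

# Viewed from t=0: 10 utils at t=30 vs 11 utils at t=31.
print(hyperbolic(10, 30), hyperbolic(11, 31))   # ~0.32 vs ~0.34 -> prefer the later 11
# Viewed from t=30: the same rewards are now 0 and 1 step away.
print(hyperbolic(10, 0), hyperbolic(11, 1))     # 10.0 vs 5.5  -> prefer the sooner 10

# The preference reverses as the dates approach.  With exponential discounting
# (value = u * d**delay) the ratio of the two values does not depend on when
# you evaluate them, so no reversal occurs.
```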
The right question is whether a decision maker in 2007 should be 5% more interested in doing something about the 2008 issue than about the 2009 issue.
And that is a fact about 2007 decision maker, not 2008 family’s value as compared to 2009 family.
If, in 2007, you present me with a choice of clean water for a family for all of and only 2008 vs 2009, and you further assure me that these families will otherwise survive in hardship, and that their suffering in one year won’t materially affect their next year, and that I won’t have this opportunity again come this time next year, and that flow-on or snowball effects which benefit from an early start are not a factor here—then I would be indifferent to the choice.
If I would not be; if there is something intrinsic about earlier times that makes them more valuable, and not just a heuristic of preferring them for snowballing or flow-on reasons, then that is what Eliezer is saying seems wrong.
The right question is whether a decision maker in 2007 should be 5% more interested in doing something about the 2008 issue than about the 2009 issue. I believe that she should be. If only because she expects that she will have an entire year in the future to worry about the 2009 family without the need to even consider 2008 again. 2008’s water will already be under the bridge.
I would classify that as instrumental discounting. I don’t think anyone would argue with that—except maybe a superintelligence who has already exhausted the whole game tree—and for whom an extra year buys nothing.
FWIW, I genuinely don’t understand your perspective. The extent to which you discount the future depends on your chances of enjoying it—but also on factors like your ability to predict it—and your ability to influence it—the latter are functions of your abilities, of what you are trying to predict and of the current circumstances.
You really, really do not normally want to put those sorts of things into an agent’s utility function. You really, really do want to calculate them dynamically, depending on the agent’s current circumstances, prediction ability levels, actuator power levels, previous experience, etc.
Attempts to put that sort of thing into the utility function would normally tend to produce an inflexible agent, who has more difficulties in adapting and improving. Trying to incorporate all the dynamic learning needed to deal with the issue into the utility function might be possible in principle—but that represents a really bad idea.
Hopefully you can see my reasoning on this issue. I can’t see your reasoning, though. I can barely even imagine what it might possibly be.
Maybe you are thinking that all events have roughly the same level of unpredictability in the future, and there is roughly the same level of difficulty in influencing them, so the whole issue can be dealt with by one (or a small number of) temporal discounting “fudge factors”—and that evolution built us that way because it was too stupid to do any better.
You apparently denied that resource limitation results in temporal discounting. Maybe that is the problem (if so, see my other reply here). However, now you seem to have acknowledged that an extra year of time in which to worry helps with developing plans. What I can see doesn’t seem to make very much sense.
You really, really do not normally want to put those sorts of things into an agent’s utility function.
I really, really am not advocating that we put instrumental considerations into our utility functions. The reason you think I am advocating this is that you have this fixed idea that the only justification for discounting is instrumental. So every time I offer a heuristic analogy explaining the motivation for fundamental discounting, you interpret it as a flawed argument for using discounting as a heuristic for instrumental reasons.
Since it appears that this will go on forever, and I don’t discount the future enough to make the sum of this projected infinite stream of disutility seem small, I really ought to give up. But somehow, my residual uncertainty about the future makes me think that you may eventually take Cromwell’s advice.
You really, really do not normally want to put those sorts of things into an agent’s utility function.
I really, really am not advocating that we put instrumental considerations into our utility functions. The reason you think I am advocating this is that you have this fixed idea that the only justification for discounting is instrumental.
To clarify: I do not think the only justification for discounting is instrumental. My position is more like: agents can have whatever utility functions they like (including ones with temporal discounting) without having to justify them to anyone.
However, I do think there are some problems associated with temporal discounting. Temporal discounting sacrifices the future for the sake of the present. Sometimes the future can look after itself—but sacrificing the future is also something which can be taken too far.
Axelrod suggested that when the shadow of the future grows too short, more defections happen. If people don’t sufficiently value the future, reciprocal altruism breaks down. Things get especially bad when politicians fail to value the future. We should strive to arrange things so that the future doesn’t get discounted too much.
Instrumental temporal discounting doesn’t belong in ultimate utility functions. So, we should figure out what temporal discounting is instrumental and exclude it.
If we are building a potentially-immortal machine intelligence with a low chance of dying and which doesn’t age, those are more causes of temporal discounting which could be discarded as well.
What does that leave? Not very much, IMO. The machine will still, for a while, have some finite chance of being hit by a large celestial body. It might die—but its chances of dying vary over time; its degree of temporal discounting should vary in response—once again, you don’t wire this in, you let the agent figure it out dynamically.
But the heuristic evaluation function still needs to be an estimate of the complete computation, rather than being something else entirely. If you want to estimate your own accumulation of pleasure over a lifetime, you cannot get an estimate of that by simply calculating the accumulation of pleasure over a shorter period—otherwise no one would undertake the pain of schooling motivated by the anticipated pleasure of high future income.
The point is that resource limitation makes these estimates bad estimates—and you can’t do better by replacing them with better estimates because of … resource limitation!
To see how resource limitation leads to temporal discounting, consider computer chess. Powerful computers play reasonable games—but heavily resource limited ones fall for sacrifice plays, and fail to make successful sacrifice gambits. They often behave as though they are valuing short-term gain over long term results.
A peek under the hood quickly reveals why. They only bother looking at a tiny section of the game tree near to the current position! More powerful programs can afford to exhaustively search that space—and then move on to positions further out. Also the limited programs employ “cheap” evaluation functions that fail to fully compensate for their short-term foresight—since they must be able to be executed rapidly. The result is short-sighted chess programs.
That resource limitation leads to temporal discounting is a fairly simple and general principle which applies to all kinds of agents.
To see how resource limitation leads to temporal discounting, consider computer chess.
Why do you keep trying to argue against discounting using an example where discounting is inappropriate by definition? The objective in chess is to win. It doesn’t matter whether you win in 5 moves or 50 moves. There is no discounting. Looking at this example tells us nothing about whether we should discount future increments of utility in creating a utility function.
Instead, you need to look at questions like this: An agent plays go in a coffee shop. He has the choice of playing slowly, in which case the games each take an hour and he wins 70% of them. Or, he can play quickly, in which case the games each take 20 minutes, but he only wins 60% of them. As soon as one game finishes, another begins. The agent plans to keep playing go forever. He gains 1 util each time he wins and loses 1 util each time he loses.
The main decision he faces is whether he maximizes utility by playing slowly or quickly. Of course, he has infinite expected utility however he plays. You can redefine the objective to be maximizing utility flow per hour and still get a ‘rational’ solution. But this trick isn’t enough for the following extended problem:
The local professional offers go lessons. Lessons require a week of time away from the coffee-shop and a 50 util payment. But each week of lessons turns 1% of your losses into victories. Now the question is: Is it worth it to take lessons? How many weeks of lessons are optimal? The difficulty here is that we need to compare the values of a one-shot (50 utils plus a week not playing go) with the value of an eternal continuous flow (the extra fraction of games per hour which are victories rather than losses). But that is an infinite utility payoff from the lessons, and only a finite cost, right? Obviously, the right decision is to take a week of lessons. And then another week after that. And so on. Forever.
Discounting of future utility flows is the standard and obvious way of avoiding this kind of problem and paradox. But now let us see whether we can alter this example to capture your ‘instrumental discounting due to an uncertain future’:
First, the obvious one. Our hero expects to die someday, but doesn’t know when. He estimates a 5% chance of death every year. If he is lucky, he could live for another century. Or he could keel over tomorrow. And when he dies, the flow of utility from playing go ceases. It is very well known that this kind of uncertainty about the future is mathematically equivalent to discounted utility in a certain future. But you seemed to be suggesting something more like the following:
Our hero is no longer certain what his winning percentage will be in the future. He knows that he experiences microstrokes roughly every 6 months, and that each incident takes 5% of his wins and changes them to losses. On the other hand, he also knows that roughly every year he experiences a conceptual breakthrough. And that each such breakthrough takes 10% of his losses and turns them into victories.
Does this kind of uncertainty about the future justify discounting on ‘instrumental grounds’? My intuition says ’No, not in this case, but there are similar cases in which discounting would work.” I haven’t actually done the math, though, so I remain open to instruction.
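Since I haven’t done the math properly, here are two rough numerical sketches; the parameters are made up wherever the setup above leaves them unspecified, so treat them as illustrations only. The first shows how an hourly discount factor turns the lessons paradox into an ordinary finite optimization:

```python
# Rough sketch of the go example with per-hour discounting.  The discount
# factor and the simplification that all lessons are taken up front, before
# any play resumes, are my own assumptions.

HOURS_PER_WEEK = 24 * 7
d = 0.9999                      # assumed per-hour discount factor

def pv_of_flow(utils_per_hour, delay_hours=0):
    # Present value of a constant flow starting after delay_hours
    # (geometric sum): approximately flow * d**delay_hours / (1 - d).
    return utils_per_hour * d ** delay_hours / (1 - d)

def flow_after_lessons(weeks):
    # Fast play: 3 games/hour at +1/-1 per game.  Each week of lessons
    # converts 1% of the remaining losses into wins.
    loss_rate = 0.4 * (0.99 ** weeks)
    return 3 * ((1 - loss_rate) - loss_rate)

def net_value(weeks):
    delay = weeks * HOURS_PER_WEEK          # weeks spent not playing
    return pv_of_flow(flow_after_lessons(weeks), delay) - 50 * weeks

best = max(range(200), key=net_value)
print(best, round(net_value(best)))
# With any d < 1 every quantity here is finite, so there is a finite optimal
# number of weeks of lessons; as d -> 1 the sums blow up and the "take
# lessons forever" conclusion comes back.
```

The second is a quick check of the microstroke/breakthrough scenario under one reading of it (again, the modelling choices are mine):

```python
# Win rate starts at 0.7; every 6 months 5% of wins become losses; every
# 12 months 10% of losses become wins.  (Ordering within the year is my
# arbitrary choice.)

win_rate = 0.7
trajectory = [win_rate]
for half_year in range(1, 61):            # 30 years
    win_rate -= 0.05 * win_rate           # microstroke each half-year
    if half_year % 2 == 0:
        win_rate += 0.10 * (1 - win_rate) # breakthrough each full year
    trajectory.append(win_rate)

print([round(w, 3) for w in trajectory[::10]])
# The win rate settles near ~0.53 rather than decaying toward zero, so the
# expected utility flow stays roughly constant.  Unlike a constant risk of
# death, this kind of uncertainty does not by itself reproduce exponential
# discounting, which matches the intuition above.
```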
Why do you keep trying to argue against discounting using an example where discounting is inappropriate by definition? The objective in chess is to win. It doesn’t matter whether you win in 5 moves or 50 moves. There is no discounting. Looking at this example tells us nothing about whether we should discount future increments of utility in creating a utility function.
Temporal discounting is about valuing something happening today more than the same thing happening tomorrow.
Chess computers do, in fact, discount. That is why they prefer to mate you in twenty moves rather than a hundred.
The values of a chess computer do not just tell it to win. In fact, they are complex—e.g. Deep Blue had an evaluation function that was split into 8,000 parts.
Operation consists of maximising the utility function, after foresight and tree pruning. Events in branches that tree pruning has truncated typically don’t get valued at all—since they are not foreseen. Resource-limited chess computers can find themselves preferring to promote a pawn sooner rather than later. They do so since they fail to see the benefit of sequences leading to promotion later.
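A minimal sketch of the mechanism being described: depth-limited search with a cheap evaluation at the truncation frontier. The callback names here are placeholders rather than any engine’s actual interface. (Real engines also typically score a forced mate as a large constant minus the ply count, which is one concrete way the preference for the shorter mate arises.)

```python
# Minimal sketch (not any engine's actual code) of depth-limited minimax with
# a cheap heuristic evaluation at the truncation frontier. Positions beyond
# `depth` plies never contribute to the value at all, so rewards past the
# horizon are ignored rather than explicitly discounted.
# `evaluate`, `legal_moves` and `apply_move` are assumed, hypothetical callbacks.

def minimax(position, depth, maximizing, evaluate, legal_moves, apply_move):
    moves = legal_moves(position)
    if depth == 0 or not moves:
        return evaluate(position)  # the heuristic stands in for the unseen future
    child_values = [
        minimax(apply_move(position, m), depth - 1, not maximizing,
                evaluate, legal_moves, apply_move)
        for m in moves
    ]
    return max(child_values) if maximizing else min(child_values)
```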
So: we apparently agree that resource limitation leads to indifference towards the future (due to not bothering to predict it) - but I classify this as a kind of temporal discounting (since rewards in the future get ignored), whereas you apparently don’t.
Hmm. It seems as though this has turned out to be a rather esoteric technical question about exactly which set of phenomena the term “temporal discounting” can be used to refer to.
Earlier we were talking about whether agents focussed their attention on tomorrow—rather than next year. Putting aside the issue of whether that is classified as being “temporal discounting”—or not—I think the extent to which agents focus on the near-future is partly a consequence of resource limitation. Give the agents greater abilities and more resources and they become more future-oriented.
we apparently agree that resource limitation leads to indifference towards the future (due to not bothering to predict it)
No, I have not agreed to that. I disagree with almost every part of it.
In particular, I think that the question of whether (and how much) one cares about the future is completely prior to questions about deciding how to act so as to maximize the things one cares about. In fact, I thought you were emphatically making exactly this point on another branch.
But that is fundamental ‘indifference’ (which I thought we had agreed cannot flow from instrumental considerations). I suppose you must be talking about some kind of instrumental or ‘derived’ indifference. But I still disagree. One does not derive indifference from not bothering to predict—one instead derives not bothering to predict from being indifferent.
Furthermore, I don’t respond to expected computronium shortages by truncating my computations. Instead, I switch to an algorithm which produces less accurate computations at lower computronium costs.
but I classify this as a kind of temporal discounting (since rewards in the future get ignored), whereas you apparently don’t.
And finally, regarding classification, you seem to suggest that you view truncation of the future as just one form of discounting, whereas I choose not to. And that this makes our disagreement a quibble over semantics. To which I can only reply: Please go away, Tim.
Furthermore, I don’t respond to expected computronium shortages by truncating my computations. Instead, I switch to an algorithm which produces less accurate computations at lower computronium costs.
I think you would reduce how far you look forward if you were interested in using your resources intelligently and efficiently.
If you only have a million cycles per second, you can’t realistically go 150 ply deep into your go game—no matter how much you care about the results after 150 moves. You compromise—limiting both depth and breadth. The reduction in depth inevitably means that you don’t look so far into the future.
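The back-of-envelope numbers behind that compromise, using a commonly quoted rough branching factor for go and treating the cycle budget as a position-evaluation budget (both are simplifying assumptions):

```python
# Why depth must be limited: exhaustive search to depth d visits on the
# order of b**d positions.
b = 250          # rough branching factor assumed for go
budget = 10**6   # assumed position evaluations per second
for depth in range(1, 7):
    nodes = b ** depth
    print(depth, nodes, nodes / budget, "seconds")
# Depth 4 is already ~3.9e9 nodes, about an hour at this budget;
# exhaustive search to 150 ply is out of the question, whatever you care about.
```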
A lot of our communication difficulty arises from using different models to guide our intuitions. You keep imagining game-tree evaluation in a game with perfect information (like chess or go). Yes, I understand your point that in this kind of problem, resource shortages are the only cause of uncertainty—that given infinite resources, there is no uncertainty.
I keep imagining problems in which probability is built in, like the coffee-shop-go-player which I sketched recently. In the basic problem, there is no difficulty in computing expected utilities deeper into the future—you solve analytically and then plug in whatever value for t that you want. Even in the more difficult case (with the microstrokes) you can probably come up with an analytic solution. My models just don’t have the property that uncertainty about the future arises from difficulty of computation.
Right. The real world surely contains problems of both sorts. If you have a problem which is dominated by chaos based on quantum events then more resources won’t help. Whereas with many other types of problems more resources do help.
I recognise the existence of problems where more resources don’t help—I figure you probably recognise that there are problems where more resources do help—e.g. the ones we want intelligent machines to help us with.
The real world surely contains problems of both sorts.
Perhaps the real world does. But decision theory doesn’t. The conventional assumption is that a rational agent is logically omniscient. And generalizing decision theory by relaxing that assumption looks like it will be a very difficult problem.
The most charitable interpretation I can make of your argument here is that human agents, being resource limited, imagine that they discount the future. That discounting is a heuristic introduced by evolution to compensate for those resource limitations. I also charitably assume that you are under the misapprehension that if I only understood the argument, I would agree with it. Because if you really realized that I have already heard you, you would stop repeating yourself.
That you will begin listening to my claim that not all discounting is instrumental is more than I can hope for, since you seem to think that my claim is refuted each time you provide an example of what you imagine to be a kind of discounting that can be interpreted as instrumental.
That you will begin listening to my claim that not all discounting is instrumental is more than I can hope for, since you seem to think that my claim is refuted each time you provide an example of what you imagine to be a kind of discounting that can be interpreted as instrumental.
I am pretty sure that I just told you that I do not think that all discounting is instrumental. Here’s what I said:
I really, really am not advocating that we put instrumental considerations into our utility functions. The reason you think I am advocating this is that you have this fixed idea that the only justification for discounting is instrumental.
To clarify: I do not think the only justification for discounting is instrumental. My position is more like: agents can have whatever utility functions they like (including ones with temporal discounting) without having to justify them to anyone.
Agents can have many kinds of utility function! That is partly a consequence of there being so many different ways for agents to go wrong.
Being rational isn’t about your values, you can rationally pursue practically any goal. Epistemic rationality is a bit different—but I mostly ignore that as being unbiological.
Being moral isn’t really much of a constraint at all. Morality—and right and wrong—are normally defined with respect to a moral system—and unless a moral system is clearly specified, you can often argue all day about what is moral and what isn’t. Maybe some types of morality are more common than others—due to being favoured by the universe, or something like that—but any such context would need to be made plain in the discussion.
So, it seems (relatively) easy to make a temporal discounting agent that really values the present over the future—just stick a term for that in its ultimate values.
Are there any animals with ultimate temporal discounting? That is tricky, but it isn’t difficult to imagine natural selection hacking together animals that way. So: probably, yes.
Do I use ultimate temporal discounting? Not noticeably—as far as I can tell. I care about the present more than the future, but my temporal discounting all looks instrumental to me. I don’t go in much for thinking about saving distant galaxies, though! I hope that further clarifies.
I should probably review the discussion around about now. Instead of that: IIRC, you want to wire temporal discounting into machines, so their preferences better match your own—whereas I tend to think that would be giving them your own nasty hangover.
The real world surely contains problems of both sorts.
Perhaps the real world does. But decision theory doesn’t. The conventional assumption is that a rational agent is logically omniscient. And generalizing decision theory by relaxing that assumption looks like it will be a very difficult problem.
Programs make good models. If you can program it, you have a model of it. We can actually program agents that make resource-limited decisions. Having an actual program that makes decisions is a pretty good way of modeling making resource-limited decisions.
Perhaps we have some kind of underlying disagreement about what it means for temporal discounting to be “instrumental”.
In your example of an agent facing a risk of death, my thinking is: this player might opt for a safer life—with reduced risk. Or they might choose to lead a more interesting but more risky life. Their degree of discounting may well adjust itself accordingly—and if so, I would take that as evidence that their discounting was not really part of their pure preferences, but rather was an instrumental and dynamic response to the observed risk of dying.
If—on the other hand—they adjusted the risk level of their lifestyle, and their level of temporal discounting remained unchanged, that would be confirming evidence in favour of the hypothesis that their temporal discounting was an innate part of their ultimate preferences—and not instrumental.
Of course. My point is that observing if the discount rate changes with the risk tells you if the agent is rational or irrational, not if the discount rate is all instrumental or partially terminal.
Stepping back for a moment, terminal values represent what the agent really wants, and instrumental values are things sought en-route.
The idea I was trying to express was: if what an agent really wants is not temporally discounted, then instrumental temporal discounting will produce a predictable temporal discounting curve—caused by aging, mortality risk, uncertainty, etc.
Deviations from that curve would indicate the presence of terminal temporal discounting.
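One way to make that predicted curve concrete, offered as a sketch rather than anything specified above: with an instantaneous hazard/uncertainty rate $h(t)$ covering death, aging, forecast error and loss of influence, purely instrumental discounting would follow
\[
D_{\mathrm{instr}}(t) = \exp\!\left(-\int_0^t h(s)\,ds\right),
\]
and an observed discount curve that stays steeper than this even when the measurable risks are driven toward zero would be evidence of a terminal component.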
I have no disagreement at all with your analysis here. This is not fundamental discounting. And if you have decision alternatives which affect the chances of dying, then it doesn’t even work to model it as if it were fundamental.
You recently mentioned the possibility of dying in the interim. There’s also the possibility of aging in the interim. Such factors can affect utility calculations.
For example: I would much rather have my grandmother’s inheritance now than years down the line, when she finally falls over one last time—because I am younger and fitter now.
Significant temporal discounting makes sense sometimes—for example, if there is a substantial chance of extinction per unit time. I do think a lot of discounting is instrumental, though—rather than being a reflection of ultimate values—due to things like the future being expensive to predict and hard to influence.
My brain spends more time thinking about tomorrow than about this time next year—because I am more confident about what is going on tomorrow, and am better placed to influence it by developing cached actions, etc. Next year will be important too—but there will be a day before it to allow me to prepare for it closer to the time, when I am better placed to do so. The difference is not because I will be older then—or because I might die in the meantime. It is due to instrumental factors.
Of course one reason this is of interest is because we want to know what values to program into a superintelligence. That superintelligence will probably not age—and will stand a relatively low chance of extinction per unit time. I figure its ultimate utility function should have very little temporal discounting.
The problem with wiring discount functions into the agent’s ultimate utility function is that that is what you want it to preserve as it self-improves. Much discounting is actually due to resource limitation issues. It makes sense for such discounting to be dynamically reduced as more resources become cheaply available. It doesn’t make much sense to wire in short-sightedness.
I don’t mind tree-pruning algorithms attempting to normalise partial evaluations at different times—so they are more directly comparable to each other. The process should not get too expensive, though—the point of tree pruning is that it is an economy measure.
Find a path that allows people to feel that they have given their informed consent to both the project and the timetable—anything else is morally wrong.
I suspect you want to replace “feel like they have given” with “give.”
Unless you are actually claiming that what is immoral is to make people fail to feel consulted, rather than to fail to consult them, which doesn’t sound like what you’re saying.
Find a path that allows people to feel that they have given their informed consent to both the project and the timetable—anything else is morally wrong.
I suspect you want to replace “feel like they have given” with “give.”
I think I will go with a simple tense change: “feel that they are giving”. Assent is far more important in the lead-up to the Singularity than during the aftermath.
Although I used the language “morally wrong”, my reason for that was mostly to make the rhetorical construction parallel. My preference for an open, inclusive process is a strong preference, but it is really more political/practical than moral/idealistic. One ought to allow the horses to approach the trough of political participation, if only to avoid being trampled, but one is not morally required to teach them how to drink.
Ah, I see. Sure, if you don’t mean morally wrong but rather politically impractical, then I withdraw my suggestion… I entirely misunderstood your point.
No, I did originally say (and mostly mean) “morally” rather than “politically”. And I should thank you for inducing me to climb down from that high horse.
But he retains a host of other (mis)conceptions about meta-ethics which make his intentions abhorrent to people with different (mis)conceptions.
I submit that I have many of the same misconceptions that Eliezer does; he changed his mind about one of the few places I disagree with him. That makes it far more of a change than it would be for you (one out of eight is a small portion, one out of a thousand is an invisible fraction).
Good point. And since ‘scary’ is very much a subjective judgment, that means that I can’t validly criticize you for being foolish unless I have some way of arguing that yours and Eliezer’s positions in the realm of meta-ethics are misconceptions—something I don’t claim to be able to do.
So, if I wish my criticisms to be objective, I need to modify them. Eliezer’s expressed positions on meta-ethics (particularly his apparent acceptance of act-utilitarianism and his unwillingness to discount future utilities) together with some of his beliefs regarding the future (particularly his belief in the likelihood of a positive singularity and expansion of human population into the universe) make his ethical judgments completely unpredictable to many other people—unpredictable because the judgment may turn on subtle differences in the expected consequences of present day actions on people in the distant future. And, if one considers the moral judgments of another person to be unpredictable, and that person is powerful, then one ought to consider that person scary. Eliezer is probably scary to many people.
True, but it has little bearing on whether Eliezer should be scary. That is, “Eliezer is scary to many people” is mostly a fact about many people, and mostly not a fact about Eliezer. The reverse of this (and what I base this distinction on) is that some politicians should be scary, and are not scary to many people.
I’m not sure the proposed modification helps: you seem to have expanded your criticisms so far, in order to have them lead to the judgment you want to reach, that they cover too much.
I mean, sure, unpredictability is scarier (for a given level of power) than predictability. Agreed. But so what?
For example, my judgments will always be more unpredictable to people much stupider than I am than to people about as smart or smarter than I am. So the smarter I am, the scarier I am (again, given fixed power)… or, rather, the more people I am scary to… as long as I’m not actively devoting effort to alleviating those fears by, for example, publicly conforming to current fashions of thought. Agreed.
But what follows from that? That I should be less smart? That I should conform more? That I actually represent a danger to more people? I can’t see why I should believe any of those things.
You started out talking about what makes one dangerous; you have ended up talking about what makes people scared of one whether one is dangerous or not. They aren’t equivalent.
you seem to have expanded your criticisms so far, in order to have them lead to the judgment you want to reach, that they cover too much.
Well, I hope I haven’t done that.
You started out talking about what makes one dangerous; you have ended up talking about what makes people scared of one whether one is dangerous or not.
Well, I certainly did that. I was trying to address the question more objectively, but it seems I failed. Let me try again from a more subjective, personal position.
If you and I share the same consequentialist values, but I know that you are more intelligent, I may well consider you unpredictable, but I won’t consider you dangerous. I will be confident that your judgments, in pursuit of our shared values, will be at least as good as my own. Your actions may surprise me, but I will usually be pleasantly surprised.
If you and I are of the same intelligence, but we have different consequentialist values (both being egoists, with disjoint egos, for example) then we can expect to disagree on many actions. Expecting the disagreement, we can defend ourselves, or even bargain our way to a Nash bargaining solution in which (to the extent that we can enforce our bargain) we can predict each others behavior to be that promoting compromise consequences.
If, in addition to different values, we also have different beliefs, then bargaining is still possible, though we cannot expect to reach a Pareto optimal bargain. But the more our beliefs diverge, regarding consequences that concern us, the less good our bargains can be. In the limit, when the things that matter to us are particularly difficult to predict, and when we each have no idea what the other agent is predicting, bargaining simply becomes ineffective.
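For reference, the Nash bargaining solution mentioned above is the standard one: over the feasible set $F$ with disagreement payoffs $(d_1, d_2)$, pick
\[
x^{*} = \arg\max_{x \in F}\ \bigl(u_1(x) - d_1\bigr)\bigl(u_2(x) - d_2\bigr).
\]
When the agents hold different beliefs, each evaluates the expected $u_i$ against a different distribution over consequences, so the bargain that looks best to one need not look best to the other, which is the breakdown described above.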
Eliezer has expressed his acceptance of the moral significance of the utility functions of people in the far distant future. Since he believes that those people outnumber us folk in the present, that seems to suggest that he would be willing to sacrifice the current utility of us in favor of the future utility of them. (For example, the positive value of saving a starving child today does not outweigh the negative consequences on the multitudes of the future of delaying the Singularity by one day).
I, on the other hand, systematically discount the future. That, by itself, does not make Eliezer dangerous to me. We could strike a Nash bargain, after all. However, we inevitably also have different beliefs about consequences, and the divergence between our beliefs becomes greater the farther into the future we look. And consequences in the distant future are essentially all that matters to people like Eliezer—the present fades into insignificance by contrast. But, to people like me, the present and near future are essentially all that matter—the distant future discounts into insignificance.
So, Eliezer and I care about different things. Eliezer has some ability to predict my actions because he knows I care about short-term consequences and he knows something about how I predict short-term consequences. But I have little ability to predict Eliezer’s actions, because I know he cares primarily about long term consequences, and they are inherently much more unpredictable. I really have very little justification for modeling Eliezer (and any other act utilitarian who refuses to discount the future) as a rational agent.
I really have very little justification for modeling Eliezer (and any other act utilitarian who refuses to discount the future) as a rational agent.
I wish you would just pretend that they care about things a million times further into the future than you do.
The reason is that there are instrumental reasons to discount—the future disappears into a fog of uncertainty—and you can’t make decisions based on the value of things you can’t foresee.
The instrumental reasons fairly quickly dominate as you look further out—even when you don’t discount in your values. Reading your post, it seems as though you don’t “get” this, or don’t agree with it—or something.
Yes, the far-future is unpredictable—but in decision theory, that tends to make it a uniform grey—not an unpredictable black and white strobing pattern.
I wish you would just pretend that they care about things a million times further into the future than you do.
I don’t need to pretend. Modulo some mathematical details, it is the simple truth. And I don’t think there is anything irrational about having such preferences. It is just that, since I cannot tell whether or not what I do will make such people happy, I have no motive to pay any attention to their preferences.
Yes, the far-future is unpredictable—but in decision theory, that tends to make it a uniform grey—not an unpredictable black and white strobing pattern.
Yet, it seems that the people who care about the future do not agree with you on that. Bostrom, Yudkowsky, Nesov, et al. frequently invoke assessments of far-future consequences (sometimes in distant galaxies) in justifying their recommendations.
I wish you would just pretend that they care about things a million times further into the future than you do.
I don’t need to pretend. Modulo some mathematical details, it is the simple truth.
We have crossed wires here. What I meant is that I wish you would stop protesting about infinite utilities—and how non-discounters are not really even rational agents—and just model them as ordinary agents who discount a lot less than you do.
Objections about infinity strike me as irrelevant and uninteresting.
It is just that, since I cannot tell whether or not what I do will make such people happy, I have no motive to pay any attention to their preferences.
Is that your true objection? I expect you can figure out what would make these people happy easily enough most of the time—e.g. by asking them.
Yes, the far-future is unpredictable—but in decision theory, that tends to make it a uniform grey—not an unpredictable black and white strobing pattern.
Yet, it seems that the people who care about the future do not agree with you on that. Bostrom, Yudkowsky, Nesov, et al. frequently invoke assessments of far-future consequences (sometimes in distant galaxies) in justifying their recommendations.
Indeed. That is partly poetry, though (big numbers make things seem important) - and partly because they think that the far future will be highly contingent on near future events.
The thing they are actually interested in influencing is mostly only a decade or so out. It does seem quite important—significant enough to reach back to us here anyway.
If what you are trying to understand is far enough away to be difficult to predict, and very important, then that might cause some oscillations. That is hardly a common situation, though.
Most of the time, organisms act as though they want to become ancestors. To do that, the best thing they can do is focus on having some grandkids. Expanding their circle of care out a few generations usually makes precious little difference to their actions. The far future is unforeseen, and usually can’t be directly influenced. It is usually not too relevant. Usually, you leave it to your kids to deal with.
It is just that, since I cannot tell whether or not what I do will make such people happy, I have no motive to pay any attention to their preferences.
Is that your true objection? I expect you can figure out what would make these people happy easily enough most of the time—e.g. by asking them.
That is a valid point. So, I am justified in treating them as rational agents to the extent that I can engage in trade with them. I just can’t enter into a long-term Nash bargain with them in which we jointly pledge to maximize some linear combination of our two utility functions in an unsupervised fashion. They can’t trust me to do what they want, and I can’t trust them to judge their own utility as bounded.
I think this is back to the point about infinities. The one I wish you would stop bringing up—and instead treat these folk as though they are discounting only a teeny, tiny bit.
Frankly, I generally find it hard to take these utilitarian types seriously in the first place. A “signalling” theory (holier-than-thou) explains the unusually high prevalence of utilitarianism among moral philosophers—and an “exploitation” theory explains its prevalence among those running charitable causes (utilitarianism-says-give-us-your-money). Those explanations do a good job of modelling the facts about utilitarianism—and are normally a lot more credible than the supplied justifications—IMHO.
I think this is back to the point about infinities.
Which suggests that we are failing to communicate. I am not surprised.
The one I wish you would stop bringing up—and instead treat these folk as though they are discounting only a teeny, tiny bit.
I do that! And I still discover that their utility functions are dominated by huge positive and negative utilities in the distant future, while mine are dominated by modest positive and negative utilities in the near future. They are still wrong even if they fudge it so that their math works.
I think this is back to the point about infinities.
Which suggests that we are failing to communicate. I am not surprised.
I went from your “I can’t trust them to judge their own utility as bounded” to your earlier “infinity” point. Possibly I am not trying very hard here, though...
My main issue was you apparently thinking that you couldn’t predict their desires in order to find mutually beneficial trades. I’m not really sure if this business about not being able to agree to maximise some shared function is a big deal for you.
Mm. OK, so you are talking about scaring sufficiently intelligent rationalists, not scaring the general public. Fair enough.
What you say makes sense as far as it goes, assuming some mechanism for reliable judgments about people’s actual bases for their decisions. (For example, believing their self-reports.)
But it seems the question that should concern you is not whether Eliezer bases his decisions on predictable things, but rather whether Eliezer’s decisions are themselves predictable.
Put a different way: by your own account, the actual long-term consequences don’t correlate reliably with Eliezer’s expectations about them… that’s what it means for those consequences to be inherently unpredictable. And his decisions are based on his expectations, of course, not on the actual future consequences. So it seems to follow that once you know Eliezer’s beliefs about the future, whether those beliefs are right or wrong is irrelevant to you: that just affects what actually happens in the future, which you systematically discount anyway.
So if Eliezer is consistent in his beliefs about the future, and his decisions are consistently grounded in those beliefs, I’m not sure what makes him any less predictable to me than you are.
Of course, his expectations might not be consistent. Or they might be consistent but beyond your ability to predict. Or his decisions might be more arbitrary than you suggest here. For that matter, he might be lying outright. I’m not saying you should necessarily trust him, or anyone else.
But those same concerns apply to everybody, whatever their professed value structure. I would say the same things about myself.
So it seems to follow that once you know Eliezer’s beliefs about the future, whether those beliefs are right or wrong is irrelevant to you: that just affects what actually happens in the future, which you systematically discount anyway.
But Eliezer’s beliefs about the future continue to change—as he gains new information and completes new deductions. And there is no way that he can practically keep me informed of his beliefs—neither he nor I would be willing to invest the time required for that communication. But Eliezer’s beliefs about the future impact his actions in the present, and those actions have consequences both in the near and distant future. From my point of view, therefore, his actions have essentially random effects on the only thing that matters to me—the near future.
Absolutely. But who isn’t that true of? At least Eliezer has extensively documented his putative beliefs at various points in time, which gives you some data points to extrapolate from.
I have no complaints regarding the amount of information about Eliezer’s beliefs that I have access to. My complaint is that Eliezer, and his fellow non-discounting act utilitarians, are morally driven by the huge differences in utility which they see as arising from events in the distant future—events which I consider morally irrelevant because I discount the future. No realistic amount of information about beliefs can alleviate this problem. The only fix is for them to start discounting. (I would have added “or for me to stop discounting” except that I still don’t know how to handle the infinities.)
Given that they predominantly care about things I don’t care about, and that I predominantly care about things they don’t worry about, we can only consider each other to be moral monsters.
You and I seem to be talking past each other now. It may be time to shut this conversation down.
Given that they predominantly care about things I don’t care about, and that I predominantly care about things they don’t worry about, we can only consider each other to be moral monsters.
Ethical egoists are surely used to this situation, though. The world is full of people who care about extremely different things from one another.
Yes. And if they both mostly care about modest-sized predictable things, then they can do some rational bargaining. Trouble arises when one or more of them has exquisitely fragile values—when they believe that switching a donation from one charity to another destroys galaxies.
I expect your decision algorithm will find a way to deal with people who won’t negotiate on some topics—or who behave in a manner you have a hard time predicting. Some trouble for you, maybe—but probably not THE END OF THE WORLD.
From my point of view, therefore, his actions have essentially random effects on the only thing that matters to me—the near future.
Looking at the last 10 years, there seems to be some highly-predictable fund raising activity, and a lot of philosophising about the importance of machine morality.
I see some significant patterns there. It is not remotely like a stream of random events. So: what gives?
Sure, the question of whether a superintelligence will construct a superior morality to that which natural selection and cultural evolution have constructed on Earth is in some sense a narrow technical question. (The related question of whether the phrase “superior morality” even means anything is, also.)
But it’s a technical question that pertains pretty directly to the question of whose side one envisions oneself on.
That is, if one answers “yes,” it can make sense to ally with the Singularity rather than humanity (assuming that even means anything) as EY-1998 claims to, and still expect some unspecified good (or perhaps Good) result. Whereas if one answers “no,” or if one rejects the very idea that there’s such a thing as a superior morality, that justification for alliance goes away.
That said, I basically agree with you, though perhaps for different reasons than yours.
That is, even after embracing the idea that no other values, even those held by a superintelligence, can be superior to human values, one is still left with the same choice of alliances. Instead of “side with humanity vs. the Singularity,” the question involves a much narrower subset: “side with humanity vs. FAI-induced Singularity,” but from our perspective it’s a choice among infinities.
Of course, advocates of FAI-induced Singularity will find themselves saying that there is no conflict, really, because an FAI-induced Singularity will express by definition what’s actually important about humanity. (Though, of course, there’s no guarantee that individual humans won’t all be completely horrified by the prospect.)
Though, of course, there’s no guarantee that individual humans won’t all be completely horrified by the prospect.
Recall that after CEV extrapolates current humans’ volitions and construes a coherent superposition, the next step isn’t “do everything that superposition says”, but rather, “ask that superposition the one question ‘Given the world as it is right now, what program should we run next?’, run that program, and then shut down”. I suppose it’s possible that our CEV will produce an AI that immediately does something we find horrifying, but I think our future selves are nicer than that… or could be nicer than that, if extrapolated the right way, so I’d consider it a failure of Friendliness if we get a “do something we’d currently find horrifying for the greater good” AI if a different extrapolation strategy would have resulted in something like a “start with the most agreeable and urgent stuff, and other than that, protect us while we grow up and give us help where we need it” AI.
I really doubt that we’d need an AI to do anything immediately horrifying to the human species in order to allow it to grow up into an awesome fun posthuman civilization, so if CEV 1.0 Beta 1 appeared to be going in that direction, that would probably be considered a bug and fixed.
(shrug) Sure, if you’re right that the “most urgent and agreeable stuff” doesn’t happen to press a significant number of people’s emotional buttons, then it follows that not many people’s emotional buttons will be pressed.
But there’s a big difference between assuming that this will be the case, and considering it a bug if it isn’t.
Either I trust the process we build more than I trust my personal judgments, or I don’t.
If I don’t, then why go through this whole rigamarole in the first place? I should prefer to implement my personal judgments. (Of course, I may not have the power to do so, and prefer to join more powerful coalitions whose judgments are close-enough to mine. But in that case CEV becomes a mere political compromise among the powerful.)
If I do, then it’s not clear to me that “fixing the bug” is a good idea.
That is, OK, suppose we write a seed AI intended to work out humanity’s collective CEV, work out some next-step goals based on that CEV and an understanding of likely consequences, construct a program P to implement those goals, run P, and quit.
Suppose that I am personally horrified by the results of running P. Ought I choose to abort P? Or ought I say to myself “Oh, how interesting: my near-mode emotional reactions to the implications of what humanity really wants are extremely negative. Still, most everybody else seems OK with it. OK, fine: this is not going to be a pleasant transition period for me, but my best guess is still that it will ultimately be for the best.”
Is there some number of people such that if more than that many people are horrified by the results, we ought to choose to abort P?
Does the question even matter? The process as you’ve described it doesn’t include an abort mechanism; whichever choice we make P is executed.
Ought we include such an abort mechanism? It’s not at all clear to me that we should. I can get on a roller-coaster or choose not to get on it, but giving me a brake pedal on a roller coaster is kind of ridiculous.
Sure, the question of whether a superintelligence will construct a superior morality to that which natural selection and cultural evolution have constructed on Earth is in some sense a narrow technical question.
Apparently he changed his mind about a bunch of things.
On what appears to be their current plan, the SIAI don’t currently look very dangerous, IMHO.
Eray Ozkural recently complained: “I am also worried that backwards people and extremists will threaten us, and try to dissuade us from accomplishing our work, due to your scare tactics.”
I suppose that sort of thing is possible—but my guess is that they are mostly harmless.
(Parenthetical about how changing your mind, admitting you were wrong, oops, etc., is a good thing).
Yes, I agree. I don’t really believe that he only learnt how to disguise his true goals. But I’m curious whether you would be satisfied with his word alone if he could run a fooming AI next week, but only once you gave your OK?
He has; this is made abundantly clear in the Metaethics sequence and particularly the “coming of age” sequence. That passage appears to be a reflection of the big embarrassing mistake he talked about, when he thought that he knew nothing about true morality (see “Could Anything Be Right?”) and that a superintelligence with a sufficiently “unconstrained” goal system (or what he’d currently refer to as “a rock”) would necessarily discover the ultimate true morality, so that whatever this superintelligence ended up doing would necessarily be the right thing, whether that turned out to consist of giving everyone a volcano lair full of catgirls/boys or wiping out humanity and reshaping the galaxy for its own purposes.
Needless to say, that is not his view anymore; there isn’t even any “Us or Them” to speak of anymore. Friendly AIs aren’t (necessarily) people, and certainly won’t be a distinct race of people with their own goals and ambitions.
Yes, I’m not suggesting that he is just signaling all that he wrote in the sequences to persuade people to trust him. I’m just saying that when you consider what people are doing for much less than shaping the whole universe to their liking, one might consider some sort of public or third-party examination before anyone is allowed to launch a fooming AI.
It will probably never come to that anyway. Not because the SIAI is not going to succeed, but because if it told anyone that it was even close to implementing something like CEV, the whole might of the world would crush it (unless the world has turned rational by then). To say that you are going to run a fooming AI will be interpreted as trying to take over all power and rule the universe. I suppose this is also the most likely reason for the SIAI to fail. The idea is out, and once people notice that fooming AI isn’t just science fiction they will do everything to stop anyone from either implementing one at all or running their own before anyone else does. And who’ll be the first competitor to take out in the race to take over the universe? The SIAI, of course; just search Google. I guess it would have been a better idea to make this a stealth project from day one. But that train has left.
Anyway, if the SIAI does succeed one can only hope that Yudkowsky is not Dr. Evil in disguise. But even that would still be better than a paperclip maximizer. I assign more utility to a universe adjusted to Yudkowsky’s volition (or the SIAI) than paperclips (I suppose even if that means I’ll not “like” what happens to me then).
I’m just saying that when you consider what people are doing for much less than shaping the whole universe to their liking, one might consider some sort of public or third-party examination before anyone is allowed to launch a fooming AI.
I don’t see who is going to enforce that. Probably nobody.
What we are fairly likely to see is open-source projects getting more limelight. It is hard to gather mindshare if your strategy is: trust the code to us. Relatively few programmers are likely to buy into such projects—unless you pay them to do so.
So you take him at his word that he’s working in your best interest. You don’t think it is necessary to supervise the SIAI while working towards friendly AI. But once they have finished their work, ready to go, you are in favor of some sort of examination before they can implement it. Is that correct?
I don’t think human selfishness vs. public interest is much of a problem with FAI; everyone’s interests with respect to FAI are well correlated, and making an FAI which specifically favors its creator doesn’t give enough extra benefit over an FAI which treats everyone equally to justify the risks (that the extra term will be discovered, or that the extra term introduces a bug). Not even for a purely selfish creator; FAI scenarios just don’t leave enough room for improvement to motivate implementing something else.
On the matter of inspecting AIs before launch, however, I’m conflicted. On one hand, the risk of bugs is very serious, and the only way to mitigate it is to have lots of qualified people look at it closely. On the other hand, if the knowledge that a powerful AI was close to completion became public, it would be subject to meddling by various entities that don’t understand what they’re doing, and it would also become a major target for espionage by groups of questionable motives and sanity who might create UFAIs. These risks are difficult to balance, but I think secrecy is the safer choice, and should be the default.
If your first paragraph turns out to be true, does that change anything with respect to the problem of human and political irrationality? My worry is that even if there is only one rational solution that everyone should favor, how likely is it that people understand and accept this? That might be no problem given the current perception. If the possibility of fooming AI is still being ignored at the point when it becomes possible to implement friendliness (CEV etc.), then there will be no opposition. So some quick quantum leaps towards AGI would likely allow the SIAI to follow through on it. But my worry is that if the general public or governments notice this possibility and take it seriously, it will turn into a political mess never seen before. The world would have to be dramatically different for the big powers to agree on something like CEV. I still think this is the most likely failure mode in case the SIAI succeeds in defining friendliness before someone else runs a fooming AI. Politics.
These risks are difficult to balance, but I think secrecy is the safer choice, and should be the default.
I agree. But is that still possible? After all, we’re writing about it in public. Although to my knowledge the SIAI never suggested that it would actually create a fooming AI, only come up with a way to guarantee its friendliness. But what you said in your second paragraph would suggest that the SIAI would also have to implement friendliness itself, or otherwise people will take advantage of it or simply mess it up.
Although to my knowledge the SIAI never suggested that it would actually create a fooming AI, only come up with a way to guarantee its friendliness.
This?
“The Singularity Institute was founded on the theory that in order to get a Friendly artificial intelligence, someone has got to build one. So, we’re just going to have an organization whose mission is: build a Friendly AI. That’s us.”
You don’t think it is necessary to supervise the SIAI while working towards friendly AI. But once they have finished their work, ready to go, you are in favor of some sort of examination before they can implement it.
Probably it would be easier to run the examination during the SIAI’s work, rather than after. Certainly it would save more lives. So, supervise them, so that your examination is faster and more thorough. I am not in favour of pausing the project, once complete, to examine it if it’s possible to examine it in operation.
I do not seek out examples to support my conclusion but to weaken your argument that one should trust Yudkowsky because of his previous output.
You shouldn’t seek to “weaken an argument”, you should seek what is the actual truth, and then maybe ways of communicating your understanding. (I believe that’s what you intended anyway, but think it’s better not to say it this way, as a protective measure against motivated cognition.)
I took wedrifid’s point as being that whether EY is right or not, the bad effect described happens. This is part of the lose-lose nature of the original problem (what to do about a post that hurt people).
I don’t think this rhetoric is applicable. Several very intelligent posters have deemed the idea dangerous; a very intelligent you deems it safe. You argue they are wrong because it is ‘obviously safe’.
Eliezer is perfectly correct to point out that, on the whole of it, ‘obviously it is safe’ just does not seem like strong enough evidence when it’s up against a handful of intelligent posters who appear to have strong convictions.
You argue they are wrong because it is ‘obviously safe’.
Pardon? I don’t believe I’ve said any such thing here or elsewhere. I could of course be mistaken—I’ve said a lot of things and don’t recall them all perfectly. But it seems rather unlikely that I did make that claim because it isn’t what I believe.
I should have known I wouldn’t get away with that, eh? I actually don’t know if you oppose the decision because you think the idea is safe, or because you think that censorship is wronger than the idea is dangerous, or whether you even oppose the decision at all and were merely pointing out appeals to authority. If you could fill me on the details, I could re-present the argument as it actually applies.
Thank you, and yes I can see the point behind what you were actually trying to say. It is just important to me that I am not misrepresented (even though you had no malicious intent).
There are obvious (well, at least theoretically deducible based on the kind of reasoning I tend to discuss or that used by harry!mor) reasons why it would be unwise to give a complete explanation of all my reasoning.
I will say that ‘censorship is wronger’ is definitely not the kind of thinking I would use. Indeed, I’ve given examples of things that I would definitely censor. Complete with LOTR satire if I recall. :)
I’m actually not sure if I understand your point. Either it is a round-about way of making it or I’m totally dense and the idea really is dangerous (or some third option).
It’s not that the idea is wrong and no one would believe it, it’s that the idea is wrong and when presented with with the explanation for why it’s wrong no one should believe it. In addition, it’s kind of important that people understand why it’s wrong. I’m sympathetic to people with different minds that might have adverse reactions to things I don’t but the solution to that is to warn them off, not censor the topics entirely.
Yes, the idea really is dangerous.
And for those who understand the idea, but not why it is wrong, nor the explanation of why it is wrong?
This is a politically reinforced heuristic that does not work for this problem.
Transparency is very important regarding people and organisations in powerful and unique positions. The way they act and what they claim in public is weak evidence in support of their honesty. To claim that they have to censor certain information in the name of the greater public good, and to fortify the decision based on their public reputation, does bear no evidence about their true objectives. The only way to solve this issue is by means of transparency.
Surely transparency might have negative consequences, but it mustn’t and can outweigh the potential risks from just believing that certain people are telling the truth and do not engage in deception to follow through on their true objectives.
There is also nothing that Yudkowsky has ever achieved that would sufficiently prove his superior intellect that would in turn justify people to just believe him about some extraordinary claim.
When I say something is a misapplied politically reinforced heuristic, you only reinforce my point by making fully general political arguments that it is always right.
Censorship is not the most evil thing in the universe. The consequences of transparency are allowed to be worse than censorship. Deal with it.
I already had Anna Salamon telling me something about politics. You sound as incomprehensible to me. Sorry, not meant as an attack.
I stated several times in the past that I am completely in favor of censorship, I have no idea why you are telling me this.
Our rules and intuitions about free speech and censorship are based on the types of censorship we usually see in practice. Ordinarily, if someone is trying to censor a piece of information, then that information falls into one of two categories: either it’s information that would weaken them politically, by making others less likely to support them and more likely to support their opponents, or it’s information that would enable people to do something that they don’t want done.
People often try to censor information that makes people less likely to support them, and more likely to support their opponents. For example, many governments try to censor embarrassing facts (“the Purple Party takes bribes and kicks puppies!”), the fact that opposition exists (“the Pink Party will stop the puppy-kicking!”) and its strength (“you can join the Pink Party, there are 10^4 of us already!”), and organization of opposition (“the Pink Party rally is tomorrow!”). This is most obvious with political parties, but it happens anywhere people feel like there are “sides”—with religions (censorship of “blasphemy”) and with public policies (censoring climate change studies, reports from the Iraq and Afghan wars). Allowing censorship in this category is bad because it enables corruption, and leaves less-worthy groups in charge.
The second common instance of censorship is encouragement and instructions for doing things that certain people don’t want done. Examples include cryptography, how to break DRM, pornography, and bomb-making recipes. Banning these is bad if the capability is suppressed for a bad reason (cryptography enables dissent), if it’s entangled with other things (general-purpose chemistry applies to explosives), or if it requires infrastructure that can also be used for the first type of censorship (porn filters have been caught blocking politicians’ campaign sites).
These two cases cover 99.99% of the things we call “censorship”, and within these two categories, censorship is definitely bad, and usually worth opposing. It is normally safe to assume that if something is being censored, it is for one of these two reasons. There are gray areas—slander (when the speaker knows he’s lying and has malicious intent), and bomb-making recipes (when they’re advertised as such and not general-purpose chemistry), for example—but the law has the exceptions mapped out pretty accurately. (Slander gets you sued, bomb-making recipes get you surveilled.) This makes a solid foundation for the principle that censorship should be opposed.
However, that principle and the analysis supporting it apply only to censorship that falls within these two domains. When things fall outside these categories, we usually don’t call them censorship; for example, there is a widespread conspiracy among email and web site administrators to suppress ads for Viagra, but we don’t call that censorship, even though it meets every aspect of the definition except motive. If you happen to find a weird instance of censorship which doesn’t fall into either category, then you have to start over and derive an answer to whether censorship in that particular case is good or bad, from scratch, without resorting to generalities about censorship-in-general. Some of the arguments may still apply—for example, building a censorship-technology infrastructure is bad even if it’s only meant to be used on spam—but not all of them, and not with the same force.
If the usual arguments against censorship don’t apply, and we’re trying to figure out whether to censor it, the next two things to test are whether it’s true, and whether an informed reader would want to see it. If both of these conditions hold, then it should not be censored. However, if either condition fails to hold, then it’s okay to censor.
Either the forbidden post is false, in which case it does not deserve protection because it’s false, or it’s true, in which case it should be censored because no informed person should want to see it. In either case, people spreading it are doing a bad thing.
Even if this is right, the censorship extends to perhaps-true conversations about why the post is false. Moreover, I don’t see what truth has to do with it. There are plenty of false claims made on this site that nonetheless should be public because understanding why they’re false and how someone might come to think that they are true are worthwhile endeavors.
The question here is rather straightforward: does the harm of the censorship outweigh the harm of letting people talk about the post? I can understand how you might initially think those who disagree with you are just responding to knee-jerk anti-censorship instincts that aren’t necessarily valid here. But from where I stand the arguments made by those who disagree with you do not fit this pattern. I think XiXi has been clear in the past about why the transparency concern does apply to SIAI. We’ve also seen arguments for why censorship in this particular case is a bad idea.
There are clearly more than two options here. There seem to be two points under contention:
It is/is not (1/2) reasonable to agree with the forbidden post.
It is/is not (3/4) desirable to know the contents of the forbidden post.
You seem to be restricting us to either 2+3 or 1+4. It seems that 1+3 is plausible (should we keep children from ever knowing about death because it’ll upset them?), and 2+4 seems like a good argument for restriction of knowledge (the idea is costly until you work through it, and the benefits gained from reaching the other side are lower than the costs).
But I personally suspect 2+3 is the best description, and that doesn’t explain why people trying to spread it are doing a bad thing. Should we delete posts on Pascal’s Wager because someone might believe it?
Excluded middle, of course: incorrect criterion. (Was this intended as a test?) It would not deserve protection if it were useless (like spam), not “if it were false.”
The reason I consider sufficient to keep it off LessWrong is that it actually hurt actual people. That’s pretty convincing to me. I wouldn’t expunge it from the Internet (though I might put a warning label on it), but from LW? Appropriate. Reposting it here? Rude.
Unfortunately, that’s also an argument as to why it needs serious thought applied to it, because if the results of decompartmentalised thinking can lead there, humans need to be able to handle them. As Vaniver pointed out, there are previous historical texts that have had similar effects. Rationalists need to be able to cope with such things, as they have learnt to cope with previous conceptual basilisks. So it’s legitimate LessWrong material at the same time as being inappropriate for here. Tricky one.
(To the ends of that “compartmentalisation” link, by the way, I’m interested in past examples of basilisks and other motifs of harmful sensation in idea form. Yes, I have the deleted Wikipedia article.)
Note that I personally found the idea itself silly at best.
The assertion that if a statement is not true, fails to alter political support, fails to provide instruction, and an informed reader wants to see that statement, it is therefore a bad thing to spread that statement and an OK thing to censor, is, um, far from uncontroversial.
To begin with, most fiction falls into this category. For that matter, so does most nonfiction, though at least in that case the authors generally don’t intend for it to be non-true.
No, you reversed a sign bit: it is okay to censor if an informed reader wouldn’t want to see it (and the rest of those conditions).
No, I don’t think so. You said “if either condition fails to hold, then it’s okay to censor.” If it isn’t true, and an informed reader wants to see it, then one of the two conditions failed to hold, and therefore it’s OK to censor.
No?
Oops, you’re right—one more condition is required. The condition I gave is only sufficient to show that it fails to fall into a protected class, not that it falls in the class of things that should be censored; there are things which fall in neither class (which aren’t normally censored because that requires someone with a motive to censor it, which usually puts it into one of the protected classes). To make it worthy of censorship, there must additionally be a reason outside the list of excluded reasons to censor it.
Your comment that I am replying to is often way more salient than things you have said in the past that I may or may not have observed.
I just have trouble understanding what you are saying. That might very well be my fault. I do not intend any hostile attack against you or the SIAI. I’m just curious, not worried at all. I do not demand anything. I’d like to learn more about you people, what you believe and how you arrived at your beliefs.
There is this particular case of the forbidden topic, and I am throwing everything I’ve got at it to see if the beliefs about it are consistent and hold water. That doesn’t mean that I am against censorship or that I believe it is wrong. I believe it is right but too unlikely (...). I believe that Yudkowsky and the SIAI are probably honest (although my gut feeling is to be very skeptical), but that there are good arguments for more transparency regarding the SIAI (if you believe it is as important as it is portrayed to be). I believe that Yudkowsky is wrong about his risk estimation regarding the idea.
I just don’t understand your criticism of my past comments and that included telling me something about how I use politics (I don’t get it) and that I should accept that censorship sometimes is necessary (which I haven’t argued against).
You are just going to piss off the management.
IMO, it isn’t that interesting.
Yudkowsky apparently agrees that squashing it was handled badly.
Anyway, now Roko is out of self-imposed exile, I figure it is about time to let it drop.
The problem with that is that Eliezer and those who agree with him, including me, cannot speak freely about our reasoning on the issue, because we don’t want to spread the idea, so we don’t want to describe it and point to details about it as we describe our reasoning. If you imagine yourself in our position, believing the idea is dangerous, you could tell that you wouldn’t want to spread the idea in the process of explaining its danger either.
Under more normal circumstances, where the ideas we disagree about are not thought by anyone to be dangerous, we can have effective discussion by laying out our true reasons for our beliefs, and considering counter arguments that refer to the details of our arguments. Being cut off from our normal effective methods of discussion is stressful, at least for me.
I have been trying to persuade people who don’t know the details of the idea or don’t agree that it is dangerous that we do in fact have good reasons for believing it to be dangerous, or at least that this is likely enough that they should let it go. This is a slow process, as I think of ways to express my thoughts without revealing details of the dangerous idea, or explaining them to people who know but don’t understand those details. And this ends up involving talking to people who, because they don’t think the idea is dangerous and don’t take it seriously, express themselves faster and less carefully, and who have conflicting goals like learning or spreading the idea, or opposing censorship in general, or having judged for themselves the merits of censorship (from others just like them) in this case. This is also stressful.
I engage in this stressful topic, because I think it is important, both that people do not get hurt from learning about this idea, and that SIAI/Eliezer do not get dragged through mud for doing the right thing.
Sorry, but I am not here to help you get the full understanding you need to judge if the beliefs are consistent and hold water. As I have been saying, this is not a normal discussion. And seriously, you would be better off dropping it and finding something else to worry about. And if you think it is important, you can remember to track whether SIAI/Eliezer/supporters like me engage in a pattern of making excuses to ban certain topics to protect some hidden agenda. But then please also remember all the critical discussions that don’t get banned.
Note that this shouldn’t be possible other than through arguments from authority.
(I’ve just now formed a better intuitive picture of the reasons the idea is dangerous, and I now see some of the comments previously made as unnecessarily revealing: the additional detail didn’t actually serve the purpose of convincing the people I communicated with, who lacked some of the prerequisites for using that detail to understand the argument for danger, but who would potentially gain (better) understanding of the idea itself. It does still sound silly to me, but maybe the lack of inferential stability of this conclusion should actually feel this way—I expect that the idea will stop being dangerous in the coming decades due to better understanding of decision theory.)
Does this theory of yours require that Eliezer Yudkowsky plus several other old-time Less Wrongians are holding the Idiot Ball and being really stupid about something that you can just see as obvious?
Now might be a good time to notice that you are confused.
Something to keep in mind when you reply to comments here is that you are the default leader of this community and its highest status member. This means comments that would be reasonably glib or slightly snarky from other posters can come off as threatening and condescending when made by you. They’re not really threatening but they can instill in their targets strong fight-or-flight responses. Perhaps this is because in the ancestral environment status challenges from group leaders were far more threatening to our ancestor’s livelihood than challenges from other group members. When you’re kicking out trolls it’s a sight to see, but when you’re rhetorically challenging honest interlocutors it’s probably counter-productive. I had to step away from the computer because I could tell that even if I was wrong the feelings this comment provoked weren’t going to let me admit it (and you weren’t even actually mean, just snobby).
As to your question, I don’t think my understanding of the idea requires anyone to be an idiot. In fact, from what you’ve said, I doubt we’re that far apart on how threatening the idea is. There may be implications I haven’t thought through that you have, and there may be general responses to implications I’ve thought of that you haven’t. I often have trouble telling how much intelligence I needed to get somewhere, but I think I’ve applied a fair amount in this case. Where I think we probably diverge significantly is in our estimation of the cost of the censorship, which I think is more than high enough to outweigh the risk of making Roko’s idea public. It is at least plausible that you are underestimating this cost due to biases resulting from your social position in this group and your organizational affiliation.
I’ll note that, as wedrifid suggested, your position also seems to assume that quite a few Less Wrongians are being really stupid and can’t see the obvious. Perhaps those who have expressed disagreement with your decision aren’t quite as old-time as those who have agreed with it. And perhaps this is because we have not internalized important concepts or accessed important evidence required to see the danger in Roko’s idea. But it is also noteworthy that the people who have expressed disagreement have mostly been outside the Yudkowsky/SIAI cluster relative to those who have agreed with you. This suggests that they might be less susceptible to the biases that may be affecting your estimation of the cost of the censorship.
I am a bit confused, as I’m not totally sure the explanations I’ve thought of or seen posted for your actions sufficiently explain them, but that’s just the kind of uncertainty one always expects in disagreements. Are you not confused? If I didn’t think there was a downside to the censorship I would let it go. But I think the downside is huge; in particular, I think the censorship makes it much harder to get people beyond the SIAI circle to take Friendliness seriously as a scholarly field. I’m not sure you’re humble enough to care about that (that isn’t meant as a character attack, btw). It makes the field look like a joke and makes its leading scholar look ridiculous. I’m not sure you have the political talents to recognize that. It also slightly increases the chances of someone not recognizing this failure mode (the one in Roko’s post) when it counts. I think you might be so sure (or so focused on the possibility) that you’re going to be the one flipping the switch in that situation that you aren’t worried enough about that.
Repeating “But I say so!” with increasing emphasis until it works. Been taking debating lessons from Robin?
It seems to me that the natural effect of a group leader persistently arguing from his own authority is Evaporative Cooling of Group Beliefs. This is of course conducive to confirmation bias and corresponding epistemological skewing for the leader; things which seem undesirable for somebody in Eliezer’s position. I really wish that Eliezer was receptive to taking this consideration seriously.
The thing is he usually does. That is one thing that has in the past set Eliezer apart from Robin and impressed me about Eliezer. Now it is almost as though he has embraced the evaporative cooling concept as an opportunity instead of a risk and gone and bought himself a blowtorch to force the issue!
Huh, so there was a change? Curious. Certainly looking over some of Eliezer’s past writings there are some that I identify with a great deal.
Far be it from me to be anything but an optimist. I’m going with ‘exceptions’. :)
Maybe, given the credibility he has accumulated on all these other topics, you should be willing to trust him on the one issue on which he is asserting this authority and on which it is clear that if he is right, it would be bad to discuss his reasoning.
The well known (and empirically verified) weakness in experts of the human variety is that they tend to be systematically overconfident when it comes to judgements that fall outside their area of exceptional performance—particularly when the topic is one just outside the fringes.
When it comes to blogging about theoretical issues of rationality Eliezer is undeniably brilliant. Yet his credibility specifically when it comes to responding to risks is rather less outstanding. In my observation he reacts emotionally and starts making rookie mistakes of rational thought and action. To the point when I’ve very nearly responded ‘Go read the sequences!’ before remembering that he was the flipping author and so should already know better.
Also important is the fact that elements of the decision are about people, not game theory. Eliezer hopefully doesn’t claim to be an expert when it comes to predicting or eliciting optimal reactions in others.
We were talking about his credibility in judging whether this idea is a risk, and that is within his area of expertise.
Was it not clear that I do not assign particular credence to Eliezer when it comes to judging risks? I thought I expressed that with considerable emphasis.
I’m aware that you disagree with my conclusions—and perhaps even my premises—but I can assure you that I’m speaking directly to the topic.
I do not consider this strong evidence as there are many highly intelligent and productive people who hold crazy beliefs:
Francisco J. Ayala, who “…has been called the ‘Renaissance Man of Evolutionary Biology’”, is a geneticist ordained as a Dominican priest. His “discoveries have opened up new approaches to the prevention and treatment of diseases that affect hundreds of millions of individuals worldwide…”
Francis Collins (geneticist, Human Genome Project), noted for his landmark discoveries of disease genes and his leadership of the Human Genome Project (HGP), and described by the Endocrine Society as “one of the most accomplished scientists of our time”, is an evangelical Christian.
Peter Duesberg (a professor of molecular and cell biology at the University of California, Berkeley) claimed that AIDS is not caused by HIV, which made him so unpopular that his colleagues and others have — until recently — been ignoring his potentially breakthrough work on the causes of cancer.
Georges Lemaître (a Belgian Roman Catholic priest) proposed what became known as the Big Bang theory of the origin of the Universe.
Kurt Gödel (logician, mathematician and philosopher) who suffered from paranoia and believed in ghosts. “Gödel, by contrast, had a tendency toward paranoia. He believed in ghosts; he had a morbid dread of being poisoned by refrigerator gases; he refused to go out when certain distinguished mathematicians were in town, apparently out of concern that they might try to kill him.”
Mark Chu-Carroll (PhD Computer Scientist, works for Google as a Software Engineer) “If you’re religious like me, you might believe that there is some deity that created the Universe.” He is running one of my favorite blogs, Good Math, Bad Math, and writes a lot on debunking creationism and other crackpottery.
Nassim Taleb (the author of the 2007 book The Black Swan, completed 2010) believes, roughly: we can’t track reality with science and equations; religion is not about belief; we were wiser before the Enlightenment, because we knew how to take knowledge from incomplete information, and now we live in a world of epistemic arrogance; religious people have a way of dealing with ignorance, by saying “God knows”.
Kevin Kelly (editor) is a devout Christian. Writes pro science and technology essays.
William D. Phillips (Nobel Prize in Physics 1997) is a Methodist.
I could continue this list with people like Ted Kaczynski or Roger Penrose. I just wanted to show that intelligence and rational conduct do not rule out the possibility of being wrong about some belief.
Taleb quote doesn’t qualify. (I won’t comment on others.)
I should have made clearer that it is not my intention to suggest that those people, or crazy ideas in general, are wrong. But there are a lot of smart people out there who’ll advocate opposing ideas. Using their reputation for being highly intelligent as a reason to follow through on their ideas is, in my opinion, not a very good idea in itself. I could just believe Freeman Dyson that existing simulation models of climate contain too much error to reliably predict future trends. I could believe Peter Duesberg that HIV does not cause AIDS; after all, he is a brilliant molecular biologist. But I just do not think that any amount of reputation is enough evidence to believe extraordinary claims uttered by such people. And in the case of Yudkowsky, there isn’t even much reputation, and no great achievements at all, that would justify a strong belief in his infallibility. What there is in Yudkowsky’s case seems to be strong emotional commitment. I just can’t tell if he is honest. If he really believes that he’s working on a policy for some future superhuman intelligence that will rule the universe, then I’m going to be very careful. Not because it is wrong, but because such beliefs imply huge payoffs. Not that I believe he is the disguised Dr. Evil, but can we be sure enough to just trust him with it? Censorship of certain ideas bears more evidence against him than it does in favor of his honesty.
How extensively have you searched for experts who made correct predictions outside their fields of expertise? What would you expect to see if you just searched for experts making predictions outside their field of expertise and then determined if that prediction were correct? What if you limited your search to experts who had expressed the attitude Eliezer expressed in Outside the Laboratory?
“Rule out”? Seriously? What kind of evidence is it?
You extracted the “rule out” phrase from the sentence:
From within the common phrase ‘do not rule out the possibility’ no less!
You then make a reference to ‘0 and 1 are not probabilities’ with exaggerated incredulity.
To put it mildly this struck me as logically rude and in general poor form. XiXiDu deserves more courtesy.
None of this affects my point that ruling out the possibility is the wrong (in fact, impossible) standard.
Not exaggerated. XiXiDu’s post did seem to be saying: here are these examples of experts being wrong so it is possible that an expert is wrong in this case, without saying anything useful about how probable it is for this particular expert to be wrong on this particular issue.
You have made an argument accusing me of logical rudeness that, quite frankly, does not stand up to scrutiny.
Better evidence than I’ve ever seen in support of the censored idea. I have these well-founded principles, free speech and transparency, and weigh them against the evidence I have in favor of censoring the idea. That evidence is merely 1.) Yudkowsky’s past achievements, 2.) his output and 3.) intelligence. That intelligent people have been and are wrong about certain ideas while still being productive and right about many other ideas is evidence to weaken #3. That people lie and deceive to get what they want is evidence against #1 and #2 and in favor of transparency and free speech, which are both already more likely to have a positive impact than the forbidden topic is to have a negative impact.
And what are you trying to tell me with this link? I haven’t seen anyone stating numeric probability estimates regarding the forbidden topic. And I won’t state one either; I’ll just say that it is subjectively improbable enough to ignore, because there are possibly too many very-low-probability events to take into account (for every being that will harm me if I don’t do X there is another being that will harm me if I do X, and they cancel each other out). But if you’d like to pull some number out of thin air, go ahead. I won’t, because I don’t have enough data to even calculate the probability of AI going FOOM versus a slow development.
You have failed to address my criticisms of your points: that you are seeking out only examples that support your desired conclusion, and that you are ignoring details that would allow you to construct a narrower, more relevant reference class for your outside view argument.
I was telling you that “ruling out the possibility” is the wrong (in fact, impossible) standard.
Only now I understand your criticism. I do not seek out examples to support my conclusion but to weaken your argument that one should trust Yudkowsky because of his previous output. I’m aware that Yudkowsky can very well be right about the idea but do in fact believe that the risk is worth taking. Have I done extensive research on how often people in similar situations have been wrong? Nope. No excuses here, but do you think there are comparable cases of predictions that proved to be reliable? And how much research have you done in this case and about the idea in general?
I don’t, I actually stated a few times that I do not think that the idea is wrong.
Seeking out just examples that weaken my argument, when I never predicted that no such examples would exist, is the problem I am talking about.
What made you think that supporting your conclusion and weakening my argument are different things?
My reason to weaken your argument is not that I want to be right but that I want feedback about my doubts. I said that 1.) people can be wrong, regardless of their previous reputation, 2.) that people can lie about their objectives and deceive by how they act in public (especially when the stakes are high), 3.) that Yudkowsky’s previous output and achievements are not remarkable enough to trust him about some extraordinary claim. You haven’t responded on why you tell people to believe Yudkowsky, in this case, regardless of my objections.
I’m sorry if I made it appear as if I hold some particular belief. My epistemic state simply doesn’t allow me to arrive at your conclusion. To highlight this I argued in favor of what it would mean to not accept your argument, namely to stand to previously well-established concepts like free speech and transparency. Yes, you could say that there is no difference here, except that I do not care about who is right but what is the right thing to do.
Still, it’s incorrect to argue from existence of examples. You have to argue from likelihood. You’d expect more correctness from a person with reputation for being right than from a person with reputation for being wrong.
People can also go crazy, regardless of their previous reputation, but it’s improbable, and not an adequate argument for their craziness.
And you need to know what fact you are trying to convince people about, not just search for soldier-arguments pointing in the preferred direction. If you believe that the fact is that a person is crazy, you too have to recognize that “people can be crazy” is inadequate argument for this fact you wish to communicate, and that you shouldn’t name this argument in good faith.
(Craziness is introduced as a less-likely condition than wrongness to stress the structure of my argument, not to suggest that wrongness is as unlikely.)
I notice that Yudkowsky wasn’t always self-professed human-friendly. Consider this:
http://hanson.gmu.edu/vc.html#yudkowsky
Wow. That is scary. Do you have an estimated date on that bizarre declaration? Pre 2004 I assume?
He’s changed his mind since. That makes it far, far less scary.
(Parenthetical about how changing your mind, admitting you were wrong, oops, etc, is a good thing).
(Hence the reference to the Eliezer2004 sequence.)
He has changed his mind about one technical point in meta-ethics. He now realizes that super-human intelligence does not automatically lead to super-human morality. He is now (IMHO) less wrong. But he retains a host of other (mis)conceptions about meta-ethics which make his intentions abhorrent to people with different (mis)conceptions. And he retains the arrogance that would make him dangerous to those he disagrees with, if he were powerful.
“… far, far less scary”? You are engaging in wishful thinking no less foolish than that for which Eliezer has now repented.
I’m not at all sure that I agree with Eliezer about most meta-ethics, and definitely disagree on some fairly important issues. But, that doesn’t make his views necessarily abhorrent. If Eliezer triggers a positive Singularity (positive in the sense that it reflects what he wants out of a Singularity, complete with CEV), I suspect that that will be a universe which I won’t mind living in. People can disagree about very basic issues and still not hate each others’ intentions. They can even disagree about long-term goals and not hate it if the other person’s goals are implemented.
Have you ever have one of those arguments with your SO in which:
It is conceded that your intentions were good.
It is conceded that the results seem good.
The SO is still pissed because of the lack of consultation and/or presence of extrapolation?
I usually escape those confrontations by promising to consult and/or not extrapolate the next time. In your scenario, Eliezer won’t have that option.
When people point out that Eliezer’s math is broken because his undiscounted future utilities lead to unbounded utility, his response is something like “Find better math—discounted utility is morally wrong”.
When Eliezer suggests that there is no path to a positive singularity which allows for prior consultation with the bulk of mankind, my response is something like “Look harder. Find a path that allows people to feel that they have given their informed consent to both the project and the timetable—anything else is morally wrong.”
ETA: In fact, I would like to see it as a constraint on the meaning of the word “Friendly” that it must not only provide friendly consequences, but also, it must be brought into existence in a friendly way. I suspect that this is one of those problems in which the added constraint actually makes the solution easier to find.
Could you link to where Eliezer says that future utilities should not be discounted? I find that surprising, since uncertainty causes an effect roughly equivalent to discounting.
I would also like to point out that achieving public consensus about whether to launch an AI would take months or years, and that during that time, not only is there a high risk of unfriendly AIs, it is also guaranteed that millions of people will die. Making people feel like they were involved in the decision is emphatically not worth the cost.
He makes the case in this posting. It is a pretty good posting, by the way, in which he also points out some kinds of discounting which he believes are justified. This posting does not purport to be a knock-down argument against discounting future utility—it merely states Eliezer’s reasons for remaining unconvinced that you should discount (and hence for remaining in disagreement with most economic thinkers).
ETA: One economic thinker who disagrees with Eliezer is Robin Hanson. His response to Eliezer’s posting is also well worth reading.
Examples of Eliezer conducting utilitarian reasoning about the future without discounting are legion.
Tim Tyler makes the same assertion about the effects of uncertainty. He backs the assertion with metaphor, but I have yet to see a worked example of the math. Can you provide one?
Of course, one obvious related phenomenon—it is even mentioned with respect in Eliezer’s posting—is that the value of a promise must be discounted with time due to the increasing risk of non-performance: my promise to scratch your back tomorrow is more valuable to you than my promise to scratch next week—simply because there is a risk that you or I will die in the interim, rendering the promise worthless. But I don’t see how other forms of increased uncertainty about the future should have the same (exponential decay) response curve.
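(A minimal sketch of the acknowledged case, under my own simplifying assumption of a constant per-period risk: if each year there is an independent probability p that the promise will not be performed, then the expected value of a promise of V utils due in year t is

\[ \mathbb{E}[V_t] = V\,(1-p)^t = V\,e^{t \ln(1-p)}, \]

which is exactly an exponential decay curve with discount factor 1-p per year. A hazard that varies over time would give a different, non-exponential curve, which is the open question being raised here.)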
So, start now.
Most tree-pruning heuristics naturally cause an effect like temporal discounting. Resource limits mean that you can’t calculate the whole future tree—so you have to prune. Pruning normally means applying some kind of evaluation function early—to decide which branches to prune. The more you evaluate early, the more you are effectively valuing the near-present.
That is not maths—but hopefully it has a bit more detail than previously.
It doesn’t really address the question. In the A* algorithm the heuristic estimates of the objective function are supposed to be upper bounds on utility, not lower bounds. Furthermore, they are supposed to actually estimate the result of the complete computation—not to represent a partial computation exactly.
Reality check: a tree of possible futures is pruned at points before the future is completely calculated. Of course it would be nice to apply an evaluation function which represents the results of considering all possible future branches from that point on. However, getting one of those that produces results in a reasonable time would be a major miracle.
If you look at things like chess algorithms, they do some things to get a more accurate utility valuation when pruning—such as checking for quiescence. However, they basically just employ a standard evaluation at that point—or sometimes a faster, cheaper approximation. If it is sufficiently bad, the tree gets pruned.
We are living in the same reality. But the heuristic evaluation function still needs to be an estimate of the complete computation, rather than being something else entirely. If you want to estimate your own accumulation of pleasure over a lifetime, you cannot get an estimate of that by simply calculating the accumulation of pleasure over a shorter period—otherwise no one would undertake the pain of schooling motivated by the anticipated pleasure of high future income.
The question which divides us is whether an extra 10 utils now is better or worse than an additional 11 utils 20 years from now. You claim that it is worse. Period. I claim that it may well be better, depending on the discount rate.
Correct me if I’m missing an important nuance, but isn’t this just about whether one’s utils are timeless?
I’m not sure I understand the question. What does it mean for a util to be ‘timeless’?
ETA: The question of the interaction of utility and time is a confusing one. In “Against Discount Rates”, Eliezer writes:
I think that Eliezer has expressed the issue in almost, but not quite, the right way. The right question is whether a decision maker in 2007 should be 5% more interested in doing something about the 2008 issue than about the 2009 issue. I believe that she should be, if only because she expects that she will have an entire year in the future to worry about the 2009 family without the need to even consider 2008 again. 2008’s water will already be under the bridge.
I’m sure someone else can explain this better than me, but: As I understand it, a util understood timelessly (rather than like money, which there are valid reasons to discount because it can be invested, lost, revalued, etc. over time) builds into how it’s counted all preferences, including preferences that interact with time. If you get 10 utils, you get 10 utils, full stop. These aren’t delivered to your door in a plain brown wrapper such that you can put them in an interest-bearing account. They’re improvements in the four-dimensional state of the entire universe over all time, that you value at 10 utils. If you get 11 utils, you get 11 utils, and it doesn’t really matter when you get them. Sure, if you get them 20 years from now, then they don’t cover specific events over the next 20 years that could stand improvement. But it’s still worth eleven utils, not ten. If you value things that happen in the next 20 years more highly than things that happen later, then utils according to your utility function will reflect that, that’s all.
That (timeless utils) is a perfectly sensible convention about what utility ought to mean. But, having adopted that convention, we are left with (at least) two questions:
Do I (in 2011) derive a few percent more utility from an African family having clean water in 2012 than I do from an equivalent family having clean water in 2013?
If I do derive more utility from the first alternative, am I making a moral error in having a utility function that acts that way?
I would answer yes to the first question. As I understand it, Eliezer would answer yes to the second question and would answer no to the first, were he in my shoes. I would claim that Eliezer is making a moral error in both judgments.
Do you (in the years 2011, 2012, 2013, 2014) derive different relative utilities for these conditions? If so, it seems you have a problem.
I’m sorry. I don’t know what is meant by utility derived in 2014 from an event in 2012. I understand that the whole point of my assigning utilities in 2014 is to guide myself in making decisions in 2014. But no decision I make in 2014 can have an effect on events in 2012. So, from a decision-theoretic viewpoint, it doesn’t matter how I evaluate the utilities of past events. They are additive constants (same in all decision branches) in any computation of utility, and hence are irrelevant.
Or did you mean to ask about different relative utilities in the years before 2012? Yes, I understand that if I don’t use exponential discounting, then I risk inconsistencies.
And that is a fact about the 2007 decision maker, not about the 2008 family’s value as compared to the 2009 family’s.
If, in 2007, you present me with a choice of clean water for a family for all of and only 2008 vs 2009, and you further assure me that these families will otherwise survive in hardship, and that their suffering in one year won’t materially affect their next year, and that I won’t have this opportunity again come this time next year, and that flow-on or snowball effects which benefit from an early start are not a factor here—then I would be indifferent to the choice.
If I would not be; if there is something intrinsic about earlier times that makes them more valuable, and not just a heuristic of preferring them for snowballing or flow-on reasons, then that is what Eliezer is saying seems wrong.
I would classify that as instrumental discounting. I don’t think anyone would argue with that—except maybe a superintelligence who has already exhausted the whole game tree—and for whom an extra year buys nothing.
Given that you also believe that distributing your charitable giving over many charities is ‘risk management’, I suppose that should not surprise me.
FWIW, I genuinely don’t understand your perspective. The extent to which you discount the future depends on your chances of enjoying it—but also on factors like your ability to predict it—and your ability to influence it—the latter are functions of your abilities, of what you are trying to predict and of the current circumstances.
You really, really do not normally want to put those sorts of things into an agent’s utility function. You really, really do want to calculate them dynamically, depending on the agent’s current circumstances, prediction ability levels, actuator power levels, previous experience, etc.
Attempts to put that sort of thing into the utility function would normally tend to produce an inflexible agent, who has more difficulties in adapting and improving. Trying to incorporate all the dynamic learning needed to deal with the issue into the utility function might be possible in principle—but that represents a really bad idea.
Hopefully you can see my reasoning on this issue. I can’t see your reasoning, though. I can barely even imagine what it might possibly be.
Maybe you are thinking that all events have roughly the same level of unpredictability in the future, and there is roughly the same level of difficulty in influencing them, so the whole issue can be dealt with by one (or a small number of) temporal discounting “fudge factors”—and that evolution built us that way because it was too stupid to do any better.
You apparently denied that resource limitation results in temporal discounting. Maybe that is the problem (if so, see my other reply here). However, now you seem to have acknowledged that an extra year of time to worry in helps with developing plans. What I can see doesn’t seem to make very much sense.
I really, really am not advocating that we put instrumental considerations into our utility functions. The reason you think I am advocating this is that you have this fixed idea that the only justification for discounting is instrumental. So every time I offer a heuristic analogy explaining the motivation for fundamental discounting, you interpret it as a flawed argument for using discounting as a heuristic for instrumental reasons.
Since it appears that this will go on forever, and I don’t discount the future enough to make the sum of this projected infinite stream of disutility seem small, I really ought to give up. But somehow, my residual uncertainty about the future makes me think that you may eventually take Cromwell’s advice.
To clarify: I do not think the only justification for discounting is instrumental. My position is more like: agents can have whatever utility functions they like (including ones with temporal discounting) without having to justify them to anyone.
However, I do think there are some problems associated with temporal discounting. Temporal discounting sacrifices the future for the sake of the present. Sometimes the future can look after itself—but sacrificing the future is also something which can be taken too far.
Axelrod suggested that when the shadow of the future grows too short, more defections happen. If people don’t sufficiently value the future, reciprocal altruism breaks down. Things get especially bad when politicians fail to value the future. We should strive to arrange things so that the future doesn’t get discounted too much.
Instrumental temporal discounting doesn’t belong in ultimate utility functions. So, we should figure out what temporal discounting is instrumental and exclude it.
If we are building a potentially-immortal machine intelligence with a low chance of dying and which doesn’t age, those are more causes of temporal discounting which could be discarded as well.
What does that leave? Not very much, IMO. The machine will still have some finite chance of being hit by a large celestial body for a while. It might die—but its chances of dying vary over time; its degree of temporal discounting should vary in response—once again, you don’t wire this in, you let the agent figure it out dynamically.
The point is that resource limitation makes these estimates bad estimates—and you can’t do better by replacing them with better estimates because of … resource limitation!
To see how resource limitation leads to temporal discounting, consider computer chess. Powerful computers play reasonable games—but heavily resource limited ones fall for sacrifice plays, and fail to make successful sacrifice gambits. They often behave as though they are valuing short-term gain over long term results.
A peek under the hood quickly reveals why. They only bother looking at a tiny section of the game tree near to the current position! More powerful programs can afford to exhaustively search that space—and then move on to positions further out. Also the limited programs employ “cheap” evaluation functions that fail to fully compensate for their short-term foresight—since they must be able to be executed rapidly. The result is short-sighted chess programs.
That resource limitation leads to temporal discounting is a fairly simple and general principle which applies to all kinds of agents.
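To make the claimed mechanism concrete, here is a minimal sketch in Python (my own toy example, not any particular chess program): a depth-limited planner that never evaluates rewards beyond its search horizon, and so behaves as though it discounts them to zero.

    # Toy illustration: a depth-limited planner over a small reward tree.
    # Rewards beyond the search horizon are never seen, so the agent acts
    # as if it discounted them to zero.

    def best_value(state, depth, children, reward, evaluate):
        """Return the best achievable value looking only `depth` steps ahead."""
        if depth == 0:
            return evaluate(state)   # cheap heuristic stands in for the unsearched future
        options = children(state)
        if not options:
            return evaluate(state)
        return max(reward(state, s2) + best_value(s2, depth - 1, children, reward, evaluate)
                   for s2 in options)

    # A tiny tree where a small immediate reward competes with a large delayed one.
    tree = {
        "start": ["quick", "patient"],
        "quick": [],             # +1 now, nothing later
        "patient": ["payoff"],   # 0 now, +10 two steps out
        "payoff": [],
    }
    rewards = {("start", "quick"): 1, ("start", "patient"): 0, ("patient", "payoff"): 10}

    children = lambda s: tree.get(s, [])
    reward = lambda a, b: rewards.get((a, b), 0)
    evaluate = lambda s: 0       # heuristic that knows nothing about the future

    for horizon in (1, 2):
        vals = {s: reward("start", s) + best_value(s, horizon - 1, children, reward, evaluate)
                for s in children("start")}
        print(horizon, vals)     # horizon 1 prefers "quick"; horizon 2 prefers "patient"

With a horizon of one step the agent grabs the small immediate reward; with a horizon of two it waits for the larger delayed one, which is the short-sightedness-from-resource-limits effect described above.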
Why do you keep trying to argue against discounting using an example where discounting is inappropriate by definition? The objective in chess is to win. It doesn’t matter whether you win in 5 moves or 50 moves. There is no discounting. Looking at this example tells us nothing about whether we should discount future increments of utility in creating a utility function.
Instead, you need to look at questions like this: An agent plays go in a coffee shop. He has the choice of playing slowly, in which case the games each take an hour and he wins 70% of them. Or, he can play quickly, in which case the games each take 20 minutes, but he only wins 60% of them. As soon as one game finishes, another begins. The agent plans to keep playing go forever. He gains 1 util each time he wins and loses 1 util each time he loses.
The main decision he faces is whether he maximizes utility by playing slowly or quickly. Of course, he has infinite expected utility however he plays. You can redefine the objective to be maximizing utility flow per hour and still get a ‘rational’ solution. But this trick isn’t enough for the following extended problem:
The local professional offers go lessons. Lessons require a week of time away from the coffee-shop and a 50 util payment. But each week of lessons turns 1% of your losses into victories. Now the question is: Is it worth it to take lessons? How many weeks of lessons are optimal? The difficulty here is that we need to compare the values of a one-shot (50 utils plus a week not playing go) with the value of an eternal continuous flow (the extra fraction of games per hour which are victories rather than losses). But that is an infinite utility payoff from the lessons, and only a finite cost, right? Obviously, the right decision is to take a week of lessons. And then another week after that. And so on. Forever.
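(To spell out the divergence with the numbers above, using a continuous-flow approximation of my own: slow play yields an expected 0.7 − 0.3 = 0.4 utils per hour and fast play 3 × (0.6 − 0.4) = 0.6 utils per hour, but either way the undiscounted total is

\[ \int_0^{\infty} r\,dt = \infty , \]

so comparing totals cannot settle the lessons question; with an exponential weight \( e^{-\rho t} \), \( \rho > 0 \), a constant flow r is instead worth the finite amount \( r/\rho \).)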
Discounting of future utility flows is the standard and obvious way of avoiding this kind of problem and paradox. But now let us see whether we can alter this example to capture your ‘instrumental discounting due to an uncertain future’:
First, the obvious one. Our hero expects to die someday, but doesn’t know when. He estimates a 5% chance of death every year. If he is lucky, he could live for another century. Or he could keel over tomorrow. And when he dies, the flow of utility from playing go ceases. It is very well known that this kind of uncertainty about the future is mathematically equivalent to discounted utility in a certain future. But you seemed to be suggesting something more like the following:
Our hero is no longer certain what his winning percentage will be in the future. He knows that he experiences microstrokes roughly every 6 months, and that each incident takes 5% of his wins and changes them to losses. On the other hand, he also knows that roughly every year he experiences a conceptual breakthrough. And that each such breakthrough takes 10% of his losses and turns them into victories.
Does this kind of uncertainty about the future justify discounting on ‘instrumental grounds’? My intuition says ’No, not in this case, but there are similar cases in which discounting would work.” I haven’t actually done the math, though, so I remain open to instruction.
Temporal discounting is about valuing something happening today more than the same thing happening tomorrow.
Chess computers do, in fact, discount. That is why they prefer to mate you in twenty moves rather than in a hundred.
The values of a chess computer do not just tell it to win. In fact, they are complex—e.g. Deep Blue had an evaluation function that was split into 8,000 parts.
Operation consists of maximising the utility function, after foresight and tree pruning. Events that take place in branches after tree pruning has truncated them typically don’t get valued at all—since they are not foreseen. Resource-limited chess computers can find themselves preferring to promote a pawn sooner rather than later. They do so since they fail to see the benefit of sequences leading to promotion later.
So: we apparently agree that resource limitation leads to indifference towards the future (due to not bothering to predict it), but I classify this as a kind of temporal discounting (since rewards in the future get ignored), whereas you apparently don’t.
Hmm. It seems as though this has turned out to be a rather esoteric technical question about exactly which set of phenomena the term “temporal discounting” can be used to refer to.
Earlier we were talking about whether agents focussed their attention on tomorrow—rather than next year. Putting aside the issue of whether that is classified as being “temporal discounting”—or not—I think the extent to which agents focus on the near-future is partly a consequence of resource limitation. Give the agents greater abilities and more resources and they become more future-oriented.
No, I have not agreed to that. I disagree with almost every part of it.
In particular, I think that the question of whether (and how much) one cares about the future is completely prior to questions about deciding how to act so as to maximize the things one cares about. In fact, I thought you were emphatically making exactly this point on another branch.
But that is fundamental ‘indifference’ (which I thought we had agreed cannot flow from instrumental considerations). I suppose you must be talking about some kind of instrumental or ‘derived’ indifference. But I still disagree. One does not derive indifference from not bothering to predict—one instead derives not bothering to predict from being indifferent.
Furthermore, I don’t respond to expected computronium shortages by truncating my computations. Instead, I switch to an algorithm which produces less accurate computations at lower computronium costs.
And finally, regarding classification, you seem to suggest that you view truncation of the future as just one form of discounting, whereas I choose not to. And that this makes our disagreement a quibble over semantics. To which I can only reply: Please go away Tim.
I think you would reduce how far you look forward if you were interested in using your resources intelligently and efficiently.
If you only have a million cycles per second, you can’t realistically go 150 ply deep into your go game—no matter how much you care about the results after 150 moves. You compromise—limiting both depth and breadth. The reduction in depth inevitably means that you don’t look so far into the future.
A lot of our communication difficulty arises from using different models to guide our intuitions. You keep imagining game-tree evaluation in a game with perfect information (like chess or go). Yes, I understand your point that in this kind of problem, resource shortages are the only cause of uncertainty—that given infinite resources, there is no uncertainty.
I keep imagining problems in which probability is built in, like the coffee-shop-go-player which I sketched recently. In the basic problem, there is no difficulty in computing expected utilities deeper into the future—you solve analytically and then plug in whatever value for t that you want. Even in the more difficult case (with the microstrokes) you can probably come up with an analytic solution. My models just don’t have the property that uncertainty about the future arises from difficulty of computation.
Right. The real world surely contains problems of both sorts. If you have a problem which is dominated by chaos based on quantum events then more resources won’t help. Whereas with many other types of problems more resources do help.
I recognise the existence of problems where more resources don’t help—I figure you probably recognise that there are problems where more resources do help—e.g. the ones we want intelligent machines to help us with.
Perhaps the real world does. But decision theory doesn’t. The conventional assumption is that a rational agent is logically omniscient. And generalizing decision theory by relaxing that assumption looks like it will be a very difficult problem.
The most charitable interpretation I can make of your argument here is that human agents, being resource limited, imagine that they discount the future. That discounting is a heuristic introduced by evolution to compensate for those resource limitations. I also charitably assume that you are under the misapprehension that if I only understood the argument, I would agree with it. Because if you really realized that I have already heard you, you would stop repeating yourself.
That you will begin listening to my claim that not all discounting is instrumental is more than I can hope for, since you seem to think that my claim is refuted each time you provide an example of what you imagine to be a kind of discounting that can be interpreted as instrumental.
I repeat, Tim. Please go elsewhere.
I am pretty sure that I just told you that I do not think that all discounting is instrumental. Here’s what I said:
Agents can have many kinds of utility function! That is partly a consequence of there being so many different ways for agents to go wrong.
Thx for the correction. It appears I need to strengthen my claim.
Not all discounting by rational, moral agents is instrumental.
Are we back in disagreement now? :)
No, we aren’t. In my book:
Being rational isn’t about your values; you can rationally pursue practically any goal. Epistemic rationality is a bit different—but I mostly ignore that as being unbiological.
Being moral isn’t really much of a constraint at all. Morality—and right and wrong—are normally with respect to a moral system—and unless a moral system is clearly specified, you can often argue all day about what is moral and what isn’t. Maybe some types of morality are more common than others—due to being favoured by the universe, or something like that—but any such context would need to be made plain in the discussion.
So, it seems (relatively) easy to make a temporal discounting agent that really values the present over the future—just stick a term for that in its ultimate values.
Are there any animals with ultimate temporal discounting? That is tricky, but it isn’t difficult to imagine natural selection hacking together animals that way. So: probably, yes.
Do I use ultimate temporal discounting? Not noticably—as far as I can tell. I care about the present more than the future, but my temporal discounting all looks instrumental to me. I don’t go in much for thinking about saving distant galaxies, though! I hope that further clarifies.
I should probably review around about now. Instead of that: IIRC, you want to wire temporal discounting into machines, so their preferences better match your own—whereas I tend to think that would be giving them your own nasty hangover.
If you are not valuing my responses, I recommend you stop replying to them—thereby ending the discussion.
Programs make good models. If you can program it, you have a model of it. We can actually program agents that make resource-limited decisions. Having an actual program that makes decisions is a pretty good way of modeling making resource-limited decisions.
Perhaps we have some kind of underlying disagreement about what it means for temporal discounting to be “instrumental”.
In your example of an agent facing a risk of death, my thinking is: this player might opt for a safer life—with reduced risk. Or they might choose to lead a more interesting but more risky life. Their degree of discounting may well adjust itself accordingly—and if so, I would take that as evidence that their discounting was not really part of their pure preferences, but rather was an instrumental and dynamic response to the observed risk of dying.
If—on the other hand—they adjusted the risk level of their lifestyle, and their level of temporal discounting remained unchanged, that would be confirming evidence in favour of the hypothesis that their temporal discounting was an innate part of their ultimate preferences—and not instrumental.
This bothers me since, with reasonable assumptions, all rational agents engage in the same amount of catastrophe discounting.
That is, observed discount rate = instrumental discount rate + chance of death + other factors
We should expect everyone’s discount rate to change, by the same amount, unless they’re irrational.
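(A minimal formalization of the additivity being assumed here, mine rather than anything stated above: with a pure-preference discount rate ρ and a constant hazard of death λ, the combined weight on utility at time t is

\[ e^{-\rho t} \cdot e^{-\lambda t} = e^{-(\rho + \lambda)t}, \]

so the observed rate is ρ + λ, and a change in the hazard should shift a rational agent’s observed rate one-for-one.)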
Agents do not all face the same risks, though.
Sure, they may discount the same amount if they do face the same risks, but often they don’t—e.g. compare the motorcycle racer with the nun.
So: the discounting rate is not fixed at so-much per year, but rather is a function of the agent’s observed state and capabilities.
Of course. My point is that observing if the discount rate changes with the risk tells you if the agent is rational or irrational, not if the discount rate is all instrumental or partially terminal.
Stepping back for a moment, terminal values represent what the agent really wants, and instrumental values are things sought en-route.
The idea I was trying to express was: if what an agent really wants is not temporally discounted, then instrumental temporal discounting will produce a predictable temporal discounting curve—caused by aging, mortality risk, uncertainty, etc.
Deviations from that curve would indicate the presence of terminal temporal discounting.
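As a toy illustration of that test, here is a sketch with hypothetical numbers and a deliberately simple exponential model of the instrumental part (both are my own assumptions):

    import math

    # Compare an observed discount curve with the curve predicted from
    # instrumental factors alone (here, just a constant mortality hazard).
    # Any systematic leftover decay is the candidate "terminal" part.

    hazard = 0.02   # assumed chance of death per year
    years = range(6)

    instrumental = [math.exp(-hazard * t) for t in years]            # predicted from risk alone
    observed = [math.exp(-(hazard + 0.05) * t) for t in years]       # made-up behavioural data

    residual = [o / i for o, i in zip(observed, instrumental)]       # ~ exp(-0.05 * t)
    for t, r in zip(years, residual):
        print(t, round(r, 3))   # a flat residual of 1.0 would mean no terminal discounting

A residual curve that stays at 1.0 would match the hypothesis that all the discounting is instrumental; a steady residual decay, as in these made-up numbers, plays the role of the deviation described above.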
Agreed.
I have no disagreement at all with your analysis here. This is not fundamental discounting. And if you have decision alternatives which affect the chances of dying, then it doesn’t even work to model it as if it were fundamental.
You recently mentioned the possibility of dying in the interim. There’s also the possibility of aging in the interim. Such factors can affect utility calculations.
For example: I would much rather have my grandmother’s inheritance now than years down the line, when she finally falls over one last time—because I am younger and fitter now.
Significant temporal discounting makes sense sometimes—for example, if there is a substantial chance of extinction per unit time. I do think a lot of discounting is instrumental, though—rather than being a reflection of ultimate values—due to things like the future being expensive to predict and hard to influence.
My brain spends more time thinking about tomorrow than about this time next year—because I am more confident about what is going on tomorrow, and am better placed to influence it by developing cached actions, etc. Next year will be important too—but there will be a day before to allow me to prepare for it closer to the time, when I am better placed to do so. The difference is not because I will be older then—or because I might die in the mean time. It is due to instrumental factors.
Of course one reason this is of interest is because we want to know what values to program into a superintelligence. That superintelligence will probably not age—and will stand a relatively low chance of extinction per unit time. I figure its ultimate utility function should have very little temporal discounting.
The problem with wiring discount functions into the agent’s ultimate utility function is that that is what you want it to preserve as it self improves. Much discounting is actually due to resource limitation issues. It makes sense for such discounting to be dynamically reduced as more resources become cheaply available. It doesn’t make much sense to wire-in short-sightedness.
I don’t mind tree-pruning algorithms attempting to normalise partial evaluations at different times—so they are more directly comparable to each other. The process should not get too expensive, though—the point of tree pruning is that it is an economy measure.
I suspect you want to replace “feel like they have given” with “give.”
Unless you are actually claiming that what is immoral is to make people fail to feel consulted, rather than to fail to consult them, which doesn’t sound like what you’re saying.
I think I will go with a simple tense change: “feel that they are giving”. Assent is far more important in the lead-up to the Singularity than during the aftermath.
Although I used the language “morally wrong”, my reason for that was mostly to make the rhetorical construction parallel. My preference for an open, inclusive process is a strong preference, but it is really more political/practical than moral/idealistic. One ought to allow the horses to approach the trough of political participation, if only to avoid being trampled, but one is not morally required to teach them how to drink.
Ah, I see. Sure, if you don’t mean morally wrong but rather politically impractical, then I withdraw my suggestion… I entirely misunderstood your point.
No, I did originally say (and mostly mean) “morally” rather than “politically”. And I should thank you for inducing me to climb down from that high horse.
I submit that I have many of the same misconceptions that Eliezer does; he changed his mind about one of the few places I disagree with him. That makes it far more of a change than it would be for you (one out of eight is a small portion, one out of a thousand is an invisible fraction).
Good point. And since ‘scary’ is very much a subjective judgment, that mean that I can’t validly criticize you for being foolish unless I have some way of arguing that yours and Eliezer’s positions in the realm of meta-ethics are misconceptions—something I don’t claim to be able to do.
So, if I wish my criticisms to be objective, I need to modify them. Eliezer’s expressed positions on meta-ethics (particularly his apparent acceptance of act-utilitarianism and his unwillingness to discount future utilities) together with some of his beliefs regarding the future (particularly his belief in the likelihood of a positive singularity and expansion of the human population into the universe) make his ethical judgments completely unpredictable to many other people—unpredictable because the judgment may turn on subtle differences in the expected consequences of present-day actions on people in the distant future. And, if one considers the moral judgments of another person to be unpredictable, and that person is powerful, then one ought to consider that person scary. Eliezer is probably scary to many people.
True, but it has little bearing on whether Eliezer should be scary. That is, “Eliezer is scary to many people” is mostly a fact about many people, and mostly not a fact about Eliezer. The reverse of this (and what I base this distinction on) is that some politicians should be scary, and are not scary to many people.
I’m not sure the proposed modification helps: you seem to have expanded your criticisms so far, in order to have them lead to the judgment you want to reach, that they cover too much.
I mean, sure, unpredictability is scarier (for a given level of power) than predictability. Agreed. But so what?
For example, my judgments will always be more unpredictable to people much stupider than I am than to people about as smart or smarter than I am. So the smarter I am, the scarier I am (again, given fixed power)… or, rather, the more people I am scary to… as long as I’m not actively devoting effort to alleviating those fears by, for example, publicly conforming to current fashions of thought. Agreed.
But what follows from that? That I should be less smart? That I should conform more? That I actually represent a danger to more people? I can’t see why I should believe any of those things.
You started out talking about what makes one dangerous; you have ended up talking about what makes people scared of one whether one is dangerous or not. They aren’t equivalent.
Well, I hope I haven’t done that.
Well, I certainly did that. I was trying to address the question more objectively, but it seems I failed. Let me try again from a more subjective, personal position.
If you and I share the same consequentialist values, but I know that you are more intelligent, I may well consider you unpredictable, but I won’t consider you dangerous. I will be confident that your judgments, in pursuit of our shared values, will be at least as good as my own. Your actions may surprise me, but I will usually be pleasantly surprised.
If you and I are of the same intelligence, but we have different consequentialist values (both being egoists, with disjoint egos, for example), then we can expect to disagree on many actions. Expecting the disagreement, we can defend ourselves, or even bargain our way to a Nash bargaining solution in which (to the extent that we can enforce our bargain) we can each predict the other’s behavior to be that which promotes the compromise consequences.
If, in addition to different values, we also have different beliefs, then bargaining is still possible, though we cannot expect to reach a Pareto optimal bargain. But the more our beliefs diverge, regarding consequences that concern us, the less good our bargains can be. In the limit, when the things that matter to us are particularly difficult to predict, and when we each have no idea what the other agent is predicting, bargaining simply becomes ineffective.
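For reference, the Nash bargaining solution invoked above is the standard one from bargaining theory (textbook material, not something the commenters spelled out): with feasible utility pairs $F$ and disagreement point $(d_1, d_2)$, the agents jointly pick

\[ (u_1^{*}, u_2^{*}) \;=\; \arg\max_{(u_1, u_2) \in F,\; u_i \ge d_i} \;(u_1 - d_1)(u_2 - d_2), \]

which is why enforceability matters: each side’s predictability comes from the shared commitment to the compromise point, not from shared values.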
Eliezer has expressed his acceptance of the moral significance of the utility functions of people in the far distant future. Since he believes that those people outnumber us folk in the present, that seems to suggest that he would be willing to sacrifice the current utility of us in favor of the future utility of them. (For example, the positive value of saving a starving child today does not outweigh the negative consequences on the multitudes of the future of delaying the Singularity by one day).
I, on the other hand, systematically discount the future. That, by itself, does not make Eliezer dangerous to me. We could strike a Nash bargain, after all. However, we inevitably also have different beliefs about consequences, and the divergence between our beliefs becomes greater the farther into the future we look. And consequences in the distant future are essentially all that matters to people like Eliezer—the present fades into insignificance by contrast. But, to people like me, the present and near future are essentially all that matter—the distant future discounts into insignificance.
So, Eliezer and I care about different things. Eliezer has some ability to predict my actions because he knows I care about short-term consequences and he knows something about how I predict short-term consequences. But I have little ability to predict Eliezer’s actions, because I know he cares primarily about long term consequences, and they are inherently much more unpredictable. I really have very little justification for modeling Eliezer (and any other act utilitarian who refuses to discount the future) as a rational agent.
I wish you would just pretend that they care about things a million times further into the future than you do.
The reason is that there are instrumental reasons to discount—the future disappears into a fog of uncertainty—and you can’t make decisions based on the value of things you can’t foresee.
The instrumental reasons fairly quickly dominate as you look further out—even when you don’t discount in your values. Reading your post, it seems as though you don’t “get” this, or don’t agree with it—or something.
Yes, the far-future is unpredictable—but in decision theory, that tends to make it a uniform grey—not an unpredictable black and white strobing pattern.
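One way to make the “uniform grey” point precise (my own formalisation of the comment above, not the commenter’s): if, for any two available actions $a$ and $a'$,

\[ \lim_{t \to \infty} \bigl|\, \mathbb{E}[u_t \mid a] - \mathbb{E}[u_t \mid a'] \,\bigr| = 0, \]

then distant periods contribute almost equally to every option’s expected utility, so they stop discriminating between choices even when the utility function itself applies no discount.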
I don’t need to pretend. Modulo some mathematical details, it is the simple truth. And I don’t think there is anything irrational about having such preferences. It is just that, since I cannot tell whether or not what I do will make such people happy, I have no motive to pay any attention to their preferences.
Yet, it seems that the people who care about the future do not agree with you on that. Bostrom, Yudkowsky, Nesov, et al. frequently invoke assessments of far-future consequences (sometimes in distant galaxies) in justifying their recommendations.
We have crossed wires here. What I meant is that I wish you would stop protesting about infinite utilities—and how non-discounters are not really even rational agents—and just model them as ordinary agents who discount a lot less than you do.
Objections about infinity strike me as irrelevant and uninteresting.
Is that your true objection? I expect you can figure out what would make these people happy easily enough most of the time—e.g. by asking them.
Indeed. That is partly poetry, though (big numbers make things seem important) - and partly because they think that the far future will be highly contingent on near future events.
The thing they are actually interested in influencing is mostly only a decade or so out. It does seem quite important—significant enough to reach back to us here anyway.
If what you are trying to understand is far enough away to be difficult to predict, and very important, then that might cause some oscillations. That is hardly a common situation, though.
Most of the time, organisms act as though they want to become ancestors. To do that, the best thing they can do is focus on having some grandkids. Expanding their circle of care out a few generations usually makes precious little difference to their actions. The far future is unforeseen, and usually can’t be directly influenced. It is usually not too relevant. Usually, you leave it to your kids to deal with.
That is a valid point. So, I am justified in treating them as rational agents to the extent that I can engage in trade with them. I just can’t enter into a long-term Nash bargain with them in which we jointly pledge to maximize some linear combination of our two utility functions in an unsupervised fashion. They can’t trust me to do what they want, and I can’t trust them to judge their own utility as bounded.
I think this is back to the point about infinities. The one I wish you would stop bringing up—and instead treat these folk as though they are discounting only a teeny, tiny bit.
Frankly, I generally find it hard to take these utilitarian types seriously in the first place. A “signalling” theory (holier-than-thou) explains the unusually high prevalence of utilitarianism among moral philosophers—and an “exploitation” theory explains its prevalence among those running charitable causes (utilitarianism-says-give-us-your-money). Those explanations do a good job of modelling the facts about utilitarianism—and are normally a lot more credible than the supplied justifications—IMHO.
Which suggests that we are failing to communicate. I am not surprised.
I do that! And I still discover that their utility functions are dominated by huge positive and negative utilities in the distant future, while mine are dominated by modest positive and negative utilities in the near future. They are still wrong even if they fudge it so that their math works.
I went from your “I can’t trust them to judge their own utility as bounded” to your earlier “infinity” point. Possibly I am not trying very hard here, though...
My main issue was you apparently thinking that you couldn’t predict their desires in order to find mutually beneficial trades. I’m not really sure if this business about not being able to agree to maximise some shared function is a big deal for you.
Mm. OK, so you are talking about scaring sufficiently intelligent rationalists, not scaring the general public. Fair enough.
What you say makes sense as far as it goes, assuming some mechanism for reliable judgments about people’s actual bases for their decisions. (For example, believing their self-reports.)
But it seems the question that should concern you is not whether Eliezer bases his decisions on predictable things, but rather whether Eliezer’s decisions are themselves predictable.
Put a different way: by your own account, the actual long-term consequences don’t correlate reliably with Eliezer’s expectations about them… that’s what it means for those consequences to be inherently unpredictable. And his decisions are based on his expectations, of course, not on the actual future consequences. So it seems to follow that once you know Eliezer’s beliefs about the future, whether those beliefs are right or wrong is irrelevant to you: that just affects what actually happens in the future, which you systematically discount anyway.
So if Eliezer is consistent in his beliefs about the future, and his decisions are consistently grounded in those beliefs, I’m not sure what makes him any less predictable to me than you are.
Of course, his expectations might not be consistent. Or they might be consistent but beyond your ability to predict. Or his decisions might be more arbitrary than you suggest here. For that matter, he might be lying outright. I’m not saying you should necessarily trust him, or anyone else.
But those same concerns apply to everybody, whatever their professed value structure. I would say the same things about myself.
But Eliezer’s beliefs about the future continue to change—as he gains new information and completes new deductions. And there is no way that he can practically keep me informed of his beliefs—neither he nor I would be willing to invest the time required for that communication. But Eliezer’s beliefs about the future impact his actions in the present, and those actions have consequences both in the near and distant future. From my point of view, therefore, his actions have essentially random effects on the only thing that matters to me—the near future.
Absolutely. But who isn’t that true of? At least Eliezer has extensively documented his putative beliefs at various points in time, which gives you some data points to extrapolate from.
I have no complaints regarding the amount of information about Eliezer’s beliefs that I have access to. My complaint is that Eliezer, and his fellow non-discounting act utilitarians, are morally driven by the huge differences in utility which they see as arising from events in the distant future—events which I consider morally irrelevant because I discount the future. No realistic amount of information about beliefs can alleviate this problem. The only fix is for them to start discounting. (I would have added “or for me to stop discounting” except that I still don’t know how to handle the infinities.)
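For concreteness, the “infinities” being gestured at are the usual convergence problem (a standard observation, not a claim from the thread): with per-period utilities bounded by $|u_t| \le U$, the undiscounted series $\sum_{t=0}^{\infty} u_t$ need not converge at all, whereas any constant discount factor $\gamma < 1$ guarantees

\[ \Bigl|\, \sum_{t=0}^{\infty} \gamma^{t} u_t \,\Bigr| \;\le\; \frac{U}{1 - \gamma} \;<\; \infty, \]

which is what “discounting only a teeny, tiny bit” buys: the sum stays finite no matter how close $\gamma$ gets to 1.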
Given that they predominantly care about things I don’t care about, and that I predominantly care about things they don’t worry about, we can only consider each other to be moral monsters.
You and I seem to be talking past each other now. It may be time to shut this conversation down.
Ethical egoists are surely used to this situation, though. The world is full of people who care about extremely different things from one another.
Yes. And if they both mostly care about modest-sized predictable things, then they can do some rational bargaining. Trouble arises when one or more of them has exquisitely fragile values—when they believe that switching a donation from one charity to another destroys galaxies.
I expect your decision algorithm will find a way to deal with people who won’t negotiate on some topics—or who behave in a manner you have a hard time predicting. Some trouble for you, maybe—but probably not THE END OF THE WORLD.
Looking at the last 10 years, there seems to be some highly-predictable fund raising activity, and a lot of philosophising about the importance of machine morality.
I see some significant patterns there. It is not remotely like a stream of random events. So: what gives?
Sure, the question of whether a superintelligence will construct a superior morality to that which natural selection and cultural evolution have constructed on Earth is in some sense a narrow technical question. (The related question of whether the phrase “superior morality” even means anything is, also.)
But it’s a technical question that pertains pretty directly to the question of whose side one envisions oneself on.
That is, if one answers “yes,” it can make sense to ally with the Singularity rather than humanity (assuming that even means anything) as EY-1998 claims to, and still expect some unspecified good (or perhaps Good) result. Whereas if one answers “no,” or if one rejects the very idea that there’s such a thing as a superior morality, that justification for alliance goes away.
That said, I basically agree with you, though perhaps for different reasons than yours.
That is, even after embracing the idea that no other values, even those held by a superintelligence, can be superior to human values, one is still left with the same choice of alliances. Instead of “side with humanity vs. the Singularity,” the question involves a much narrower subset: “side with humanity vs. FAI-induced Singularity,” but from our perspective it’s a choice among infinities.
Of course, advocates of FAI-induced Singularity will find themselves saying that there is no conflict, really, because an FAI-induced Singularity will express by definition what’s actually important about humanity. (Though, of course, there’s no guarantee that individual humans won’t all be completely horrified by the prospect.)
Recall that after CEV extrapolates current humans’ volitions and construes a coherent superposition, the next step isn’t “do everything that superposition says”, but rather, “ask that superposition the one question ‘Given the world as it is right now, what program should we run next?’, run that program, and then shut down”. I suppose it’s possible that our CEV will produce an AI that immediately does something we find horrifying, but I think our future selves are nicer than that… or could be nicer than that, if extrapolated the right way. So I’d consider it a failure of Friendliness if we get a “do something we’d currently find horrifying for the greater good” AI when a different extrapolation strategy would have resulted in something like a “start with the most agreeable and urgent stuff, and other than that, protect us while we grow up and give us help where we need it” AI.
I really doubt that we’d need an AI to do anything immediately horrifying to the human species in order to allow it to grow up into an awesome fun posthuman civilization, so if CEV 1.0 Beta 1 appeared to be going in that direction, that would probably be considered a bug and fixed.
(shrug) Sure, if you’re right that the “most urgent and agreeable stuff” doesn’t happen to press a significant number of people’s emotional buttons, then it follows that not many people’s emotional buttons will be pressed.
But there’s a big difference between assuming that this will be the case, and considering it a bug if it isn’t.
Either I trust the process we build more than I trust my personal judgments, or I don’t.
If I don’t, then why go through this whole rigamarole in the first place? I should prefer to implement my personal judgments. (Of course, I may not have the power to do so, and prefer to join more powerful coalitions whose judgments are close-enough to mine. But in that case CEV becomes a mere political compromise among the powerful.)
If I do, then it’s not clear to me that “fixing the bug” is a good idea.
That is, OK, suppose we write a seed AI intended to work out humanity’s collective CEV, work out some next-step goals based on that CEV and an understanding of likely consequences, construct a program P to implement those goals, run P, and quit.
Suppose that I am personally horrified by the results of running P. Ought I choose to abort P? Or ought I say to myself “Oh, how interesting: my near-mode emotional reactions to the implications of what humanity really wants are extremely negative. Still, most everybody else seems OK with it. OK, fine: this is not going to be a pleasant transition period for me, but my best guess is still that it will ultimately be for the best.”
Is there some number of people such that if more than that many people are horrified by the results, we ought to choose to abort P?
Does the question even matter? The process as you’ve described it doesn’t include an abort mechanism; whichever choice we make P is executed.
Ought we include such an abort mechanism? It’s not at all clear to me that we should. I can get on a roller-coaster or choose not to get on it, but giving me a brake pedal on a roller coaster is kind of ridiculous.
It’s partly a chance vs necessity question.
It is partly a question about whether technological determinism is widespread.
Apparently he changed his mind about a bunch of things.
On what appears to be their current plan, the SIAI doesn’t currently look very dangerous, IMHO.
Eray Ozkural recently complained: “I am also worried that backwards people and extremists will threaten us, and try to dissuade us from accomplishing our work, due to your scare tactics.”
I suppose that sort of thing is possible—but my guess is that they are mostly harmless.
Or so you hope.
Yes, I agree. I don’t really believe that he only learnt how to disguise his true goals. But I’m curious: would you be satisfied with his word alone if he were able to run a fooming AI next week, contingent only on your OK?
He has; this is made abundantly clear in the Metaethics sequence and particularly the “coming of age” sequence. That passage appears to be a reflection of the big embarrassing mistake he talked about, when he thought that he knew nothing about true morality (see “Could Anything Be Right?”) and that a superintelligence with a sufficiently “unconstrained” goal system (or what he’d currently refer to as “a rock”) would necessarily discover the ultimate true morality, so that whatever this superintelligence ended up doing would necessarily be the right thing, whether that turned out to consist of giving everyone a volcano lair full of catgirls/boys or wiping out humanity and reshaping the galaxy for its own purposes.
Needless to say, that is not his view anymore; there isn’t even any “Us or Them” to speak of anymore. Friendly AIs aren’t (necessarily) people, and certainly won’t be a distinct race of people with their own goals and ambitions.
Yes, I’m not suggesting that he is just signaling all that he wrote in the sequences to persuade people to trust him. I’m just saying that when you consider what people are doing for much less than shaping the whole universe to their liking, one might consider some sort of public or third-party examination before anyone is allowed to launch a fooming AI.
The hard part there is determining who’s qualified to perform that examination.
It will probably never come to that anyway. Not because the SIAI is not going to succeed, but because if it told anyone that it was even close to implementing something like CEV, the whole might of the world would crush it (assuming the world hasn’t turned rational by then). Saying that you are going to run a fooming AI will be interpreted as trying to seize all power and rule the universe. I suppose this is also the most likely way for the SIAI to fail. The idea is out, and once people notice that fooming AI isn’t just science fiction, they will do everything to stop anyone from implementing one at all, or to run their own before anyone else does. And who will be the first competitor to take out in the race to take over the universe? The SIAI, of course; just search Google. I guess it would have been a better idea to make this a stealth project from day one. But that train has left.
Anyway, if the SIAI does succeed, one can only hope that Yudkowsky is not Dr. Evil in disguise. But even that would still be better than a paperclip maximizer. I assign more utility to a universe adjusted to Yudkowsky’s (or the SIAI’s) volition than to paperclips (even if, I suppose, that means I won’t “like” what happens to me).
I don’t see who is going to enforce that. Probably nobody.
What we are fairly likely to see is open-source projects getting more limelight. It is hard to gather mindshare if your strategy is: trust the code to us. Relatively few programmers are likely to buy into such projects—unless you pay them to do so.
Yes on the question of humans vs Singularity.
(His word alone would not be enough to convince me he’s gotten the fooming AI friendly, though, so I would not give the OK for prudential reasons.)
So you take him at his word that he’s working in your best interest. You don’t think it is necessary to supervise the SIAI while working towards friendly AI. But once they finished their work, ready to go, you are in favor of some sort of examination before they can implement it. Is that correct?
I don’t think human selfishness vs. public interest is much of a problem with FAI; everyone’s interests with respect to FAI are well correlated, and making an FAI which specifically favors its creator doesn’t give enough extra benefit over an FAI which treats everyone equally to justify the risks (that the extra term will be discovered, or that the extra term introduces a bug). Not even for a purely selfish creator; FAI scenarios just don’t leave enough room for improvement to motivate implementing something else.
On the matter of inspecting AIs before launch, however, I’m conflicted. On one hand, the risk of bugs is very serious, and the only way to mitigate it is to have lots of qualified people look at it closely. On the other hand, if the knowledge that a powerful AI was close to completion became public, it would be subject to meddling by various entities that don’t understand what they’re doing, and it would also become a major target for espionage by groups of questionable motives and sanity who might create UFAIs. These risks are difficult to balance, but I think secrecy is the safer choice, and should be the default.
If your first paragraph turns out to be true, does that change anything with respect to the problem of human and political irrationality? My worry is that even if there is only one rational solution that everyone should favor, how likely is it that people understand and accept this? That might be no problem given the current perception. If the possibility of fooming AI is still being ignored at the point when it becomes possible to implement friendliness (CEV etc.), then there will be no opposition. So a few quick quantum leaps towards AGI would likely allow the SIAI to follow through on it. But my worry is that if the general public or governments notice this possibility and take it seriously, it will turn into a political mess never seen before. The world would have to be dramatically different for the big powers to agree on something like CEV. I still think this is the most likely failure mode in case the SIAI succeeds in defining friendliness before someone else runs a fooming AI. Politics.
I agree. But is that still possible? After all, we’re writing about it in public. Although, to my knowledge, the SIAI never suggested that it would actually create a fooming AI, only that it would come up with a way to guarantee its friendliness. But what you said in your second paragraph suggests that the SIAI would also have to implement friendliness, or otherwise people will take advantage of it or simply mess it up.
This?
http://www.acceleratingfuture.com/people-blog/?p=196
Probably it would be easier to run the examination during the SIAI’s work, rather than after. Certainly it would save more lives. So, supervise them, so that your examination is faster and more thorough. I am not in favour of pausing the project, once complete, to examine it, if it’s possible to examine it in operation.
At the bottom—just after where he talks about his “transfer of allegiance”—it says:
©1998 by Eliezer S. Yudkowsky.
We can’t say he didn’t warn us ;-)
IMO, it is somewhat reminiscent of certain early Zuckerberg comments.
Eliezer1998 is almost as scary as Hanson2010 - and for similar reasons.
1998 you mean?
Yes. :)
What Zuckerberg comments are you referring to?
The IM ones where he says “trust me”.
Zuckerberg probably thought they were private, though. I added a link.
If you follow the link:
You shouldn’t seek to “weaken an argument”, you should seek what is the actual truth, and then maybe ways of communicating your understanding. (I believe that’s what you intended anyway, but think it’s better not to say it this way, as a protective measure against motivated cognition.)
I like your parenthetical, I often want to say something like this, and you’ve put it well.
Thank you for pointing this out.
I took wedrifid’s point as being that whether EY is right or not, the bad effect described happens. This is part of the lose-lose nature of the original problem (what to do about a post that hurt people).
I don’t think this rhetoric is applicable. Several very intelligent posters have deemed the idea dangerous; a very intelligent you deems it safe. You argue they are wrong because it is ‘obviously safe’.
Eliezer is perfectly correct to point out that, on the whole of it, ‘obviously it is safe’ just does not seem like strong enough evidence when it’s up against a handful of intelligent posters who appear to have strong convictions.
Pardon? I don’t believe I’ve said any such thing here or elsewhere. I could of course be mistaken—I’ve said a lot of things and don’t recall them all perfectly. But it seems rather unlikely that I did make that claim because it isn’t what I believe.
This leads me to the conclusion that...
… This rhetoric isn’t applicable either. ;)
I should have known I wouldn’t get away with that, eh? I actually don’t know if you oppose the decision because you think the idea is safe, or because you think that censorship is wronger than the idea is dangerous, or whether you even oppose the decision at all and were merely pointing out appeals to authority. If you could fill me on the details, I could re-present the argument as it actually applies.
Thank you, and yes, I can see the point behind what you were actually trying to say. It is just important to me that I am not misrepresented (even though you had no malicious intent).
There are obvious (well, at least theoretically deducible based on the kind of reasoning I tend to discuss or that used by harry!mor) reasons why it would be unwise to give a complete explanation of all my reasoning.
I will say that ‘censorship is wronger’ is definitely not the kind of thinking I would use. Indeed, I’ve given examples of things that I would definitely censor. Complete with LOTR satire if I recall. :)