Like what? Why he should believe in exponential growth? When by “exponential” he actually means “fast” and no one at SIAI actually advocates for exponentials, those being a strictly Kurzweilian obsession and not even very dangerous by our standards? When he picks MWI, of all things, to accuse us of overconfidence (not “I didn’t understand that” but “I know something you don’t about how to integrate the evidence on MWI, clearly you folks are overconfident”)? When there’s lots of little things scattered through the post like that (“I’m engaging in pluralistic ignorance based on Charles Stross’s nonreaction”) it doesn’t make me want to plunge into engaging the many different little “substantive” parts, get back more replies along the same line, and recapitulate half of Less Wrong in the process. The first thing I need to know is whether XiXiDu did the reading and the reading failed, or did he not do the reading? If he didn’t do the reading, then my answer is simply, “If you haven’t done enough reading to notice that Stross isn’t in our league, then of course you don’t trust SIAI”. That looks to me like the real issue. For substantive arguments, pick a single point and point out where the existing argument fails on it—don’t throw a huge handful of small “huh?”s at me.
Castles in the air. Your claims are based on long chains of reasoning that you do not write down in a formal style. Is the probability of correctness of each link in that chain of reasoning so close to 1 that their product is also close to 1?
I can think of a couple of ways you could respond:
1. Yes, you are that confident in your reasoning. In that case you could explain why XiXiDu should be similarly confident, or why it’s not of interest to you whether he is similarly confident.
2. It’s not a chain of reasoning but a web of reasoning, robust against certain arguments being off. If that’s the case, then we lay readers might benefit if you would make more specific and relevant references to your writings depending on context, instead of encouraging people to read the whole thing before bringing criticisms.
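A quick numerical aside on the “product of many links” worry above (a minimal sketch with made-up per-link confidences, assuming the links are independent, which is exactly what the “web of reasoning” response denies):

```python
# Illustrative numbers only: how joint confidence decays with chain length
# if each link is treated as an independent step of the argument.
for p in (0.99, 0.95, 0.90):          # assumed per-link probability of correctness
    for n in (5, 10, 20):             # number of links in the chain
        print(f"{n:2d} links at {p:.2f} each -> joint confidence {p ** n:.2f}")
# 20 links at 0.95 each already fall to ~0.36; a "web" of reasoning escapes this
# only by not being a single conjunction of independent links.
```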
Most of the long arguments are concerned with refuting fallacies and defeating counterarguments, which flawed reasoning will always be able to supply in infinite quantity. The key predictions, when you look at them, generally turn out to be antipredictions, and the long arguments just defeat the flawed priors that concentrate probability into anthropomorphic areas. The positive arguments are simple; only defeating complicated counterarguments is complicated.
“Fast AI” is simply “Most possible artificial minds are unlikely to run at human speed, the slow ones that never speed up will drop out of consideration, and the fast ones are what we’re worried about.”
“UnFriendly AI” is simply “Most possible artificial minds are unFriendly; most intuitive methods you can think of for constructing one run into flaws in your intuitions and fail.”
MWI is simply “Schrodinger’s equation is the simplest fit to the evidence”; there are people who think that you should do something with this equation other than taking it at face value, like arguing that gravity can’t be real and so needs to be interpreted differently, and the long arguments are just there to defeat them.
The only argument I can think of that actually approaches complication is about recursive self-improvement, and even there you can say “we’ve got a complex web of recursive effects and they’re unlikely to turn out exactly exponential with a human-sized exponent”, the long arguments being devoted mainly to defeating the likes of Robin Hanson’s argument for why it should be exponential with an exponent that smoothly couples to the global economy.
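As a side note, a toy sketch (my own illustration, not SIAI’s or Hanson’s model) of why “exactly exponential” is a knife-edge case: in a simple feedback model dI/dt = c * I^p, only p = 1 gives exponential growth, while p < 1 flattens out and p > 1 runs away toward a finite-time singularity.

```python
# Toy feedback model dI/dt = c * I**p, integrated with a crude Euler step.
# Only the knife-edge exponent p == 1 yields exactly exponential growth.
def grow(p, c=0.1, I0=1.0, t_max=60.0, dt=0.001):
    I, t = I0, 0.0
    while t < t_max and I < 1e6:
        I += c * I ** p * dt
        t += dt
    return t, I

for p in (0.7, 1.0, 1.3):
    t, I = grow(p)
    print(f"p = {p}: I = {I:,.0f} at t = {t:.1f}")
# p = 0.7 grows sub-exponentially (~31 at t = 60), p = 1.0 grows exponentially
# (~400 at t = 60), p = 1.3 runs away and hits the 1e6 cap long before t_max.
```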
One problem I have with your argument here is that you appear to be saying that if XiXiDu doesn’t agree with you, he must be stupid (the stuff about low g etc.). Do you think Robin Hanson is stupid too, since he wasn’t convinced?
If he wasn’t convinced about MWI it would start to become a serious possibility.
I haven’t found the text in a two-minute search or so, but I think I remember Robin assigning a substantial probability, say 30% or so, to the possibility that MWI is false, even if he thinks it most likely (i.e. the remaining 70%) that it’s true.
Much as you argued in the post about Einstein’s arrogance, there seems to be a small enough difference between a 30% chance of being false and a 90% chance of being false that if the latter would imply that Robin was stupid, the former would imply it too.
I suspect that Robin would not actually act-as-if those odds with a gun to his head, and he is being conveniently modest.
Right: in fact he would act as though MWI is certainly false… or at least as though Quantum Immortality is certainly false, which has a good chance of being true given MWI.
Quantum Immortality is certainly false, which has a good chance of being true given MWI.
No! He will act as if Quantum Immortality is a bad choice, which is true even if QI works exactly as described. ‘True’ isn’t the right kind of word to use unless you include a normative conclusion in the description of QI.
Consider the Least Convenient Possible World...
Suppose that being shot with the gun cannot possibly have intermediate results: either the gun fails, or he is killed instantly and painlessly.
Also suppose that given that there are possible worlds where he exists, each copy of him only cares about its anticipated experiences, not about the other copies, and that this is morally the right thing to do… in other words, if he expects to continue to exist, he doesn’t care about other copies that cease to exist. This is certainly the attitude some people would have, and we could suppose (for the LCPW) that it is the correct attitude.
Even so, given these two suppositions, I suspect it would not affect his behavior in the slightest, showing that he would be acting as though QI is certainly false, and therefore as though there is a good chance that MWI is false.
each copy of him only cares about its anticipated experiences, not about the other copies, and that this is morally the right thing to do… in other words, if he expects to continue to exist, he doesn’t care about other copies that cease to exist.
But that is crazy and false, and uses ‘copies’ in a misleading way. Why would I assume that?
Even so, given these two suppositions, I suspect it would not affect his behavior in the slightest, showing that he would be acting as though QI is certainly false,
This ‘least convenient possible world’ is one in which Robin’s values are changed according to your prescription but his behaviour is not, ensuring that your conclusion is true. That isn’t the purpose of inconvenient worlds (kind of the opposite...)
and therefore as though there is a good chance that MWI is false.
Not at all. You are conflating “MWI is false” with a whole different set of propositions. MWI != QS.
Many people in fact have those values and opinions, and nonetheless act in the way I mention (and there is no one who does not so act), so it is quite reasonable to suppose that even if Robin’s values were so changed, his behavior would remain unchanged.
The very reason Robin was brought up (by you, I might add) was to serve as a reductio ad absurdum with respect to intellectual disrespect.
One problem I have with your argument here is that you appear to be saying that if XiXiDu doesn’t agree with you, he must be stupid (the stuff about low g etc.). Do you think Robin Hanson is stupid too, since he wasn’t convinced?
In the Convenient World where Robin is, in fact, too stupid to correctly tackle the concept of QS, understand the difference between MWI and QI, or form a sophisticated understanding of his moral intuitions with respect to quantum uncertainty, this Counterfactual-Stupid-Robin is a completely useless example.
I can imagine two different meanings for “not convinced about MWI”:
1. It refers to someone who is not convinced that MWI is as good as any other model of reality, and better than most.
2. It refers to someone who is not convinced that MWI describes the structure of reality.
If we are meant to understand the meaning as #1, then it may well indicate that someone is stupid. Though, more charitably, it might more likely indicate that he is ignorant.
If we are meant to understand the meaning as #2, then I think that it indicates someone who is not entrapped by the Mind Projection Fallacy.
What do you mean by belief in MWI? What sort of experiment could settle whether MWI is true or not?
I suspect that a lot of people object to the stuff tacitly built on MWI (copies of humans, other worlds we should care about, hypotheses about consciousness) rather than to MWI itself.
From THE EVERETT FAQ:
“Is many-worlds (just) an interpretation?”
http://www.hedweb.com/manworld.htm#interpretation
“What unique predictions does many-worlds make?”
http://www.hedweb.com/manworld.htm#unique
“Could we detect other Everett-worlds?”
http://www.hedweb.com/manworld.htm#detect
I’m not convinced (yet).
First, the links say that MWI needs a linear quantum theory, and therefore list linearity among its predictions. However, linearity is part of quantum theory and its mathematical formalism, and nothing specific to MWI. Also, weak non-linearity could be explained in the language of MWI by saying that the different worlds interact a little. I don’t see how testing the superposition principle establishes MWI. Very weak evidence at best.
Second, there is a very confused paragraph about quantum gravity, which, apart from linking to itself, states only that MWI requires gravity to be quantised (without supporting argument), and that therefore, if gravity is successfully quantised, this forms evidence for MWI. However, nobody doubts that gravity has to be quantised somehow, not even hardcore Copenhageners.
The most interesting part is the one about the reversible measurement done by an artificial intelligence. As I understand it, it supposes that we construct a machine which could perform measurements in the reversed direction of time, for which it has to be immune to quantum decoherence. It sounds interesting, but is also suspicious. I see no way we can get the information into our brains without decoherence. The argument apparently tries to circumvent this objection by postulating an AI which is reversible and decoherence-immune, but the AI will still face the same problem when trying to tell us the results. In fact, postulating the need for an AI here seems to be only a tool to make the proposed experiment more obscure and difficult to analyse. We will have a “reversible AI”, and therefore, miraculously, we will detect differences between Copenhagen and MWI.
However, at least there is a link to Deutsch’s article which hopefully explains the experiment in greater detail, so I will read it and edit the comment later.
“Many-worlds is often referred to as a theory, rather than just an interpretation, by those who propose that many-worlds can make testable predictions (such as David Deutsch) or is falsifiable (such as Everett) or by those who propose that all the other, non-MW interpretations, are inconsistent, illogical or unscientific in their handling of measurements”
http://en.wikipedia.org/wiki/Many-worlds_interpretation
None of the tests in that FAQ look to me like they could distinguish MWI from MWI+worldeater. The closest thing to an experimental test I’ve come up with is the following:
Flip a quantum coin. If heads, copy yourself once, advance both copies enough to observe the result, then kill one of the copies. If tails, do nothing.
In a many-worlds interpretation of QM, from the perspective of the experimenter, the coin will be heads with probability 2⁄3, since there are two observers in that case and only one if the coin was tails. In the single-world case, the coin will be heads with probability 1⁄2. So each time you repeat the experiment, you get 0.4 bits of evidence for or against MWI. Unfortunately, this evidence is also non-transferrable; someone else can’t use your observation as evidence the same way you can. And getting enough evidence for a firm conclusion involves a very high chance of subjective death (though it is guaranteed that exactly one copy will be left behind). And various quantum immortality hypotheses screw up the experiment, too.
So it is testable in principle, but the experiment involved is more odious than one would imagine possible.
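For concreteness, here is the arithmetic behind the “0.4 bits” figure, taking the parent comment’s observer-counting assumption (P(heads) = 2/3 under many-worlds vs. 1/2 under a single world) at face value:

```python
import math

p_heads_mwi, p_heads_single = 2 / 3, 1 / 2   # the comment's assumed likelihoods

# Evidence per trial, in bits (base-2 log-likelihood ratio).
bits_if_heads = math.log2(p_heads_mwi / p_heads_single)              # ~ +0.415
bits_if_tails = math.log2((1 - p_heads_mwi) / (1 - p_heads_single))  # ~ -0.585
print(f"heads: {bits_if_heads:+.3f} bits, tails: {bits_if_tails:+.3f} bits")

# Posterior odds in favour of MWI after h heads and t tails, from even prior odds.
def posterior_odds(h, t):
    return (p_heads_mwi / p_heads_single) ** h * \
           ((1 - p_heads_mwi) / (1 - p_heads_single)) ** t

print(posterior_odds(10, 0))   # ~17.8 : 1 in favour of MWI
print(posterior_odds(5, 5))    # ~0.56 : 1, i.e. weak evidence against
```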
The math works the same in all interpretations, but some experiments are difficult to understand intuitively without the MWI. I usually give people the example of the Elitzur-Vaidman bomb tester where the easy MWI explanation says “we know the bomb works because it exploded in another world”, but other interpretations must resort to clever intellectual gymnastics.
If all interpretations are equivalent with respect to testable outcomes, what makes the belief in any particular interpretation so important? Ease of intuitive understanding is a dangerous criterion to rely on, and a relative thing too. Some people are more ready to accept mental gymnastics than the existence of other worlds.
Well, that depends. Have you actually tried to do the mental gymnastics and explain the linked experiment using the Copenhagen interpretation? I suspect that going through with that may influence your final opinion.
Have you actually tried to do the mental gymnastics and explain the linked experiment [the Elitzur-Vaidman bomb tester] using the Copenhagen interpretation?
Maybe I’m missing something, but how exactly does this experiment challenge the Copenhagen interpretation more than the standard double-slit stuff? Copenhagen treats “measurement” as a fundamental and irreducible process and measurement devices as special components in each experiment—and in this case it simply says that a dud bomb doesn’t represent a measurement device, whereas a functioning one does, so that they interact with the photon wavefunction differently. The former leaves it unchanged, while the latter collapses it to one arm of the interferometer—either its own, in which case it explodes, or the other one, in which case it reveals itself as a measurement device just by the act of collapsing.
As far as I understand, this would be similar to the standard variations on the double-slit experiment where one destroys the interference pattern by placing a particle detector at the exit from one of the holes. One could presumably do a similar experiment with a detector that might be faulty, and conclude that an interference-destroying detector works even if it doesn’t flash when several particles are let through (in cases where they all happen to go through the other hole). Unless I’m misunderstanding something, this would be a close equivalent of the bomb test.
The final conclusion in the bomb test is surely more spectacular, but I don’t see how it produces any extra confusion for Copenhageners compared to the most basic QM experiments.
Frankly, I don’t know what you consider an explanation here. I am quite comfortable with the prediction which the theory gives, and accept that as an explanation. So I never needed mental gymnastics here. The experiment is weird, but it doesn’t seem to me less weird by saying that the information about the bomb’s functionality came from its explosion in the other world.
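To make the bomb-test numbers in this exchange concrete, here is a minimal amplitude-bookkeeping sketch (standard textbook figures; the beam-splitter phase convention is one common choice, and none of this favours any particular interpretation):

```python
# Mach-Zehnder interferometer with a bomb in arm 2.  Detector C is the "bright"
# port that always fires for a dud; detector D is the "dark" port that can only
# fire if something in arm 2 acted as a which-way measurement.
def beam_splitter(a1, a2):
    """50/50 beam splitter: (a1, a2) -> ((a1 + i*a2)/sqrt(2), (i*a1 + a2)/sqrt(2))."""
    s = 2 ** -0.5
    return s * (a1 + 1j * a2), s * (1j * a1 + a2)

a1, a2 = beam_splitter(1.0, 0.0)          # photon enters in mode 1

# Dud bomb: both arms recombine coherently at the second beam splitter.
d1, d2 = beam_splitter(a1, a2)
print("dud : P(D) =", round(abs(d1) ** 2, 3), " P(C) =", round(abs(d2) ** 2, 3))
# -> P(D) = 0, P(C) = 1

# Live bomb: arm 2 is measured.  The "explodes" branch has probability |a2|^2;
# the surviving branch (photon in arm 1) goes through the second splitter alone.
p_explodes = abs(a2) ** 2                  # 0.5
c1, c2 = beam_splitter(a1, 0.0)            # unnormalised surviving branch
print("live: P(explodes) =", round(p_explodes, 3),
      " P(D) =", round(abs(c1) ** 2, 3),   # 0.25 -> bomb certified live, unexploded
      " P(C) =", round(abs(c2) ** 2, 3))   # 0.25 -> inconclusive
```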
Your claims are only anti-predictions relative to science-fiction notions of robots as metal men.
Most possible artificial minds are neither Friendly nor unFriendly (unless you adopt such a stringent definition of mind that artificial minds are not going to exist in my lifetime or yours).
Fast AI (along with most of the other wild claims about what future technology will do, really) falls afoul of the general version of Amdahl’s law. (On which topic, did you ever update your world model when you found out you were mistaken about the role of computers in chip design?)
About MWI, I agree with you completely, though I am more hesitant to berate early quantum physicists for not having found it obvious. For a possible analogy: what do you think of my resolution of the Anthropic Trilemma?
This is quite helpful, and suggests that what I wanted is not a lay-reader summary, but an executive summary.
I brought this up elsewhere in this thread, but the fact that quantum mechanics and gravity are not reconciled suggests that even Schrodinger’s equation does not fit the evidence. The “low-energy” disclaimer one has to add is very weird, maybe weirder than any counterintuitive consequences of quantum mechanics.
It’s not the Schrödinger equation alone that gives rise to decoherence and thus many-worlds. (Read Good and Real for another toy model, the “quantish” system.) The EPR experiment and Bell’s inequality can be made to work on macroscopic scales, so we know that whatever mathematical object the universe will turn out to be, it’s not going to go un-quantum on us again: it has the same relevant behavior as the Schrödinger equation, and accordingly MWI will be the best interpretation there as well.
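A small check of the CHSH numbers behind the Bell-inequality point above (this only shows that the quantum prediction exceeds the classical bound; the claim that such tests can be pushed to macroscopic scales is a separate, empirical matter):

```python
import math

# CHSH: any local hidden-variable model obeys |S| <= 2, while the singlet-state
# correlation E(x, y) = -cos(x - y) gives |S| = 2*sqrt(2) at these angles.
def E(x, y):
    return -math.cos(x - y)

a, a2, b, b2 = 0.0, math.pi / 2, math.pi / 4, -math.pi / 4
S = E(a, b) + E(a, b2) + E(a2, b) - E(a2, b2)
print(abs(S), "> 2 (classical bound); quantum maximum =", 2 * math.sqrt(2))
```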
Speaking of executive summaries, will you offer one for your metaethics?
“There is no intangible stuff of goodness that you can divorce from life and love and happiness in order to ask why things like that are good. They are simply what you are talking about in the first place when you talk about goodness.”
And then the long arguments are about why your brain makes you think anything different.
This is less startling than your more scientific pronouncements. Are there any atheists reading this that find this (or at first found this) very counterintuitive or objectionable?
I would go further, and had the impression from somewhere that you did not go that far. Is that accurate?
I’m a cognitivist. Sentences about goodness have truth values after you translate them into being about life and happiness etc. As a general strategy, I make the queerness go away, rather than taking the queerness as a property of a thing and using it to deduce that thing does not exist; it’s a confusion to resolve, not an existence to argue over.
To be clear, if sentence X about goodness is translated into sentence Y about life and happiness etc., does sentence Y contain the word “good”?
Edit: What’s left of religion after you make the queerness go away? Why does there seem to be more left of morality?
No, nothing, and because while religion does contain some confusion, after you eliminate the confusion you are left with claims that are coherent but false.
I can do that:
Morality is a specific set of values (or, more precisely, a specific algorithm/dynamic for judging values). Humans happen to be (for various reasons) the sort of beings that value morality as opposed to valuing, say, maximizing paperclip production. It is indeed objectively better (by which we really mean “more moral”/“the sort of thing we should do”) to be moral than to be paperclipish. And indeed we should be moral, where by “should” we mean “more moral”.
(And “moral”, when we actually cash out what we actually mean by it, seems to translate to a complicated blob of values like happiness, love, creativity, novelty, self-determination, fairness, life (as in the protection thereof), etc...)
It may appear that paperclip beings and moral beings disagree about something, but not really. The paperclippers, once they’ve analyzed what humans actually mean by “moral”, would agree: “yep, humans are more moral than us. But who cares about this morality stuff, it doesn’t maximize paperclips!”
Of course, screw the desires of the paperclippers; after all, they’re not actually moral. We really are objectively better (once we think carefully about what we mean by “better”) than them.
(Note: “does something or does it not actually do a good job of fulfilling a certain value?” is an objective question. I.e., “does a particular action tend to increase the expected number of paperclips?” (on the paperclipper side) or, on our side, stuff like “does a particular action tend to save more lives, increase happiness, increase fairness, add novelty...?”, etc., is an objective question, in that we can extract specific meaning from it and can objectively (in a way the paperclippers would agree with) judge it. It simply happens to be that we’re the sorts of beings that actually care about the answer to that (as we should be), while the screwy hypothetical paperclippers are immoral and only care about paperclips.)
How’s that? Does that make sense? Or, to summarize the summary, “Morality is objective, and we humans happen to be the sorts of beings that value morality, as opposed to valuing something else instead.”
1. a specific algorithm/dynamic for judging values, or
2. a complicated blob of values like happiness, love, creativity, novelty, self-determination, fairness, life (as in the protection thereof), etc.?
If it’s 1, can we say something interesting and non-trivial about the algorithm, besides the fact that it’s an algorithm? In other words, everything can be viewed as an algorithm, but what’s the point of viewing morality as an algorithm?
If it’s 2, why do we think that two people on opposite sides of the Earth are referring to the same complicated blob of values when they say “morality”? I know the argument about the psychological unity of humankind (not enough time for significant genetic divergence), but what about cultural/memetic evolution?
I’m guessing the answer to my first question is something like, morality is an algorithm whose current “state” is a complicated blob of values like happiness, love, … so both of my other questions ought to apply.
If it’s 2, why do we think that two people on opposite sides of the Earth are referring to the same complicated blob of values when they say “morality”? I know the argument about the psychological unity of humankind (not enough time for significant genetic divergence), but what about cultural/memetic evolution?
You don’t even have to do any cross-cultural comparisons to make such an argument. Considering the insights from modern behavioral genetics, individual differences within any single culture will suffice.
There is no reason to be at all tentative about this. There’s tons of cog sci data about what people mean when they talk about morality. It varies hugely (but predictably) across cultures.
Why are you using algorithm/dynamic here instead of function or partial function? (On what space? I will ignore that issue, just as you have...) Is it supposed to be stateful? I’m not even clear what that would mean. Or is function what you mean by #2? I’m not even really clear on how these differ.
You might have gotten confused because I quoted Psy-Kosh’s phrase “specific algorithm/dynamic for judging values” whereas Eliezer’s original idea I think was more like an algorithm for changing one’s values in response to moral arguments. Here are Eliezer’s own words:
I would say, by the way, that the huge blob of a computation is not just my present terminal values (which I don’t really have - I am not a consistent expected utility maximizer); the huge blob of a computation includes the specification of those moral arguments, those justifications, that would sway me if I heard them.
Others have pointed out that this definition is actually quite unlikely to be coherent: people would be likely to be ultimately persuaded by different moral arguments and justifications if they had different experiences and heard arguments in different orders etc.
Others have pointed out that this definition is actually quite unlikely to be coherent
Yes, see here for an argument to that effect by Marcello and subsequent discussion about it between Eliezer and myself.
I think the metaethics sequence is probably the weakest of Eliezer’s sequences on LW. I wonder if he agrees with that, and if so, what he plans to do about this subject for his rationality book.
This is somewhat of a concern given Eliezer’s interest in Friendliness!
As far as I can understand, Eliezer has promoted two separate ideas about ethics: defining personal morality as a computation in the person’s brain rather than something mysterious and external, and extrapolating that computation into smarter creatures. The former idea is self-evident, but the latter (and, by extension, CEV) has received a number of very serious blows recently. IMO it’s time to go back to the drawing board. We must find some attack on the problem of preference, latch onto some small corner, that will allow us to make precise statements. Then build from there.
defining personal morality as a computation in the person’s brain rather than something mysterious and external
But I don’t see how that, by itself, is a significant advance. Suppose I tell you, “mathematics is a computation in a person’s brain rather than something mysterious and external”, or “philosophy is a computation in a person’s brain rather than something mysterious and external”, or “decision making is a computation in a person’s brain rather than something mysterious and external” how much have I actually told you about the nature of math, or philosophy, or decision making?
This makes sense in that it is coherent, but it is not obvious to me what arguments would be marshaled in its favor. (Yudkowsky’s short formulations do point in the direction of their justifications.) Moreover, the very first line, “morality is a specific set of values,” and even its parenthetical expansion (algorithm for judging values), seem utterly preposterous to me. The controversies between human beings about which specific sets of values are moral, at every scale large and small, are legendary beyond cliche.
The controversies between human beings about which specific sets of values are moral, at every scale large and small, are legendary beyond cliche.
It is a common thesis here that most humans would ultimately have the same moral judgments if they were in full agreement about all factual questions and were better at reasoning. In other words, human brains have a common moral architecture, and disagreements are at the level of instrumental, rather than terminal, values and result from mistaken factual beliefs and reasoning errors.
You may or may not find that convincing (you’ll get to the arguments regarding that if you’re reading the sequences), but assuming that is true, then “morality is a specific set of values” is correct, though vague: more precisely, it is a very complicated set of terminal values, which, in this world, happens to be embedded solely in a species of minds who are not naturally very good at rationality, leading to massive disagreement about instrumental values (though most people do not notice that it’s about instrumental values).
It is a common thesis here that most humans would ultimately have the same moral judgments if they were in full agreement about all factual questions and were better at reasoning. In other words, human brains have a common moral architecture, and disagreements are at the level of instrumental, rather than terminal, values and result from mistaken factual beliefs and reasoning errors.
It is? That’s a worry. Consider this a +1 for “That thesis is totally false and only serves signalling purposes!”
I… think it is. Maybe I’ve gotten something terribly wrong, but I got the impression that this is one of the points of the complexity of value and metaethics sequences, and I seem to recall that it’s the basis for expecting humanity’s extrapolated volition to actually cohere.
I seem to recall that it’s the basis for expecting humanity’s extrapolated volition to actually cohere.
This whole area isn’t covered all that well (as Wei noted). I assumed that CEV would rely on solving an implicit cooperation problem between conflicting moral systems. It doesn’t appear at all unlikely to me that some people are intrinsically selfish to some degree and their extrapolated volitions would be quite different.
Note that I’m not denying that some people present (or usually just assume) the thesis you present. I’m just glad that there are usually others who argue against it!
It is a common thesis here that most humans would ultimately have the same moral judgments if they were in full agreement about all factual questions and were better at reasoning.
Maybe it’s true if you also specify “if they were fully capable of modifying their own moral intuitions.” I have an intuition (an unexamined belief? a hope? a sci-fi trope?) that humanity as a whole will continue to evolve morally and roughly converge on a morality that resembles current first-world liberal values more than, say, Old Testament values. That is, it would converge, in the limit of global prosperity and peace and dialogue, and assuming no singularity occurs and the average lifespan stays constant. You can call this naive if you want to; I don’t know whether it’s true. It’s what I imagine Eliezer means when he talks about “humanity growing up together”.
This growing-up process currently involves raising children, which can be viewed as a crude way of rewriting your personality from scratch, and excising vestiges of values you no longer endorse. It’s been an integral part of every culture’s moral evolution, and something like it needs to be part of CEV if it’s going to actually converge.
It is a common thesis here that most humans would ultimately have the same moral judgments if they were in full agreement about all factual questions and were better at reasoning.
That’s not plausible. That would be some sort of objective morality, and there is no such thing. Humans have brains, and brains are complicated. You can’t have them imply exactly the same preference.
Now, the non-crazy version of what you suggest is that preferences of most people are roughly similar, that they won’t differ substantially in major aspects. But when you focus on detail, everyone is bound to want their own thing.
It makes sense in its own terms, but it leaves the unpleasant implication that morality differs greatly between humans, at both individual and group level—and if this leads to a conflict, asking who is right is meaningless (except insofar as everyone can reach an answer that’s valid only for himself, in terms of his own morality).
So if I live in the same society with people whose morality differs from mine, and the good-fences-make-good-neighbors solution is not an option, as it often isn’t, then who gets to decide whose morality gets imposed on the other side? As far as I see, the position espoused in the above comment leaves no other answer than “might is right.” (Where “might” also includes more subtle ways of exercising power than sheer physical coercion, of course.)
...and if this leads to a conflict, asking who is right is meaningless (except insofar as everyone can reach an answer that’s valid only for himself, in terms of his own morality).
So if I live in the same society with people whose morality differs from mine, and the good-fences-make-good-neighbors solution is not an option, as it often isn’t, then who gets to decide whose morality gets imposed on the other side?
That two people mean different things by the same word doesn’t make all questions asked using that word meaningless, or even hard to answer.
If by “castle” you mean “a fortified structure”, while I mean “a fortified structure surrounded by a moat”, who will be right if we’re asked if the Chateau de Gisors is a castle? Any confusion here is purely semantic in nature. If you answer yes and I answer no, we won’t have given two answers to the same question, we’ll have given two answers to two different questions. If Psy-Kosh says that the Chateau de Gisors is a fortified structure but it is not surrounded by a moat, he’ll have answered both our questions.
Now, once this has been clarified, what would it mean to ask who gets to decide whose definition of ‘castle’ gets imposed on the other side? Do we need a kind of meta-definition of castle to somehow figure out what the one true definition is? If I could settle this issue by exercising power over you, would it change the fact that the Chateau de Gisors is not surrounded by a moat? If I killed everyone who doesn’t mean the same thing by the word ‘castle’ than I do, would the sentence “a fortified structure” become logically equivalent to the sentence “a fortified structure surrounded by a moat”?
In short, substituting the meaning of a word for the word tends to make lots of seemingly difficult problems become laughably easy to solve. Try it.
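A toy rendering of the substitution move described above (the facts about the building are taken from the comment itself; the function names are mine):

```python
# Once each speaker's definition is substituted for the word "castle", there is
# no residual question left to disagree about.
chateau_de_gisors = {"fortified": True, "surrounded_by_moat": False}

def castle_by_your_definition(b):   # "a fortified structure"
    return b["fortified"]

def castle_by_my_definition(b):     # "a fortified structure surrounded by a moat"
    return b["fortified"] and b["surrounded_by_moat"]

print(castle_by_your_definition(chateau_de_gisors))  # True  -- answers your question
print(castle_by_my_definition(chateau_de_gisors))    # False -- answers mine
# Two different questions, two answers; imposing one definition on the other
# speaker would not change either line of output.
```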
*blinks* how did I imply that morality varies? I thought (was trying to imply) that morality is an absolute standard and that humans simply happen to be the sort of beings that care about the particular standard we call “morality”. (Well, with various caveats like not being sufficiently reflective to be able to fully explicitly state our “morality algorithm”, nor do we fully know all its consequences)
However, when humans and paperclippers interact, well, there will probably be some sort of fight if one doesn’t end up with some sort of PD cooperation or whatever. It’s not that paperclippers and humans disagree on anything; it’s simply, well, they value paperclips a whole lot more than lives. We’re sort of stuck with having to act in a way to prevent the hypothetical them from acting on that.
(of course, the notion that most humans seem to have the same underlying core “morality algorithm”, just disagreeing on the implications or such, is something to discuss, but that gets us out of executive summary territory, no?)
I would say that it’s a crucial assumption, which should be emphasized clearly even in the briefest summary of this viewpoint. It is certainly not obvious, to say the least. (And, for full disclosure, I don’t believe that it’s a sufficiently close approximation of reality to avoid the problem I emphasized above.)
Hrm, fair enough. I thought I’d effectively implied it, but apparently not sufficiently.
(Incidentally… you don’t think it’s a close approximation to reality? Most humans seem to value (to various extents) happiness, love, (at least some) lives, etc… right?)
Different people (and cultures) seem to put very different weights on these things.
Here’s an example:
You’re a government minister who has to decide who to hire to do a specific task. There are two applicants. One is your brother, who is marginally competent at the task. The other is a stranger with better qualifications who will probably be much better at the task.
The answer is “obvious.”
In some places, “obviously” you hire your brother. What kind of heartless bastard won’t help out his own brother by giving him a job?
In others, “obviously” you should hire the stranger. What kind of corrupt scoundrel abuses his position by hiring his good-for-nothing brother instead of the obviously superior candidate?
Okay, I can see how XiXiDu’s post might come across that way. I think I can clarify what I think that XiXiDu is trying to get at by asking some better questions of my own.
1. What evidence has SIAI presented that the Singularity is near?
2. If the Singularity is near then why has the scientific community missed this fact?
3. What evidence has SIAI presented for the existence of grey goo technology?
4. If grey goo technology is feasible then why has the scientific community missed this fact?
5. Assuming that the Singularity is near, what evidence is there that SIAI has a chance to lower global catastrophic risk in a nontrivial way?
6. What evidence is there that SIAI has room for more funding?
“Near”? Where’d we say that? What’s “near”? XiXiDu thinks we’re Kurzweil?
What kind of evidence would you want aside from a demonstrated Singularity?
Grey goo? Huh? What’s that got to do with us? Read Nanosystems by Eric Drexler or Freitas on “global ecophagy”. XiXiDu thinks we’re Foresight?
If this business about “evidence” isn’t a demand for particular proof, then what are you looking for besides not-further-confirmed straight-line extrapolations from inductive generalizations supported by evidence?
“Near”? Where’d we say that? What’s “near”? XiXiDu thinks we’re Kurzweil?
You’ve claimed in your Bloggingheads diavlog with Scott Aaronson that you think it’s pretty obvious that there will be an AGI within the next century. As far as I know you have not offered a detailed description of the reasoning that led you to this conclusion that can be checked by others.
I see this as significant for the reasons given in my comment here.
Grey goo? Huh? What’s that got to do with us? Read Nanosystems by Eric Drexler or Freitas on “global ecophagy”. XiXiDu thinks we’re Foresight?
I don’t know what the situation is with SIAI’s position on grey goo—I’ve heard people say the SIAI staff believe in nanotechnology having capabilities out of line with the beliefs of the scientific community, but they may have been misinformed. So let’s forget about questions 3 and 4.
You’ve claimed in your Bloggingheads diavlog with Scott Aaronson that you think it’s pretty obvious that there will be an AGI within the next century.
You’ve shifted the question from “is SIAI on balance worth donating to” to “should I believe everything Eliezer has ever said”.
I don’t know what the situation is with SIAI’s position on grey goo—I’ve heard people say the SIAI staff believe in nanotechnology having capabilities out of line with the beliefs of the scientific community, but they may have been misinformed.
The point is that grey goo is not relevant to SIAI’s mission (apart from being yet another background existential risk that FAI can dissolve). “Scientific community” doesn’t normally professionally study (far) future technological capabilities.
My whole point about grey goo has been, as stated, that a possible superhuman AI could use it to do really bad things. That is, I do not see how an encapsulated AI, even a superhuman AI, could pose the stated risks without the use of advanced nanotechnology. Is it going to use nukes, like Skynet? Another question related to SIAI, regarding advanced nanotechnology, is whether superhuman AI is at all possible without advanced nanotechnology.
I’m shocked at how you people misinterpreted my intentions there.
Grey goo is only a potential danger in its own right because it’s a way dumb machinery can grow in destructive power (you don’t need to assume an AI controlling it for it to be dangerous, or at least so goes the story). AGI is not dumb, so it can use something better suited to precise control than grey goo (and correspondingly more destructive and feasible).
The grey goo example was named to exemplify the speed and sophistication of nanotechnology that would have to be around to either allow an AI to be built in the first place or be of considerable danger.
I consider your comment an expression of personal disgust. There is no way you could possibly misinterpret my original point and subsequent explanation to this extent.
The grey goo example was named to exemplify the speed and sophistication of nanotechnology that would have to be around to either allow an AI to be built in the first place or be of considerable danger.
As katydee pointed out, if for some strange reason grey goo is what AI would want, AI will invent grey goo. If you used “grey goo” to refer to the rough level of technological development necessary to produce grey goo, then my comments missed that point.
I consider your comment an expression of personal disgust. There is no way you could possibly misinterpret my original point and subsequent explanation to this extent.
Illusion of transparency. Since the general point about nanotech seems equally wrong to me, I couldn’t distinguish between the error of making it and making a similarly wrong point about the relevance of grey goo in particular. In general, I don’t plot, so take my words literally. If I don’t like something, I just say so, or keep silent.
If it seems equally wrong, why haven’t you pointed me to some further reasoning on the topic regarding the feasibility of AGI without advanced (grey goo level) nanotechnology? Why haven’t you argued about the dangers of AGI which is unable to make use of advanced nanotechnology? I was inquiring about these issues in my original post and not trying to argue against the scenarios in question.
Yes, I’ve seen the comment regarding the possible invention of advanced nanotechnology by AGI. If AGI needs something that isn’t there, it will just pull it out of its hat. Well, I have my doubts that even a superhuman AGI could steer the development of advanced nanotechnology so that it gains control of it. Sure, it might solve the problems associated with it and send the solutions to some researcher. Then it could buy the stock of the resulting company involved with the new technology and somehow gain control... well, at this point we are already deep into speculative reasoning about something shaky that is at the same time being used as evidence for the very reasoning that involves it.
To the point: if AGI can’t pose a danger, because its hands are tied, that’s wonderful! Then we have more time to work on FAI. FAI is not about superpowerful robots; it’s about technically understanding what we want, and using that understanding to automate the manufacturing of goodness. The power is expected to come from unbounded automatic goal-directed behavior, something that happens without humans in the system to ever stop the process if it goes wrong.
Overall I’d feel a lot more comfortable if you just said “there’s a huge amount of uncertainty as to when existential risks will strike and which ones will strike, I don’t know whether or not I’m on the right track in focusing on Friendly AI or whether I’m right about when the Singularity will occur, I’m just doing the best that I can.”
This is largely because of the issue that I raise here.
I should emphasize that I don’t think that you’d ever knowingly do something that raised existential risk, I think that you’re a kind and noble spirit. But I do think I’m raising a serious issue which you’ve missed.
If this business about “evidence” isn’t a demand for particular proof, then what are you looking for besides not-further-confirmed straight-line extrapolations from inductive generalizations supported by evidence?
I am looking for the evidence in “supported by evidence”. I am further trying to figure out how you anticipate your beliefs paying rent, what you anticipate seeing if explosive recursive self-improvement is possible, and how that belief could be surprised by data.
If you just say, “I predict we will likely be wiped out by badly done AI.”, how do you expect to update on evidence? What would constitute such evidence?
To put my own spin on XiXiDu’s questions: What quality or position does Charles Stross possess that should cause us to leave him out of this conversation (other than the quality ‘Eliezer doesn’t think he should be mentioned’)?
Like what? Why he should believe in exponential growth? When by “exponential” he actually means “fast” and no one at SIAI actually advocates for exponentials, those being a strictly Kurzweilian obsession and not even very dangerous by our standards? When he picks MWI, of all things, to accuse us of overconfidence (not “I didn’t understand that” but “I know something you don’t about how to integrate the evidence on MWI, clearly you folks are overconfident”)? When there’s lots of little things scattered through the post like that (“I’m engaging in pluralistic ignorance based on Charles Stross’s nonreaction”) it doesn’t make me want to plunge into engaging the many different little “substantive” parts, get back more replies along the same line, and recapitulate half of Less Wrong in the process. The first thing I need to know is whether XiXiDu did the reading and the reading failed, or did he not do the reading? If he didn’t do the reading, then my answer is simply, “If you haven’t done enough reading to notice that Stross isn’t in our league, then of course you don’t trust SIAI”. That looks to me like the real issue. For substantive arguments, pick a single point and point out where the existing argument fails on it—don’t throw a huge handful of small “huh?”s at me.
Castles in the air. Your claims are based on long chains of reasoning that you do not write down in a formal style. Is the probability of correctness of each link in that chain of reasoning so close to 1, that their product is also close to 1?
I can think of a couple of ways you could respond:
Yes, you are that confident in your reasoning. In that case you could explain why XiXiDu should be similarly confident, or why it’s not of interest to you whether he is similarly confident.
It’s not a chain of reasoning, it’s a web of reasoning, and robust against certain arguments being off. If that’s the case, then we lay readers might benefit if you would make more specific and relevant references to your writings depending on context, instead of encouraging people to read the whole thing before bringing criticisms.
Most of the long arguments are concerned with refuting fallacies and defeating counterarguments, which flawed reasoning will always be able to supply in infinite quantity. The key predictions, when you look at them, generally turn out to be antipredictions, and the long arguments just defeat the flawed priors that concentrate probability into anthropomorphic areas. The positive arguments are simple, only defeating complicated counterarguments is complicated.
“Fast AI” is simply “Most possible artificial minds are unlikely to run at human speed, the slow ones that never speed up will drop out of consideration, and the fast ones are what we’re worried about.”
“UnFriendly AI” is simply “Most possible artificial minds are unFriendly, most intuitive methods you can think of for constructing one run into flaws in your intuitions and fail.”
MWI is simply “Schrodinger’s equation is the simplest fit to the evidence”; there are people who think that you should do something with this equation other than taking it at face value, like arguing that gravity can’t be real and so needs to be interpreted differently, and the long arguments are just there to defeat them.
The only argument I can think of that actually approaches complication is about recursive self-improvement, and even there you can say “we’ve got a complex web of recursive effects and they’re unlikely to turn out exactly exponential with a human-sized exponent”, the long arguments being devoted mainly to defeating the likes of Robin Hanson’s argument for why it should be exponential with an exponent that smoothly couples to the global economy.
One problem I have with your argument here is that you appear to be saying that if XiXiDu doesn’t agree with you, he must be stupid (the stuff about low g etc.). Do you think Robin Hanson is stupid too, since he wasn’t convinced?
If he wasn’t convinced about MWI it would start to become a serious possibility.
I haven’t found the text during a two minute search or so, but I think I remember Robin assigning a substantial probability, say, 30% or so, to the possibility that MWI is false, even if he thinks most likely (i.e. the remaining 70%) that it’s true.
Much as you argued in the post about Einstein’s arrogance, there seems to be a small enough difference between a 30% chance of being false, and a 90% chance of being false, if the latter would imply that Robin was stupid, the former would imply it too.
I suspect that Robin would not actually act-as-if those odds with a gun to his head, and he is being conveniently modest.
Right: in fact he would act as though MWI is certainly false… or at least as though Quantum Immortality is certainly false, which has a good chance of being true given MWI.
No! He will act as if Quantum Immortality is a bad choice, which is true even if QI works exactly as described. ‘True’ isn’t the right kind word to use unless you include a normative conclusion in the description of QI.
Consider the Least Convenient Possible World...
Suppose that being shot with the gun cannot possibly have intermediate results: either the gun fails, or he is killed instantly and painlessly.
Also suppose that given that there are possible worlds where he exists, each copy of him only cares about its anticipated experiences, not about the other copies, and that this is morally the right thing to do… in other words, if he expects to continue to exist, he doesn’t care about other copies that cease to exist. This is certainly the attitude some people would have, and we could suppose (for the LCPW) that it is the correct attitude.
Even so, given these two suppositions, I suspect it would not affect his behavior in the slightest, showing that he would be acting as though QI is certainly false, and therefore as though there is a good chance that MWI is false.
But that is crazy and false, and uses ‘copies’ to in a misleading way. Why would I assume that?
This ‘least convenient possible world’ is one in which Robin’s values are changed according to your prescription but his behaviour is not, ensuring that your conclusion is true. That isn’t the purpose of inconvenient worlds (kind of the opposite...)
Not at all. You are conflating “MWI is false” with a whole different set of propositions. MWI != QS.
Many people in fact have those values and opinions, and nonetheless act in the way I mention (and there is no one who does not so act) so it is quite reasonable to suppose that even if Robin’s values were so changed, his behavior would remain unchanged.
The very reason Robin was brought up (by you I might add) was to serve as an ad absurdum with respect to intellectual disrespect.
In the Convenient World where Robin is, in fact, too stupid to correctly tackle the concept of QS, understand the difference between MWI and QI or form a sophisticated understanding of his moral intuitions with respect to quantum uncertainty this Counterfactual-Stupid-Robin is a completely useless example.
I can imagine two different meanings for “not convinced about MWI”
It refers to someone who is not convinced that MWI is as good as any other model of reality, and better than most.
It refers to someone who is not convinced that MWI describes the structure of reality.
If we are meant to understand the meaning as #1, then it may well indicate that someone is stupid. Though, more charitably, it might more likely indicate that he is ignorant.
If we are meant to understand the meaning as #2, then I think that it indicates someone who is not entrapped by the Mind Projection Fallacy.
What do you mean by belief in MWI? What sort of experiment could settle whether MWI is true or not?
I suspect that a lot of people object to the stuff including copies of humans and other worlds we should care about and hypotheses about consciousness tacitly build on MWI, rather than MWI itself.
From THE EVERETT FAQ:
“Is many-worlds (just) an interpretation?”
http://www.hedweb.com/manworld.htm#interpretation
“What unique predictions does many-worlds make?”
http://www.hedweb.com/manworld.htm#unique
“Could we detect other Everett-worlds?”
http://www.hedweb.com/manworld.htm#detect
I’m (yet) not convinced.
First, the links say that MWI needs a linear quantum theory, and lists therefore the linearity among its predictions. However, linearity is a part of the quantum theory and its mathematical formalism, and nothing specific to MWI. Also, weak non-linearity would be explicable using the language of MWI saying that the different worlds interact a little. I don’t see how testing the superposition principle establishes MWI. A very weak evidence at best.
Second, there is a very confused paragraph about quantum gravity, which, apart from linking to itself, states only that MWI requires gravity to be quantised (without supporting argument) and therefore if gravity is successfully quantised, it forms evidence for MWI. However, nobody doubts that gravity has to be quantised somehow, even hardcore Copenhageners.
The most interesting part is that about the reversible measurement done by an artificial intelligence. As I understand it, it supposes that we construct a machine which could perform measurements in reversed direction of time, for which it has to be immune to quantum decoherence. It sounds interesting, but is also suspicious. I see no way how can we get the information into our brains without decoherence. The argument apparently tries to circumvent this objection by postulating an AI, which is reversible and decoherence-immune, but the AI will still face the same problem when trying to tell us the results. In fact, postulating the need of an AI here seems to be only a tool to make the proposed experiment more obscure and difficult to analyse. We will have a “reversible AI”, therefore miraculously we will detect differences between Copenhagen and MWI.
However, at least there is a link to Deutsch’s article which hopefully explains the experiment in greater detail, so I will read it and edit the comment later.
“Many-worlds is often referred to as a theory, rather than just an interpretation, by those who propose that many-worlds can make testable predictions (such as David Deutsch) or is falsifiable (such as Everett) or by those who propose that all the other, non-MW interpretations, are inconsistent, illogical or unscientific in their handling of measurements”
http://en.wikipedia.org/wiki/Many-worlds_interpretation
None of the tests in that FAQ look to me like they could distinguish MWI from MWI+worldeater. The closest thing to an experimental test I’ve come up with is the following:
Flip a quantum coin. If heads, copy yourself once, advance both copies enough to observe the result, then kill one of the copies. If tails, do nothing.
In a many-worlds interpretation of QM, from the perspective of the experimenter, the coin will be heads with probability 2⁄3, since there are two observers in that case and only one if the coin was tails. In the single-world case, the coin will be heads with probability 1⁄2. So each time you repeat the experiment, you get 0.4 bits of evidence for or against MWI. Unfortunately, this evidence is also non-transferrable; someone else can’t use your observation as evidence the same way you can. And getting enough evidence for a firm conclusion involves a very high chance of subjective death (though it is guaranteed that exactly one copy will be left behind). And various quantum immortality hypotheses screw up the experiment, too.
So it is testable in principle, but the experiment involved more odious than one would imagine possible.
The math works the same in all interpretations, but some experiments are difficult to understand intuitively without the MWI. I usually give people the example of the Elitzur-Vaidman bomb tester where the easy MWI explanation says “we know the bomb works because it exploded in another world”, but other interpretations must resort to clever intellectual gymnastics.
If all interpretations are equivalent with respect to testable outcomes, what makes the belief in any particular interpretation so important? Ease of intuitive understanding is a dangerous criterion to rely on, and a relative thing too. Some people are more ready to accept mental gymnastic than existence of another worlds.
Well, that depends. Have you actually tried to do the mental gymnastics and explain the linked experiment using the Copenhagen interpretation? I suspect that going through with that may influence your final opinion.
cousin_it:
Maybe I’m missing something, but how exactly does this experiment challenge the Copenhagen interpretation more than the standard double-slit stuff? Copenhagen treats “measurement” as a fundamental and irreducible process and measurement devices as special components in each experiment—and in this case it simply says that a dud bomb doesn’t represent a measurement device, whereas a functioning one does, so that they interact with the photon wavefunction differently. The former leaves it unchanged, while the latter collapses it to one arm of the interferometer—eiher its own, in which case it explodes, or the other one, in which case it reveals itself as a measurement device just by the act of collapsing.
As far as I understand, this would be similar to the standard variations on the double-slit experiment where one destroys the interference pattern by placing a particle detector at the exit from one of the holes. One could presumably do a similar experiment with a detector that might be faulty, and conclude that an interference-destroying detector works even if it doesn’t flash when several particles are let through (in cases where they all happen to go through the other hole). Unless I’m misunderstanding something, this would be a close equivalent of the bomb test.
The final conclusion in the bomb test is surely more spectacular, but I don’t see how it produces any extra confusion for Copenhageners compared to the most basic QM experiments.
Frankly, I don’t know what you consider an explanation here. I am quite comfortable with the prediction which the theory gives, and accept that as an explanation. So I never needed mental gymnastics here. The experiment is weird, but it doesn’t seem to me less weird by saying that the information about the bomb’s functionality came from its explosion in the other world.
Fair enough.
This should be revamped into a document introducing the sequences.
Your claims are only anti-predictions relative to science-fiction notions of robots as metal men.
Most possible artificial minds are neither Friendly nor unFriendly (unless you adopt such a stringent definition of mind that artificial minds are not going to exist in my lifetime or yours).
Fast AI (along with most of the other wild claims about what future technology will do, really) falls afoul of the general version of Amdahl’s law. (On which topic, did you ever update your world model when you found out you were mistaken about the role of computers in chip design?)
About MWI, I agree with you completely, though I am more hesitant to berate early quantum physicists for not having found it obvious. For a possible analogy: what do you think of my resolution of the Anthropic Trilemma?
This is quite helpful, and suggests that what I wanted is not a lay-reader summary, but an executive summary.
I brought this up elsewhere in this thread, but the fact that quantum mechanics and gravity are not reconciled suggests that even Schrodinger’s equation does not fit the evidence. The “low-energy” disclaimer one has to add is very weird, maybe weirder than any counterintuitive consequences of quantum mechanics.
It’s not the Schrödinger equation alone that gives rise to decoherence and thus many-worlds. (Read Good and Real for another toy model, the “quantish” system.) The EPR experiment and Bell’s inequality can be made to work on macroscopic scales, so we know that whatever mathematical object the universe will turn out to be, it’s not going to go un-quantum on us again: it has the same relevant behavior as the Schrödinger equation, and accordingly MWI will be the best interpretation there as well.
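As a quick numerical reminder of what “Bell’s inequality” rules out here (my own illustration, not part of the comment above): the quantum CHSH correlation for a singlet pair reaches 2√2, above the bound of 2 that any local hidden-variable account must respect, regardless of the scale at which the experiment is run.

```python
import numpy as np

# Singlet-state correlation for spin measurements along angles x and y.
# Any local hidden-variable model must satisfy |S| <= 2 (the CHSH bound).
def E(x, y):
    return -np.cos(x - y)

a, a2 = 0.0, np.pi / 2            # Alice's two measurement settings
b, b2 = np.pi / 4, 3 * np.pi / 4  # Bob's two measurement settings

S = E(a, b) - E(a, b2) + E(a2, b) + E(a2, b2)
print(abs(S), 2 * np.sqrt(2))     # both ~2.828: quantum mechanics beats the classical bound
```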
Speaking of executive summaries, will you offer one for your metaethics?
“There is no intangible stuff of goodness that you can divorce from life and love and happiness in order to ask why things like that are good. They are simply what you are talking about in the first place when you talk about goodness.”
And then the long arguments are about why your brain makes you think anything different.
This is less startling than your more scientific pronouncements. Are there any atheists reading this who find it (or at first found it) very counterintuitive or objectionable?
I would go further, and had the impression from somewhere that you did not go that far. Is that accurate?
I’m a cognitivist. Sentences about goodness have truth values after you translate them into being about life and happiness etc. As a general strategy, I make the queerness go away, rather than taking the queerness as a property of a thing and using it to deduce that thing does not exist; it’s a confusion to resolve, not an existence to argue over.
To be clear, if sentence X about goodness is translated into sentence Y about life and happiness etc., does sentence Y contain the word “good”?
Edit: What’s left of religion after you make the queerness go away? Why does there seem to be more left of morality?
No, nothing, and because while religion does contain some confusion, after you eliminate the confusion you are left with claims that are coherent but false.
I can do that:
Morality is a specific set of values (or, more precisely, a specific algorithm/dynamic for judging values). Humans happen to be (for various reasons) the sort of beings that value morality, as opposed to valuing, say, maximizing paperclip production. It is indeed objectively better (by which we really mean “more moral”/“the sort of thing we should do”) to be moral than to be paperclippish. And indeed we should be moral, where by “should” we mean “the more moral thing to do”.
(And “moral”, when we actually cash out what we mean by it, seems to translate to a complicated blob of values like happiness, love, creativity, novelty, self-determination, fairness, life (as in the protection thereof), etc...)
It may appear that paperclip beings and moral beings disagree about something, but not really. The paperclippers, once they’ve analyzed what humans actually mean by “moral”, would agree: “Yep, humans are more moral than us. But who cares about this morality stuff? It doesn’t maximize paperclips!”
Of course, screw the desires of the paperclippers; after all, they’re not actually moral. We really are objectively better (once we think carefully about what we mean by “better”) than they are.
(Note: “does something actually do a good job of fulfilling a certain value?” is an objective question. That is, “does a particular action tend to increase the expected number of paperclips?” (on the paperclipper side) or, on our side, “does a particular action tend to save more lives, increase happiness, increase fairness, add novelty...?” and so on are objective questions, in that we can extract a specific meaning from them and judge the answers objectively (in a way the paperclippers would agree with). It simply happens that we’re the sort of beings that actually care about the answers to the latter (as we should be), while the screwy hypothetical paperclippers are immoral and only care about paperclips.)
How’s that? Does it make sense? Or, to summarize the summary: “Morality is objective, and we humans happen to be the sort of beings that value morality, as opposed to valuing something else instead.”
Is morality actually:
1. a specific algorithm/dynamic for judging values, or
2. a complicated blob of values like happiness, love, creativity, novelty, self-determination, fairness, life (as in the protection thereof), etc.?
If it’s 1, can we say something interesting and non-trivial about the algorithm, besides the fact that it’s an algorithm? In other words, everything can be viewed as an algorithm, but what’s the point of viewing morality as an algorithm?
If it’s 2, why do we think that two people on opposite sides of the Earth are referring to the same complicated blob of values when they say “morality”? I know the argument about the psychological unity of humankind (not enough time for significant genetic divergence), but what about cultural/memetic evolution?
I’m guessing the answer to my first question is something like, morality is an algorithm whose current “state” is a complicated blob of values like happiness, love, … so both of my other questions ought to apply.
Wei_Dai:
You don’t even have to do any cross-cultural comparisons to make such an argument. Considering the insights from modern behavioral genetics, individual differences within any single culture will suffice.
There is no reason to be at all tentative about this. There’s tons of cog sci data about what people mean when they talk about morality. It varies hugely (but predictably) across cultures.
Why are you using algorithm/dynamic here instead of function or partial function? (On what space? I will ignore that issue, just as you have...) Is it supposed to be stateful? I’m not even clear what that would mean. Or is function what you mean by #2? I’m not really clear on how these differ.
You might have gotten confused because I quoted Psy-Kosh’s phrase “specific algorithm/dynamic for judging values” whereas Eliezer’s original idea I think was more like an algorithm for changing one’s values in response to moral arguments. Here are Eliezer’s own words:
Others have pointed out that this definition is actually quite unlikely to be coherent: people would be likely to be ultimately persuaded by different moral arguments and justifications if they had different experiences and heard arguments in different orders etc.
Yes, see here for an argument to that effect by Marcello and subsequent discussion about it between Eliezer and myself.
I think the metaethics sequence is probably the weakest of Eliezer’s sequences on LW. I wonder if he agrees with that, and if so, what he plans to do about this subject for his rationality book.
This is somewhat of a concern given Eliezer’s interest in Friendliness!
As far as I can understand, Eliezer has promoted two separate ideas about ethics: defining personal morality as a computation in the person’s brain rather than something mysterious and external, and extrapolating that computation into smarter creatures. The former idea is self-evident, but the latter (and, by extension, CEV) has received a number of very serious blows recently. IMO it’s time to go back to the drawing board. We must find some attack on the problem of preference, latch onto some small corner that allows us to make precise statements, and then build from there.
But I don’t see how that, by itself, is a significant advance. Suppose I tell you “mathematics is a computation in a person’s brain rather than something mysterious and external”, or “philosophy is a computation in a person’s brain rather than something mysterious and external”, or “decision making is a computation in a person’s brain rather than something mysterious and external”. How much have I actually told you about the nature of math, or philosophy, or decision making?
The linked discussion is very nice.
This is currently at +1. Is that from Yudkowsky?
(Edit: +2 after I vote it up.)
This makes sense in that it is coherent, but it is not obvious to me what arguments would be marshaled in its favor. (Yudkowsky’s short formulations do point in the direction of their justifications.) Moreover, the very first line, “morality is a specific set of values,” and even its parenthetical expansion (algorithm for judging values), seems utterly preposterous to me. The controversies between human beings about which specific sets of values are moral, at every scale large and small, are legendary beyond cliche.
It is a common thesis here that most humans would ultimately have the same moral judgments if they were in full agreement about all factual questions and were better at reasoning. In other words, human brains have a common moral architecture, and disagreements are at the level of instrumental, rather than terminal, values and result from mistaken factual beliefs and reasoning errors.
You may or may not find that convincing (you’ll get to the arguments regarding that if you’re reading the sequences), but assuming that is true, then “morality is a specific set of values” is correct, though vague: more precisely, it is a very complicated set of terminal values, which, in this world, happens to be embedded solely in a species of minds who are not naturally very good at rationality, leading to massive disagreement about instrumental values (though most people do not notice that it’s about instrumental values).
It is? That’s a worry. Consider this a +1 for “That thesis is totally false and only serves signalling purposes!”
I… think it is. Maybe I’ve gotten something terribly wrong, but I got the impression that this is one of the points of the complexity of value and metaethics sequences, and I seem to recall that it’s the basis for expecting humanity’s extrapolated volition to actually cohere.
This whole area isn’t covered all that well (as Wei noted). I assumed that CEV would rely on solving an implicit cooperation problem between conflicting moral systems. It doesn’t appear at all unlikely to me that some people are intrinsically selfish to some degree and their extrapolated volitions would be quite different.
Note that I’m not denying that some people present (or usually just assume) the thesis you present. I’m just glad that there are usually others who argue against it!
That’s exactly what I took CEV to entail.
Now this is a startling claim.
Be more specific!
Maybe it’s true if you also specify “if they were fully capable of modifying their own moral intuitions.” I have an intuition (an unexamined belief? a hope? a sci-fi trope?) that humanity as a whole will continue to evolve morally and roughly converge on a morality that resembles current first-world liberal values more than, say, Old Testament values. That is, it would converge, in the limit of global prosperity and peace and dialogue, and assuming no singularity occurs and the average lifespan stays constant. You can call this naive if you want to; I don’t know whether it’s true. It’s what I imagine Eliezer means when he talks about “humanity growing up together”.
This growing-up process currently involves raising children, which can be viewed as a crude way of rewriting your personality from scratch, and excising vestiges of values you no longer endorse. It’s been an integral part of every culture’s moral evolution, and something like it needs to be part of CEV if it’s going to actually converge.
That’s not plausible. That would be some sort of objective morality, and there is no such thing. Humans have brains, and brains are complicated. You can’t have them imply exactly the same preference.
Now, the non-crazy version of what you suggest is that preferences of most people are roughly similar, that they won’t differ substantially in major aspects. But when you focus on detail, everyone is bound to want their own thing.
Psy-Kosh:
It makes sense on its own terms, but it leaves the unpleasant implication that morality differs greatly between humans, at both the individual and the group level—and if this leads to a conflict, asking who is right is meaningless (except insofar as everyone can reach an answer that’s valid only for himself, in terms of his own morality).
So if I live in the same society with people whose morality differs from mine, and the good-fences-make-good-neighbors solution is not an option, as it often isn’t, then who gets to decide whose morality gets imposed on the other side? As far as I see, the position espoused in the above comment leaves no other answer than “might is right.” (Where “might” also includes more subtle ways of exercising power than sheer physical coercion, of course.)
That two people mean different things by the same word doesn’t make all questions asked using that word meaningless, or even hard to answer.
If by “castle” you mean “a fortified structure”, while I mean “a fortified structure surrounded by a moat”, who will be right if we’re asked if the Chateau de Gisors is a castle? Any confusion here is purely semantic in nature. If you answer yes and I answer no, we won’t have given two answers to the same question, we’ll have given two answers to two different questions. If Psy-Kosh says that the Chateau de Gisors is a fortified structure but it is not surrounded by a moat, he’ll have answered both our questions.
Now, once this has been clarified, what would it mean to ask who gets to decide whose definition of ‘castle’ gets imposed on the other side? Do we need a kind of meta-definition of castle to somehow figure out what the one true definition is? If I could settle this issue by exercising power over you, would it change the fact that the Chateau de Gisors is not surrounded by a moat? If I killed everyone who doesn’t mean the same thing by the word ‘castle’ than I do, would the sentence “a fortified structure” become logically equivalent to the sentence “a fortified structure surrounded by a moat”?
In short, substituting the meaning of a word for the word tends to make lots of seemingly difficult problems become laughably easy to solve. Try it.
*blinks* How did I imply that morality varies? I thought (and was trying to imply) that morality is an absolute standard, and that humans simply happen to be the sort of beings that care about the particular standard we call “morality”. (Well, with various caveats, like our not being sufficiently reflective to fully and explicitly state our “morality algorithm”, nor fully knowing all its consequences.)
However, when humans and paperclippers interact, there will probably be some sort of fight unless it ends in some sort of PD cooperation or whatever. It’s not that paperclippers and humans disagree on anything; it’s simply that they value paperclips a whole lot more than lives. We’re stuck with having to act in a way that prevents the hypothetical them from acting on that.
(of course, the notion that most humans seem to have the same underlying core “morality algorithm”, just disagreeing on the implications or such, is something to discuss, but that gets us out of executive summary territory, no?)
Psy-Kosh:
I would say that it’s a crucial assumption, which should be emphasized clearly even in the briefest summary of this viewpoint. It is certainly not obvious, to say the least. (And, for full disclosure, I don’t believe that it’s a sufficiently close approximation of reality to avoid the problem I emphasized above.)
Hrm, fair enough. I thought I’d effectively implied it, but apparently not sufficiently.
(Incidentally… you don’t think it’s a close approximation to reality? Most humans seem to value (to various extents) happiness, love, (at least some) lives, etc… right?)
Different people (and cultures) seem to put very different weights on these things.
Here’s an example:
You’re a government minister who has to decide who to hire to do a specific task. There are two applicants. One is your brother, who is marginally competent at the task. The other is a stranger with better qualifications who will probably be much better at the task.
The answer is “obvious.”
In some places, “obviously” you hire your brother. What kind of heartless bastard won’t help out his own brother by giving him a job?
In others, “obviously” you should hire the stranger. What kind of corrupt scoundrel abuses his position by hiring his good-for-nothing brother instead of the obviously superior candidate?
Okay, I can see how XiXiDu’s post might come across that way. I think I can clarify what I think that XiXiDu is trying to get at by asking some better questions of my own.
1. What evidence has SIAI presented that the Singularity is near?
2. If the Singularity is near, then why has the scientific community missed this fact?
3. What evidence has SIAI presented for the existence of grey goo technology?
4. If grey goo technology is feasible, then why has the scientific community missed this fact?
5. Assuming that the Singularity is near, what evidence is there that SIAI has a chance to lower global catastrophic risk in a nontrivial way?
6. What evidence is there that SIAI has room for more funding?
“Near”? Where’d we say that? What’s “near”? XiXiDu thinks we’re Kurzweil?
What kind of evidence would you want aside from a demonstrated Singularity?
Grey goo? Huh? What’s that got to do with us? Read Nanosystems by Eric Drexler or Freitas on “global ecophagy”. XiXiDu thinks we’re Foresight?
If this business about “evidence” isn’t a demand for particular proof, then what are you looking for besides not-further-confirmed straight-line extrapolations from inductive generalizations supported by evidence?
You claimed in your Bloggingheads diavlog with Scott Aaronson that you think it’s pretty obvious that there will be an AGI within the next century. As far as I know, you have not offered a detailed description of the reasoning that led you to this conclusion that can be checked by others.
I see this as significant for the reasons given in my comment here.
I don’t know what the situation is with SIAI’s position on grey goo—I’ve heard people say the SIAI staff believe nanotechnology has capabilities out of line with the beliefs of the scientific community, but they may have been misinformed. So let’s forget about questions 3 and 4.
Questions 1, 2, 5 and 6 remain.
You’ve shifted the question from “is SIAI on balance worth donating to” to “should I believe everything Eliezer has ever said”.
The point is that grey goo is not relevant to SIAI’s mission (apart from being yet another background existential risk that FAI can dissolve). The “scientific community” doesn’t normally make a professional study of (far) future technological capabilities.
My whole point about grey goo has been, as stated, that a possible superhuman AI could use it to do really bad things. That is, I do not see how an encapsulated AI, even a superhuman AI, could pose the stated risks without the use of advanced nanotechnology. Is it going to use nukes, like Skynet? Another question related to SIAI, regarding advanced nanotechnology, is whether superhuman AI is possible at all without it.
I’m shocked at how you people misinterpreted my intentions there.
If a superhuman AI is possible without advanced nanotechnology, a superhuman AI could just invent advanced nanotechnology and implement it.
Grey goo is only a potential danger in its own right because it’s a way dumb machinery can grow in destructive power (you don’t need to assume an AI controlling it for it to be dangerous, or so the story goes). AGI is not dumb, so it can use something better suited to precise control than grey goo (and correspondingly more destructive and feasible).
The grey goo example was named to exemplify the speed and sophistication of nanotechnology that would have to be around either to allow an AI to be built in the first place or to be of considerable danger.
I consider your comment an expression of personal disgust. There is no way you could possibly misinterpret my original point and subsequent explanation to this extent.
As katydee pointed out, if for some strange reason grey goo is what AI would want, AI will invent grey goo. If you used “grey goo” to refer to the rough level of technological development necessary to produce grey goo, then my comments missed that point.
Illusion of transparency. Since the general point about nanotech seems equally wrong to me, I couldn’t distinguish between the error of making it and making a similarly wrong point about the relevance of grey goo in particular. In general, I don’t plot, so take my words literally. If I don’t like something, I just say so, or keep silent.
If it seems equally wrong, why haven’t you pointed me to some further reasoning on the feasibility of AGI without advanced (grey-goo-level) nanotechnology? Why haven’t you argued about the dangers of an AGI that is unable to make use of advanced nanotechnology? I was inquiring about these issues in my original post, not trying to argue against the scenarios in question.
Yes, I’ve seen the comment regarding the possible invention of advanced nanotechnology by an AGI: if the AGI needs something that isn’t there, it will just pull it out of its hat. Well, I have my doubts that even a superhuman AGI could steer the development of advanced nanotechnology so that it gains control of it. Sure, it might solve the problems associated with it and send the solutions to some researcher. Then it could buy stock in whatever company ends up commercializing the new technology and somehow gain control... and at this point we are already deep into speculative reasoning about something shaky that is at the same time being used as evidence for the very reasoning that involves it.
To the point: if AGI can’t pose a danger, because its hands are tied, that’s wonderful! Then we have more time to work on FAI. FAI is not about superpowerful robots; it’s about technically understanding what we want, and using that understanding to automate the manufacturing of goodness. The power is expected to come from unbounded automatic goal-directed behavior, something that happens without humans in the system to ever stop the process if it goes wrong.
To the point: if AI can’t pose a danger, because its hands are tied, that’s wonderful! Then we have more time to work on FAI.
Overall I’d feel a lot more comfortable if you just said “there’s a huge amount of uncertainty as to when existential risks will strike and which ones will strike, I don’t know whether or not I’m on the right track in focusing on Friendly AI or whether I’m right about when the Singularity will occur, I’m just doing the best that I can.”
This is largely because of the issue that I raise here
I should emphasize that I don’t think that you’d ever knowingly do something that raised existential risk, I think that you’re a kind and noble spirit. But I do think I’m raising a serious issue which you’ve missed.
Edit: See also these comments
I am looking for the evidence in “supported by evidence”. I am further trying to figure out how you anticipate your beliefs paying rent: what you anticipate seeing if explosive recursive self-improvement is possible, and how that belief could be surprised by the data.
If you just say “I predict we will likely be wiped out by badly done AI,” how do you expect to update on evidence? What would constitute such evidence?
I haven’t done the reading. For further explanation read this comment.
Why do you always and exclusively mention Charles Stross? I need to know if you actually read all of my post.
Because the fact that you’re mentioning Charles Stross means that you need to do basic reading, not complicated reading.
To put my own spin on XiXiDu’s questions: What quality or position does Charles Stross possess that should cause us to leave him out of this conversation (other than the quality ‘Eliezer doesn’t think he should be mentioned’)?
Another vacuous statement. I expected more.