I think that this post doesn’t list the strongest objection: CEV would take a long list of scientific miracles to pull off, miracles that, whilst not strictly “impossible”, are each profound computer science and philosophy questions. To wit:
An AI that can simulate the outcome of human conscious deliberation, without actually instantiating a human consciousness, i.e. a detailed technical understanding of the problem of conscious experience
A way to construct an AI goal system that somehow extracts new concepts from a human upload’s brain, and then modifies itself to have a new set of goals defined in terms of those concepts.
A solution to the ontology problem in ethics
A solution to the friendliness structure problem, i.e. a self-improving AI that can reliably self-modify without error or axiological drift.
A solution to the problem of preference aggregation (EDITED, thanks ciphergoth)
A formal implementation of Rawlsian Reflective Equilibrium, which CEV needs in order to work
An AI that can solve philosophy problems that are beyond the ability of the designers to even conceive
A way to choose what subset of humanity gets included in CEV that doesn’t include too many superstitious/demented/vengeful/religious nutjobs and land those who implement it in infinite perfect hell.
All of the above working first time, without testing the entire superintelligence. (though you can test small subcomponents)
And, to make it worse, if major political powers are involved, you have to solve the political problem of getting them to agree on how to skew the CEV towards a geopolitical-power-weighted set of volitions to extrapolate, without causing a thermonuclear war as greedy political leaders fight over the future of the universe.
A way to choose what subset of humanity gets included in CEV that doesn’t include too many superstitious/demented/vengeful/religious nutjobs and land those who implement it in infinite perfect hell.
What you’re looking for is a way to construe the extrapolated volition that washes out superstition and dementation.
To the extent that vengefulness turns out to be a simple direct value that survives under many reasonable construals, it seems to me that one simple and morally elegant solution would be to filter, not the people, but the spread of their volitions, by the test, “Would your volition take into account the volition of a human who would unconditionally take into account yours?” This filters out extrapolations that end up perfectly selfish and those which end up with frozen values irrespective of what other people think—something of a hack, but it might be that many genuine reflective equilibria are just like that, and only a values-based decision can rule them out. The “unconditional” qualifier is meant to rule out TDT-like considerations, or they could just be ruled out by fiat, i.e., we want to test for cooperation in the Prisoner’s Dilemma, not in the True Prisoner’s Dilemma.
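Purely as a toy illustration of that filter (nothing from the CEV document itself), here is roughly the shape it would take if extrapolated volitions could actually be queried as predicates; the `Volition` class and the `considers` oracle below are invented for the sketch, and the “unconditional” subtleties are simply assumed away:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Volition:
    """Hypothetical stand-in for one person's extrapolated volition."""
    owner: str
    # Invented oracle: would this extrapolated volition give weight to `other`?
    considers: Callable[["Volition"], bool]

def passes_filter(v: Volition) -> bool:
    """Rough rendering of the proposed test: keep a volition only if it would
    take into account a volition that unconditionally takes it into account.
    (The TDT-related subtleties about "unconditional" are ignored here.)"""
    unconditional_cooperator = Volition("hypothetical", lambda other: True)
    return v.considers(unconditional_cooperator)

# A perfectly selfish extrapolation fails the test; a reciprocating one passes.
selfish = Volition("selfish", lambda other: False)
reciprocal = Volition("reciprocal", lambda other: True)
print(passes_filter(selfish), passes_filter(reciprocal))  # False True
```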
An AI that can solve philosophy problems that are beyond the ability of the designers to even conceive
It’s possible that having a complete mind design on hand would mean that there were no philosophy problems left, since the resources that human minds have to solve philosophy problems are finite, and knowing the exact method to use to solve a philosophy problem usually makes solving it pretty straightforward (the limiting factor on philosophy problems is never computing power). The reason why I pick on this particular cited problem as problematic is that, as stated, it involves an inherent asymmetry between the problems you want the AI to solve and your own understanding of how to meta-approach those problems, which is indeed a difficult and dangerous sort of state.
All of the above working first time, without testing the entire superintelligence. (though you can test small subcomponents)
All approaches to superintelligence, without exception, have this problem. It is not quite as automatically lethal as it sounds (though it is certainly automatically lethal to all other parties’ proposals for building superintelligence). You can build in test cases and warning criteria beforehand to your heart’s content. You can detect incoherence and fail safely instead of doing something incoherent. You could, though it carries with its own set of dangers, build human checking into the system at various stages and with various degrees of information exposure. But it is the fundamental problem of superintelligence, not a problem of CEV.
And, to make it worse, if major political powers are involved, you have to solve the political problem of getting them to agree on how to skew the CEV towards a geopolitical-power-weighted set of volitions to extrapolate
I will not lend my skills to any such thing.
What you’re looking for is a way to construe the extrapolated volition that washes out superstition and dementation.
you could do that. But if you want a clean shirt out of the washing machine, you don’t add in a diaper with poo on it and then look for a really good laundry detergent to “wash it out”.
My feeling with the CEV of humanity is that if it is highly insensitive to the set of people you extrapolate, then you lose nothing by extrapolating fewer people. On the other hand, if including more people does change the answer in a direction that you regard as bad, then you gain by excluding people with values dissimilar from yours.
Furthermore, excluding people from the CEV process doesn’t mean disenfranchising them—it just means enfranchising them according to what your values count as enfranchisement.
Most people in the world don’t hold our values (1). Read, e.g., Haidt on culturally determined values. Human values are universal in form but local in content. Our “should function” is parochial.
(1. Note: this doesn’t mean that their values will be different after extrapolation; f(x) can equal f(y) for x != y. But it does mean that they might be, which is enough to give you an incentive not to include them.)
if you want a clean shirt out of the washing machine, you don’t add in a diaper with poo on it and then look for a really good laundry detergent to “wash it out”.
I want to claim that a Friendly initial dynamic should be more analogous to a biosphere-with-a-textile-industry-in-it machine than to a washing machine. How do we get clean shirts at all, in a world with dirty diapers?
But then, it’s a strained analogy; it’s not like we’ve ever had a problem of garments claiming control over the biosphere and over other garments’ cleanliness before.
Is that just a bargaining position, or do you truly consider that no human values surviving is preferable to allowing an “unfair” weighing of volitions?
It seems that in many scenarios, the powers that be will want in. The only scenarios where they won’t are ones where the singularity happens before they take it seriously.
I am not sure how much they will screw things up if/when they do.
But it is the fundamental problem of superintelligence, not a problem of CEV.
Upload-based routes don’t suffer from this as badly, because there is inherently a continuum between “one upload running at real time speed” and “10^20 intelligence-enhanced uploads running at 10^6 times normal speed”.
A solution to the problem of preference aggregation
These need seed content, but seem like they can be renormalized.
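For concreteness only, a toy sketch of what “seed content plus renormalization” could mean if each extrapolated preference were summarised as a utility function over a finite set of outcomes; the normalisation rule and the weights below are exactly the kind of seed content that has to be chosen rather than derived:

```python
from typing import Dict, List

Utility = Dict[str, float]  # outcome -> how much one (hypothetical) volition likes it

def renormalize(u: Utility) -> Utility:
    """Rescale to [0, 1] so nobody gains influence just by quoting bigger numbers.
    This is one of many possible normalisations, and the choice is value-laden."""
    lo, hi = min(u.values()), max(u.values())
    return {o: 0.0 for o in u} if hi == lo else {o: (x - lo) / (hi - lo) for o, x in u.items()}

def aggregate(utilities: List[Utility], weights: List[float]) -> Utility:
    """Weighted sum of renormalised utilities; the weights are the seed content."""
    outcomes = set().union(*utilities)
    normed = [renormalize(u) for u in utilities]
    return {o: sum(w * u.get(o, 0.0) for w, u in zip(weights, normed)) for o in outcomes}

# Two toy volitions over three outcomes, equally weighted.
print(aggregate([{"a": 5, "b": 0, "c": 1}, {"a": 0, "b": 2, "c": 2}], [0.5, 0.5]))
```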
A way to choose what subset of humanity gets included in CEV that doesn’t include too many superstitious/demented/vengeful/religious nutjobs and land those who implement it in infinite perfect hell.
This may be a problem, but it seems to me that choosing this particular example, and being as confident of it as you appear to be, are symptomatic of an affective death spiral.
All of the above working first time, without testing the entire superintelligence.
The original CEV proposal appears to me to endorse using something like a CFAI-style controlled ascent rather than blind FOOM: “A key point in building a young Friendly AI is that when the chaos in the system grows too high (spread and muddle both add to chaos), the Friendly AI does not guess. The young FAI leaves the problem pending and calls a programmer, or suspends, or undergoes a deterministic controlled shutdown.”
A way to choose what subset of humanity gets included in CEV
I thought the point of defining CEV as what we would choose if we knew better was (partly) that you wouldn’t have to subset. We wouldn’t be superstitious, vengeful, and so on if we knew better.
Also, can you expand on what you mean by “Rawlsian Reflective Equilibrium”? Are you referring (however indirectly) to the “veil of ignorance” concept?
Why not? How does adding factual knowledge get rid of people’s desire to hurt someone else out of revenge?
We wouldn’t be superstitious, … if we knew better.
People who currently believe in superstitious belief system X would lose the factual falsehoods that X entailed. But most superstitious belief systems have evaluative aspects too, for example, the widespread religious belief that all nonbelievers “ought” to go to hell. I am a nonbeliever. I am also not Chinese, not Indian, not a follower of Sharia Law or Islam, not a member of the Chinese Communist Party, not a member of the Catholic Church, not a Mormon, not a “Good Christian”, and I didn’t intend to donate all my money and resources to saving lives in the third world before finding out about the singularity. There are lots of humans alive on this planet whose volitions could spring a very nasty surprise on people like us.
Why not? How does adding factual knowledge get rid of people’s desire to hurt someone else out of revenge?
Learning about the game-theoretic roots of a desire seems to generally weaken its force, and makes it apparent that one has a choice about whether or not to retain it. I don’t know what fraction of people would choose in such a state not to be vengeful, though. (Related: ‘hot’ and ‘cold’ motivational states. CEV seems to naturally privilege cold states, which should tend to reduce vengefulness, though I’m not completely sure this is the right thing to do rather than something like a negotiation between hot and cold subselves.)
What it’s like to be hurt is also factual knowledge, and seems like it might be extremely motivating towards empathy generally.
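One concrete way to see the “game-theoretic roots” point: vengefulness behaves like a standing punishment strategy in a repeated game. The toy iterated Prisoner’s Dilemma below, with textbook payoffs and a “grudger” that never forgives, is only an illustration of that shape, not a model of human psychology:

```python
# Standard PD payoffs: (my score, their score) for (my move, their move).
PAYOFFS = {("C", "C"): (3, 3), ("C", "D"): (0, 5),
           ("D", "C"): (5, 0), ("D", "D"): (1, 1)}

def grudger(my_history, their_history):
    """Cooperate until wronged once, then punish forever: a caricature of revenge."""
    return "D" if "D" in their_history else "C"

def always_defect(my_history, their_history):
    return "D"

def play(strategy_a, strategy_b, rounds=10):
    hist_a, hist_b, score_a, score_b = [], [], 0, 0
    for _ in range(rounds):
        a, b = strategy_a(hist_a, hist_b), strategy_b(hist_b, hist_a)
        pa, pb = PAYOFFS[(a, b)]
        hist_a.append(a); hist_b.append(b)
        score_a += pa; score_b += pb
    return score_a, score_b

# The grudger pays a one-round cost, then the standing threat caps exploitation.
print(play(grudger, always_defect))  # (9, 14) over 10 rounds
```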
People who currently believe in superstitious belief system X would lose the factual falsehoods that X entailed. But most superstitious belief systems have evaluative aspects too, for example, the widespread religious belief that all nonbelievers “ought” to go to hell.
Why do you think it likely that people would retain that evaluative judgment upon losing the closely coupled beliefs? Far more plausibly, they could retain the general desire to punish violations of conservative social norms, but see above.
I find it interesting that there seems to be a lot of variation in people’s views regarding how much coherence there’d be in an extrapolation… You say that choosing the right group of humans is important, while I’m under the impression that there is no such problem; basically everyone should be in the game, and making higher level considerations about which humans to include is merely an additional source of error. Nevertheless, if there’ll really be as much coherence as I think, and I think there’d be a hell of a lot, picking some subset of humanity would pretty much produce a CEV that is very akin to the CEVs of other possible human groups.
I think that even being an Islamic radical fundamentalist is a petty factor in overall coherence. If I remember correctly, Vladimir Nesov has said several times that people can be wrong about their values, and I pretty much agree. Of course, there is an obvious caveat that it’s rather shaky to guess what other people’s real values might be. Saying “You’re wrong about your professed value X; your real value is along the lines of Y, because you cannot possibly diverge that much from the psychological unity of mankind” also risks seeming like claiming excessive moral authority. Still, I think it is a potentially valid argument, depending on the exact nature of X and Y.
Nevertheless, if there’ll really be as much coherence as I think, and I think there’d be a hell of a lot, picking some subset of humanity would pretty much produce a CEV that is very akin to the CEVs of other possible human groups.
And what would you do if Omega told you that the CEV of just {liberal westerners in your age group} is wildly different from the CEV of humanity? What do you think the right thing to do would be then?
I’d ask Omega, “Which construal of volition are you using?”
There’s light in us somewhere, a better world inside us somewhere, the question is how to let it out. It’s probably more closely akin to the part of us that says “Wouldn’t everyone getting their wishes really turn out to be awful?” than the part of us that thinks up cool wishes. And it may even be that Islamic fundamentalists just don’t have any note of grace in them at all, that there is no better future written in them anywhere, that every reasonable construal of them ends up with an atheist who still wants others to burn in hell; and if so, the test I cited in the other comment, about filtering portions of the extrapolated volition that wouldn’t respect the volition of another who unconditionally respected theirs, seems like it ought to filter that.
the test I cited in the other comment, about filtering portions of the extrapolated volition that wouldn’t respect the volition of another who unconditionally respected theirs, seems like it ought to filter that.
I agree that certain limiting factors, tests, etc. could be useful. I haven’t thought hard enough about this particular proposal to say whether it is really of use. My first thought is that if you have thought about it carefully, then it is probably relatively good, just based on your track record.
Eliezer has already talked about this and argued that the right thing would be to run the CEV on the whole of humanity, basing himself partly on an argument that if some particular group (not us) got control of the programming of the AI, we would prefer that they run it on the whole of humanity rather than running it on themselves.
The lives of most evildoers are of course largely incredibly prosaic, and I find it hard to believe their values in their most prosaic doings are that dissimilar from everyone else around the world doing prosaic things.
I think that thinking in terms of good and evil betrays a closet-realist approach to the problem. In reality, there are different people, with different cultures and biologically determined drives. These cultural and biological factors determine (approximately) a set of traditions, worldviews, ethical principles and moral rules, which can undergo a process of reflective equilibrium to determine a set of consistent preferences over the physical world.
We don’t know how the reflective equilibrium thing will go, but we know that it could depend upon the set of traditions, ethical principles and moral rules that go into it.
If someone is an illiterate devout pentecostal Christian who lives in a village in Angola, the eventual output of the preference formation process applied to them might be very different than if it were applied to the typical LW reader.
They’re not evil. They just might have a very different “should function” than me.
I think part of the point of what you call “moral anti-realism” is that it frees up words like “evil” so that they can refer to people who have particular kinds of “should function”, since there’s nothing cosmic that the word could be busy referring to instead.
If I had to offer a demonology, I guess I might loosely divide evil minds into: 1) those capable of serious moral reflection but avoiding it, e.g. because they’re busy wallowing in negative other-directed emotion, 2) those engaging in serious moral reflection but making cognitive mistakes in doing so, 3) those whose moral reflection genuinely outputs behavior that strongly conflicts with (the extension of) one’s own values. I think 1 comes closest to what’s traditionally meant by “evil”, with 2 being more “misguided” and 3 being more “Lovecraftian”. As I understand it, CEV is problematic if most people are “Lovecraftian” but less so if they’re merely “evil” or “misguided”, and I think you may in general be too quick to assume Lovecraftianity. (ETA: one main reason why I think this is that I don’t see many people actually retaining values associated with wrong belief systems when they abandon those belief systems; do you know of many atheists who think atheists or even Christians should burn in hell?)
“One main reason why I think this is that I don’t see many people actually retaining values associated with wrong belief systems when they abandon those belief systems; do you know of many atheists who think atheists or even Christians should burn in hell?”
One main reason why you don’t see that happening is that the set of beliefs that you consider “right beliefs” is politically influenced, i.e. human beliefs come in certain patterns which are not connected in themselves, but are connected by the custom that people who hold one of the beliefs usually hold the others.
For example, I knew a woman (an agnostic) who favored animal rights, and some group on this basis sent her literature asking for her help with pro-abortion activities, namely because this is a typical pattern: People favoring animal rights are more likely to be pro-abortion. But she responded, “Just because I’m against torturing animals doesn’t mean I’m in favor of killing babies,” evidently quite a logical response, but not in accordance with the usual pattern.
In other words, your own values are partly determined by political patterns, and if they weren’t (which they wouldn’t be under CEV) you might well see people retaining values you dislike when they extrapolate.
As I understand it, CEV is problematic if most people are “Lovecraftian” but less so if they’re merely “evil” or “misguided”, and I think you may in general be too quick to assume Lovecraftianity.
Most people may or may not be “Lovecraftian”, but why take that risk?
There are gains from cooperating with as many others as possible. Maybe these and other factors outweigh the risk or maybe they don’t; the lower the probability and extent of Lovecraftianity, the more likely it is that they do.
Anyway, I’m not making any claims about what to do, I’m just saying people probably aren’t as Lovecraftian as Roko thinks, which I conclude both from introspection and from the statistics of what moral change we actually see in humans.
There are gains from cooperating with as many others as possible. Maybe these and other factors outweigh the risk or maybe they don’t; the lower the probability and extent of Lovecraftianity, the more likely it is that they do.
I agree that “probability and extent of Lovecraftianity” would be an important consideration if it were a matter of cooperation, and of deciding how many others to cooperate with, but Eliezer’s motivation in giving everyone equal weighting in CEV is altruism rather than cooperation. If it were cooperation, then the weights would be adjusted to account for contribution or bargaining power, instead of being equal.
Anyway, I’m not making any claims about what to do, I’m just saying people probably aren’t as Lovecraftian as Roko thinks, which I conclude both from introspection and from the statistics of what moral change we actually see in humans.
To reiterate, “how Lovecraftian” isn’t really the issue. Just by positing the possibility that most humans might turn out to be Lovecraftian, you’re operating in a meta-ethical framework at odds with Eliezer’s, and in which it doesn’t make sense to give everyone equal weight in CEV (or at least you’ll need a whole other set of arguments to justify that).
That aside, the statistics you mention might also be skewed by an anthropic selection effect.
If someone is an illiterate devout pentecostal Christian who lives in a village in Angola, the eventual output of the preference formation process applied to them might be very different than if it were applied to the typical LW reader.
Consider the distinction between whether the output of a preference-aggregation algorithm will be very different for the Angolan Christian, and whether it should be very different. Some preference-aggregation algorithms may just be confused into giving diverging results because of inconsequential distinctions, which would be bad news for everyone, even the “enlightened” westerners.
(To be precise, the relevant factual statement is about whether any two same-culture people get preferences visibly closer to each other than any two culturally distant people. It’s like the relatively small genetic relevance of skin color, where within-race variation is greater than between-race variation.)
I think we agree about this actually—several people’s picture of someone with alien values was an Islamic fundamentalist, and they were the “evildoers” I have in mind...
And what would you do if Omega told you that the CEV of just {liberal westerners in your age group} is wildly different from the CEV of humanity? What do you think the right thing to do would be then?
The right thing for me to do is to run CEV on myself, almost by definition. The CEV oracle that I am using to work out my CEV can dereference the dependencies to other CEVs better than I can.
No, not obviously; I can’t say I’ve ever seen anyone else claim to completely condition their concern for other people on the possession of similar reflective preferences.
(Or is your point that they probably wouldn’t stay people for very long, if given the means to act on their reflective preferences? That wouldn’t make it OK to kill them before then, and it would probably constitute undesirable True PD defection to do so afterwards.)
Well, my above reply was a bit tongue-in-cheek. My concern for other things in general is just as complex as my morality and it contains many meta elements such as “I’m willing to modify my preference X in order to conform to your preference Y because I currently care about your utility to a certain extent”. On the simplest level, I care for things on a sliding scale that ranges from myself to rocks or Clippy AIs with no functional analogues for human psychology (pain, etc.). Somebody with a literally wildly differing reflective preference would not be a person and, as you say, would be preferably dealt with in True PD manners rather than ordinary human-human, altruism-contaminated interactions.
Somebody with a literally wildly differing reflective preference would not be a person
This is a very nonstandard usage; personhood is almost universally defined in terms of consciousness and cognitive capacities, and even plausibly relevant desire-like properties like boredom don’t have much to do with reflective preference/volition.
How does adding factual knowledge get rid of people’s desire to hurt someone else out of revenge?
“If we knew better” is an ambiguous phrase, I probably should have used Eliezer’s original: “if we knew more, thought faster, were more the people we wished we were, had grown up farther together”. That carries a lot of baggage, at least for me.
I don’t experience (significant) desires of revenge, so I can only extrapolate from fictional evidence. Say the “someone” in question killed a loved one, and I wanted to hurt them for that. Suppose further that they were no longer able to kill anyone else. Given the time and the means to think about it clearly, I could see that hurting them would not improve the state of the world for me, or for anyone else, and would only impose further unnecessary suffering.
The (possibly flawed) assumption of CEV, as I understood it, is that if I could reason flawlessly, non-pathologically about all of my desires and preferences, I would no longer cleave to the self-undermining ones, and what remains would be compatible with the non-self-undermining desires and preferences of the rest of humanity.
Caveat: I have read the original CEV document but not quite as carefully as maybe I should have, mainly because it carried a “Warning: obsolete” label and I was expecting to come across more recent insights here.
The rest of Rawls’ Theory of Justice is good too. I’m trying to figure out for myself (before I finally break down and ask) how CEV compares to the veil of ignorance.
I wish you had written this a few weeks earlier, because it’s perfect as a link for the “their associated difficulties and dangers” phrase in my “Complexity of Value != Complexity of Outcome” post.
Please consider upgrading this comment to a post, perhaps with some links and additional explanations. For example, what is the ontology problem in ethics?
The ontology problem is the following: your values are defined in terms of a set of concepts. These concepts are essentially predictively useful categorizations in your model of the world. When you do science, you find that your model of the world is wrong, and you build a new model that has a different set of parts. But what do you do with your values?
In practice, I find that this is never a problem. You usually rest your values on some intuitively obvious part of whatever originally caused you to create the concepts in question.
Subjective anticipation is a concept that a lot of people rest their axiology on. But it looks like subjective anticipation is an artefact of our cognitive algorithms, and all kinds of Big world theories break it. For example, MW QM means that subjective anticipation is nonsense.
Personally, I find this extremely problematic, and in practice, I think that I am just ignoring it.
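A deliberately crude sketch of the shape of the problem: values bound to concepts in one world-model become undefined when science hands you a model with different parts. The “soul” entry and everything else here is invented for illustration:

```python
# Values keyed by concepts in the agent's current world-model (toy example).
values = {"soul": -100.0,   # "destroying a soul is very bad"
          "pain": -1.0}     # "pain is bad"

old_ontology = {"person", "soul", "pain"}
new_ontology = {"person", "neural_state", "pain"}   # better model; no "soul"

def ungrounded(values, ontology):
    """Which values now refer to nothing in the current model of the world?"""
    return {concept for concept in values if concept not in ontology}

print(ungrounded(values, old_ontology))  # set(): every value is grounded
print(ungrounded(values, new_ontology))  # {'soul'}: needs re-binding somehow
```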
I think mind copying technology may be a better illustration of the subjective anticipation problem than MW QM, but I agree that it’s a good example of the ontology problem. BTW, do you have a reference for where the ontology problem was first stated, in case I need to reference it in the future?
Thanks for the pointer, but I think the argument you gave in that post is wrong. You argued that an agent smaller than the universe has to represent its goals using an approximate ontology (and therefore would have to later re-phrase its goals relative to more accurate ontologies). But such an agent can represent its goals/preferences in compressed form, instead of using an approximate ontology. With such compressed preferences, it may not have the computational resources to determine with certainty which course of action best satisfies its preferences, but that is just a standard logical uncertainty problem.
I think the ontology problem is a real problem, but it may just be a one-time problem, where we or an AI have to translate our fuzzy human preferences into some well-defined form, instead of a problem that all agents must face over and over again.
But such an agent can represent its goals/preferences in compressed form, instead of using an approximate ontology.
Yes, if it has compressible preferences, which in reality is the case for e.g. humans and many plausible AIs.
In reality, problems of the form where you discover that your preferences are stated in terms of an incorrect ontology (e.g. souls, anticipated future experience) are where this really bites.
it may just be a one-time problem, where we or an AI have to translate our fuzzy human preferences into some well-defined form, instead of a problem that all agents must face over and over again.
I think that depends upon the structure of reality. Maybe there will be a series of philosophical shocks as severe as the physicality of mental states, Big Worlds, quantum MWI, etc. Suspicion should definitely be directed at what horrors will be unleashed upon a human or AI that discovers a correct theory of quantum gravity.
Just as Big World cosmology can erode aggregative consequentialism, maybe the ultimate nature of quantum gravity will entirely erode any rational decision-making; perhaps some kind of ultimate ensemble theory already has.
On the other hand, the idea of a one-time shock is also plausible.
The reason I think it can just be a one-time shock is that we can extend our preferences to cover all possible mathematical structures. (I talked about this in Towards a New Decision Theory.) Then, no matter what kind of universe we turn out to live in, whichever theory of quantum gravity turns out to be correct, the structure of the universe will correspond to some mathematical structure which we will have well-defined preferences over.
perhaps some kind of ultimate ensemble theory already has [eroded any rational decision-making].
I addressed this issue a bit in that post as well. Are you not convinced that rational decision-making is possible in Tegmark’s Level IV Multiverse?
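To pin down the mechanical core of the proposal (and only that), this is what “well-defined preferences plus a weighting over structures” amounts to in the simplest case. The three structures, the weights, and the utilities are all made up, and the real dispute, namely where such a measure could come from and whether “all structures” is even well-defined, is left untouched:

```python
# Hypothetical weights ("how real" or "how much I care") over candidate structures.
weight = {"structure_A": 0.7, "structure_B": 0.2, "structure_C": 0.1}

# Hypothetical utility of each action, evaluated inside each structure.
utility = {"structure_A": {"act1": 10.0, "act2": 2.0},
           "structure_B": {"act1": -5.0, "act2": 3.0},
           "structure_C": {"act1": 0.0,  "act2": 1.0}}

def weighted_value(action):
    return sum(w * utility[s][action] for s, w in weight.items())

best = max(["act1", "act2"], key=weighted_value)
print(best, weighted_value(best))  # act1: 0.7*10 + 0.2*(-5) + 0.1*0 = 6.0
```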
The next few posts on my blog are going to be basically about approaching this problem (and given the occasion, I may as well commit to writing the first post today).
You should read [*] to get a better idea of why I see “preference over all mathematical structures” as a bad call. We can’t say what “all mathematical structures” is; any given foundation only covers a portion of what we could invent. Like the real world, mathematics that we might someday encounter can only be completely defined by the process of discovery (but if you capture this process, you may need nothing else).
Hope to finish it today… Though I won’t talk about philosophy of mathematics in this sub-series, I’m just going to reduce the ontological confusion about preference and laws of physics to a (still somewhat philosophical, but taking place in a comfortably formal setting) question of static analysis of computer programs.
Yes, talking about “preference over all mathematical structures” does gloss over some problems in the philosophy of mathematics, and I am sympathetic to anti-foundationalist views like Awodey’s.
Also, in general I agree with Roko on the need for an AI that can do philosophy better than any human, so in this thread I was mostly picking a nit with a specific argument that he had.
(I was going to remind you about the missing post, but I see Roko already did. :)
we can extend our preferences to cover all possible mathematical structures.
I define the following structure: if you take action a, all possible logically possible consequences will follow, i.e. all computable sensory I/O functions, generated by all possible computable changes in the objective physical universe. This holds for all a. This is facilitated by the universe creating infinitely many copies of you every time you take an action, and there being literally no fact of the matter about which one is you.
Now if you have already extended your preferences over all possible mathematical structures, you presumably have a preferred action in this case. But the preferred action is really rather unrelated to your life before you made this unsettling discovery. Beings that had different evolved desires (such as seeking status versus maximizing offspring) wouldn’t produce systematically different preferences, they’d essentially have to choose at random.
If Tegmark Level 4 is, in some sense “true”, this hypothetical example is not really so hypothetical—it is very similar to the situation that we are in, with the caveat that you can argue about weightings/priors over mathematical structures, so some consequences get a lower weighting than others, given the prior you chose.
My intuition tells me that Level 4 is a mistake, and that there is such a thing as the consequence of my actions. However, mere MW quantum mechanics casts doubt on the idea of anticipated subjective experience, so I am suspicious of my anti-multiverse intuition. Perhaps what we need is the equivalent of a theory of Born probabilities for Tegmark Level 4, something in the region of what Nick Bostrom tried to do in his book on anthropic reasoning (though it looks like Nick simply added more arbitrariness into the mix in the form of reference classes).
My intuition tells me that Level 4 is a mistake, and that there is such a thing as the consequence of my actions.
I disagree on the first part, and agree on the second part.
with the caveat that you can argue about weightings/priors over mathematical structures, so some consequences get a lower weighting than others, given the prior you chose.
Yes, and that’s enough for rational decision making. I’m not really sure why you’re not seeing that...
Yes, and that’s enough for rational decision making.
I agree that you can turn the handle on a particular piece of mathematics that resembles decisionmaking, but some part of me says that you’re just playing a game with yourself: you decide that everything exists, then you put a prior over everything, then you act to maximize your utility, weighted by that prior. It is certainly a blow to one’s intuition that one can only salvage the ability to act by playing a game of make-believe that some sections of “everything” are “less real” than others, where your real-ness prior is something you had to make up anyway.
Others also think that I am just slow on the uptake of this idea. But to me the idea that reality is not fixed but relative to what real-ness prior you decide to pick is extremely ugly. It would mean that the utility of technology to achieve things is merely a shared delusion, that if a theist chose a real-ness prior that assigned high real-ness only to universes where a theistic god existed then he would be correct to pray, etc. Effectively you’re saying that the postmodernists were right after all.
Now, the fact that I have a negative emotional reaction to this proposal doesn’t make it less true, of course.
There is a deep analogy between how you can’t change the laws of physics (contents of reality, apart from lawfully acting) and how you can’t change your own program. It’s not a delusion unless it can be reached by mistake. The theist can’t be right to act as if a deity exists unless his program (brain) is such that it is the correct way to act, and he can’t change his mind for it to become right, because it’s impossible to change one’s program, only act according to it.
The problem is that this point of view means that in a debate with someone who is firmly religious, not only is the religious person right, but you regret the fact that you are “rational”; you lament “if only I had been brought up with religious indoctrination, I would correctly believe that I am going to heaven”.
Any rational theory that leaves you lamenting your own rationality deserves some serious scepticism.
Following the same analogy, you can translate it as “if only the God did in fact exist, …”. The difference doesn’t seem particularly significant—both “what ifs” are equally impossible. “Regretting rationality” is on a different level—rationality in the relevant sense is a matter of choice. The program that defines your decision-making algorithm isn’t.
I still fear that you are reading in my words something very different from what I intend, as I don’t see the possibility of a religious person’s mind actually acting as if God is real. A religious person may have a free-floating network of beliefs about God, but it doesn’t survive under reflection. A true god-impressed mind would actually act as if God is real, no matter what, it won’t be deconvertable, and indeed under reflection an atheist god-impressed mind will correctly discard atheism.
Not all beliefs are equal: a human atheist is correct not just according to the atheist’s standard, and a human theist is incorrect not just by the atheist’s standard. The standard is in the world, or, under this analogy, in the mind. (The mind is a better place for ontology, because preference is also here, and the human mind can be completely formalized, unlike the unknown laws of physics. By the way, the first post is up).
So your argument is that the reason that the theists are wrong is because they only sorta-kinda believe in God anyway, but if they really believed, then they’d be just as right as we are?
So your argument is that the reason that the theists are wrong is because they only sorta-kinda believe in God anyway, but if they really believed, then they’d be just as right as we are?
But only in the sense that their calculation could be correct according to a particularly weird prior. The difference between a normal theist and a “god-impressed mind” who both believe in God is that of rationality: the former makes mistakes in updating beliefs, the latter probably doesn’t. The same goes for an atheist god-impressed mind and a human atheist. You can’t expect to find that weird a prior in a human. And of course, you should say that the god-impressed are wrong about their beliefs, though they correctly follow the evidence according to their prior. If you value their success in the real world more than the autonomy of their preference, you may want to reach into their minds and make appropriate changes.
I should say again: the program that defines the decision-making algorithm can’t be normally changed, which means that one can’t be really “converted” to a different preference, though one can be converted to different beliefs and feelings. Observations don’t change the algorithm, they are processed according to that algorithm. This means that if you care about reflective consistency (and everyone does, in the sense of preservation of preference), you’d try to counteract the unwanted effects of environment on yourself, including the self-promoting effects where you start liking the new situation. The extent to which you like the new situation, the “level of conviction”, is pretty much irrelevant, just as is the presence of a losing psychological drive. It’d take great integrity (not “strength of conviction”) in the change for significantly different values to really sink in, in the sense that the new preference-on-reflection will resemble the new beliefs and feelings similarly to how the native preference-on-reflection will resemble native (sane, secular, etc.) beliefs and feelings.
Yes, that wasn’t careful. In this context, I mean “no large shift of preference”. Tiny changes occur all the time (and are actually very important if you scale them up by giving the preference with/without these changes to a FAI). You can model the extent of reversibility (as compared to a formal computer program) by roughly what can be inferred about the person’s past, which doesn’t necessarily all have to be from the person’s brain. (By an algorithm in a human brain I mean all of the human brain, basically a program that would run an upload implementation, together with the data.)
I agree that it’s ugly to think of the weights as a pretense on how real certain parts of reality are. That’s why I think it may be better to think of them as representing how much you care about various parts of reality. (For the benefit of other readers, I talked about this in What Are Probabilities, Anyway?.)
Actually, I haven’t completely given up the idea that there is some objective notion of how real, or how important, various parts of reality are. It’s hard to escape the intuition that some parts of math are just easier to reach or find than others, in a way that is not dependent on how human minds work.
In reality, problems of the form where you discover that your preferences are stated in terms of an incorrect ontology (e.g. souls, anticipated future experience) are where this really bites.
I believe even personal identity falls under this category. A lot of moral intuitions work with the-me-in-the-future object, as marked in the map. To follow these intuitions, it is very important for us to have a good idea of where the-me-in-the-future is, to have a good map of this thing. When you get to weird thought experiments with copying, this epistemic step breaks down, because if there are multiple future-copies, the-me-in-the-future is a pattern that is absent. As a result, moral intuitions that indirectly work through this mark on the map get confused and start giving the wrong answers as well. This can be readily observed, for example, in the preferential inconsistency over time expected in such thought experiments (you precommit to teleporting-with-delay, but then your copy that is to be destroyed starts complaining).
Personal identity is (in general) a wrong epistemic question asked by our moral intuition. Only if preference is expressed in terms of the territory (or rather in a form flexible enough to follow all possible developments), including the parts currently represented in moral intuition in terms of the-me-in-the-future object in the territory, will the confusion with expectations and anthropic thought experiments go away.
Please consider upgrading this comment to a post, perhaps with some links and additional explanations. For example, what is the ontology problem in ethics?
I agree that preference aggregation is hard. Wei Dai and Nick Bostrom have both made proposals based upon agents negotiating with some deadline or constraint.
Maybe I’m crazy but all that doesn’t sound so hard.
More precisely, there’s one part, the solution to which should require nothing more than steady hard work, and another part which is so nebulous that even the problems are still fuzzy.
The first part—requiring just steady hard work—is everything that can be reduced to existing physics and mathematics. We’re supposed to take the human brain as input and get a human-friendly AI as output. The human brain is a decision-making system; it’s a genetically encoded decision architecture or decision architecture schema, with the parameters of the schema being set in the individual by genetic or environmental contingencies. CEV is all about answering the question: If a superintelligence appeared in our midst, what would the human race want its decision architecture to be, if we had time enough to think things through and arrive at a stable answer? So it boils down to asking, if you had a number of instances of the specific decision architecture human brain, and they were asked to choose a decision architecture for an entity of arbitrarily high intelligence that was to be introduced into their environment, what would be their asymptotically stable preference? That just doesn’t sound like a mindbogglingly difficult problem. It’s certainly a question that should be answerable for much simpler classes of decision architecture.
So it seems to me that the main challenge is simply to understand what the human decision architecture is. And again, that shouldn’t be beyond us at all. The human genome is completely sequenced, we know the physics of the brain down to nucleons, there’s only a finite number of cell types in the body—yes it’s complicated, but it’s really just a matter of sticking with the problem. (Or would be, if there was no time factor. But how to do all this quickly is a separate problem.)
So to sum up, all we need to do is to solve the decision theory problem ‘if agents X, Y, Z… get to determine the value system and cognitive architecture of a new, superintelligent agent A which will be introduced into their environment, what would their asymptotic preference be?’; correctly identify the human decision architecture; and then substitute this for X, Y, Z… in the preceding problem.
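Schematically, and only schematically, the “easy part” as stated has roughly this shape; every ingredient below (how outcomes are predicted, how the agents’ evaluations are combined) is a placeholder for something nobody currently knows how to compute:

```python
def choose_decision_architecture(agents, candidates, predict_outcome):
    """agents: callables scoring a predicted world (stand-ins for X, Y, Z after
    arbitrarily long reflection); candidates: possible value systems for the new
    agent A; predict_outcome: hypothetical model of the world that results from
    building A with a given value system. Summing the scores is itself an
    unargued aggregation choice."""
    return max(candidates, key=lambda vs: sum(agent(predict_outcome(vs)) for agent in agents))

# Tiny fake instance: two "agents" that weigh safety and growth differently.
outcomes = {"cautious": {"safety": 0.9, "growth": 0.3},
            "aggressive": {"safety": 0.2, "growth": 0.9}}
agents = [lambda w: w["safety"], lambda w: 0.5 * w["safety"] + 0.5 * w["growth"]]
print(choose_decision_architecture(agents, list(outcomes), outcomes.get))  # cautious
```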
That’s the first part, the ‘easy’ part. What’s the second part, the hard but nebulous part? Everything to do with consciousness, inconceivable future philosophy problems, and so forth. Now what’s peculiar about this situation is that the existence of nebulous hard problems suggests that the thinker is missing something big about the nature of reality, and yet the easy part of the problem seems almost completely specified. How can the easy part appear closed, an exactly specified problem simply awaiting solution, and yet at the same time, other aspects of the overall task seem so beyond understanding? This contradiction is itself something of a nebulous hard problem.
Anyway, achieving the CEV agenda seems to require a combination of steady work on a well-defined problem where we do already have everything we need to solve it, and rumination on nebulous imponderables in the hope of achieving clarity—including clarity about the relationship between the imponderables and the well-defined problem. I think that is very doable—the combination of steady work and contemplation, that is. And the contemplation is itself another form of steady work—steadily thinking about the nebulous problems, until they resolve themselves.
So long as there are still enigmas in the existential equation we can’t be sure of the outcome, but I think we can know, right now, that it’s possible to work on the problem (easy and hard aspects alike) in a systematic and logical way.
An AI that can simulate the outcome of human conscious deliberation, without actually instantiating a human consciousness, i.e. a detailed technical understanding of the problem of conscious experience
Could you clarify for me what you mean by requiring that a human consciousness not be instantiated? Is it that you don’t believe it is possible to elicit a CEV from a human if instantiation is involved, or that you object to the consequences of simulating human consciousnesses in potentially undesirable situations?
In the case of the latter I observe that this is only a problem under certain CEVs and so is somewhat different in nature to the other requirements. Some people’s CEVs could then be extracted more easily than others.
No reply. Just so you know, the collective position on testing here is bizarre.
How you can think that superintelligent agents are often dangerous AND that a good way of dealing with this is to release an untested one on the world is really quite puzzling.
Hardly anyone ever addresses the issue. When they do, it is by pointing to AI box experiments, which purport to show that a superintelligence can defeat a lesser intelligence, even if well strapped down.
That seems irrelevant to me. To build a jail for the smartest agent in the world, you do not use vastly less powerful agents as guards, you use slightly less powerful ones. If necessary, you can dope the restrained agent up a bit. There are in fact all manner of approaches to this problem—I recommend thinking about them some more before discarding the whole idea of testing superintelligences.
I am nowhere near caught up on FAI readings, but here is a humble thought.
What I have read so far seems to assume a single-jump FAI. That is, once the FAI is set, it must take us to where we ultimately want to go without further human input. Please correct me if I am wrong.
What about a multistage approach?
The problem that people might immediately bring up is that a multistage approach might lead elevating subgoals to goals. We say, “take us to mastery of nanotech” and the AI decides to rip us apart and organize all existing ribosomes under a coherent command.
However, perhaps what we need to do is verify that any intermediate goal state is better than the current state.
So what if we have the AI guess a goal state, then simulate that goal state and expose some subset of humans to that simulation. The AI then asks, “Proceed to this stage or no?” The humans answer.
Once in the next stage we can reassess.
To give a sense of motivation: it seems that verifying the goodness of a future state is easier than trying to construct the basic rules of good statedness.
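The proposed loop, written out as a sketch so its moving parts are explicit: `ai.propose_goal_state`, `ai.simulate`, `ai.proceed_to` and `humans.approve` are hypothetical interfaces, and the hard parts (trusting the simulation, the AI optimising against the approval step) are only named here, not solved:

```python
def multistage_ascent(ai, humans, state, max_stages=10):
    """Propose an intermediate goal state, simulate it, ask humans whether it is
    better than where we are now, and only then proceed; reassess at each stage."""
    for _ in range(max_stages):
        candidate = ai.propose_goal_state(state)          # AI guesses a goal state
        preview = ai.simulate(candidate)                  # simulate that state
        if not humans.approve(preview, compared_to=state):
            break                                         # "no": stop, stay put
        state = ai.proceed_to(candidate)                  # "proceed to this stage"
    return state
```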
Powerful machine intelligences can be expected to have natural drives to eliminate competing goal-based systems.
So, unless there are safeguards against it, a machine intelligence is likely to assassinate potential designers of other machine intelligences which may have subtly different goals. IMO, assassinating your competitors is not an acceptable business practice.
CEV doesn’t seem to have much in the way of safeguards against this. It isn’t even constrained to follow the law of the land. I think as it stands, it has clear criminal tendencies—and so should not be built.
Have you ever heard of the last president of the US? He’s a particularly extreme example of criminality for a president, but I’m pretty sure that all or nearly all presidents would count as extremely criminal compared to what you are used to from day-to-day life. Congressmen likewise.
What sense of ‘criminal’ are you using here? Presumably not ‘convicted of a crime by a court’ since that is relatively rare for politicians. Do you mean ‘have committed acts that are against the law but have not been prosecuted’ or do you mean ‘have committed acts that in my view are/should-be-viewed-as criminal but have not actually broken the law technically’?
He has publicly admitted to ordering violations of the FISA statute, a felony, so certainly “have committed acts that are against the law but have not been prosecuted”.
US politics is not my area—but I don’t think there has ever been a criminal prosecution of an incumbent president.
However, sometimes criminals do get some influence and cause significant damage. It seems like a good reason to do what you can to prevent such things from happening.
Objections to Coherent Extrapolated Volition
http://www.singinst.org/blog/2007/06/13/objections-to-coherent-extrapolated-volition/
“Would your volition take into account the volition of a human who would unconditionally take into account yours?”
Doesn’t this still give them the freedom to weight that volition as small as they like?
Some quibbles:
These need seed content, but seem like they can be renormalized.
This may be a problem, but it seems to me that choosing this particular example, and being as confident of it as you appear to be, are symptomatic of an affective death spiral.
The original CEV proposal appears to me to endorse using something like a CFAI-style controlled ascent rather than blind FOOM: “A key point in building a young Friendly AI is that when the chaos in the system grows too high (spread and muddle both add to chaos), the Friendly AI does not guess. The young FAI leaves the problem pending and calls a programmer, or suspends, or undergoes a deterministic controlled shutdown.”
Useful and interesting list, thanks.
I thought the point of defining CEV as what we would choose if we knew better was (partly) that you wouldn’t have to subset. We wouldn’t be superstitious, vengeful, and so on if we knew better.
Also, can you expand on what you mean by “Rawlesian Reflective Equilibrium”? Are you referring (however indirectly) to the “veil of ignorance” concept?
Why not? How does adding factual knowledge get rid of people’s desire to hurt someone else out of revenge?
People who currently believe in superstitious belief system X would lose the factual falsehoods that X entailed. But most superstitious belief systems have evaluative aspects too, for example, the widespread religious belief that all nonbelievers “ought” to go to hell. I am a nonbeliever. I am also not Chinese, not Indian, not a follower of Sharia Law or Islam, not a member of the Chinese Communist Party, not a member of the Catholic Church, not a Mormon, not a “Good Christian”, and I didn’t intend to donate all my money and resources to saving lives in the third world before finding out about the singularity. There are lots of humans alive on this planet whose volitions could spring a very nasty surprise on people like us.
Learning about the game-theoretic roots of a desire seems to generally weaken its force, and makes it apparent that one has a choice about whether or not to retain it. I don’t know what fraction of people would choose in such a state not to be vengeful, though. (Related: ‘hot’ and ‘cold’ motivational states. CEV seems to naturally privilege cold states, which should tend to reduce vengefulness, though I’m not completely sure this is the right thing to do rather than something like a negotiation between hot and cold subselves.)
What it’s like to be hurt is also factual knowledge, and seems like it might be extremely motivating towards empathy generally.
Why do you think it likely that people would retain that evaluative judgment upon losing the closely coupled beliefs? Far more plausibly, they could retain the general desire to punish violations of conservative social norms, but see above.
I find it interesting that there seems to be a lot of variation in people’s views regarding how much coherence there’d be in an extrapolation. You say that choosing the right group of humans is important, while I’m under the impression that there is no such problem; basically everyone should be in the game, and making higher-level considerations about which humans to include is merely an additional source of error. Nevertheless, if there’ll really be as much coherence as I think, and I think there’d be a hell of a lot, picking some subset of humanity would pretty much produce a CEV that is very akin to the CEVs of other possible human groups.
I think that even being an Islamic radical fundamentalist is a petty factor in overall coherence. If I’m correct, Vladimir Nesov has said several times that people can be wrong about their values, and I pretty much agree. Of course, there is an obvious caveat that it’s rather shaky to guess what other people’s real values might be. Saying “You’re wrong about your professed value X; your real value is along the lines of Y, because you cannot possibly diverge that much from the psychological unity of mankind” also risks seeming like claiming excessive moral authority. Still, I think it is a potentially valid argument, depending on the exact nature of X and Y.
And what would you do if Omega told you that the CEV of just {liberal westerners in your age group} is wildly different from the CEV of humanity? What do you think the right thing to do would be then?
I’d ask Omega, “Which construal of volition are you using?”
There’s light in us somewhere, a better world inside us somewhere, the question is how to let it out. It’s probably more closely akin to the part of us that says “Wouldn’t everyone getting their wishes really turn out to be awful?” than the part of us that thinks up cool wishes. And it may even be that Islamic fundamentalists just don’t have any note of grace in them at all, that there is no better future written in them anywhere, that every reasonable construal of them ends up with an atheist who still wants others to burn in hell; and if so, the test I cited in the other comment, about filtering portions of the extrapolated volition that wouldn’t respect the volition of another who unconditionally respected theirs, seems like it ought to filter that.
I agree that certain limiting factors, tests, etc. could be useful. I haven’t thought hard enough about this particular proposal to say whether it is really of use. My first thought is that if you have thought about it carefully, then it is probably relatively good, just based on your track record.
Eliezer has already talked about this and argued that the right thing would be to run the CEV on the whole of humanity, basing himself partly on the argument that if some particular group (not us) got control of the programming of the AI, we would prefer that they run it on the whole of humanity rather than running it on themselves.
The lives of most evildoers are of course largely incredibly prosaic, and I find it hard to believe their values in their most prosaic doings are that dissimilar from everyone else around the world doing prosaic things.
I wasn’t thinking of evildoers. I was thinking of people who are just different, and have their own culture, traditions and way of life.
I think that thinking in terms of good and evil betrays a closet-realist approach to the problem. In reality, there are different people, with different cultures and biologically determined drives. These cultural and biological factors determine (approximately) a set of traditions, worldviews, ethical principles and moral rules, which can undergo a process of reflective equilibrium to determine a set of consistent preferences over the physical world.
We don’t know how the reflective equilibrium thing will go, but we know that it could depend upon the set of traditions, ethical principles and moral rules that go into it.
If someone is an illiterate devout pentecostal Christian who lives in a village in Angola, the eventual output of the preference formation process applied to them might be very different than if it were applied to the typical LW reader.
They’re not evil. They just might have a very different “should function” than me.
I think part of the point of what you call “moral anti-realism” is that it frees up words like “evil” so that they can refer to people who have particular kinds of “should function”, since there’s nothing cosmic that the word could be busy referring to instead.
If I had to offer a demonology, I guess I might loosely divide evil minds into: 1) those capable of serious moral reflection but avoiding it, e.g. because they’re busy wallowing in negative other-directed emotion, 2) those engaging in serious moral reflection but making cognitive mistakes in doing so, 3) those whose moral reflection genuinely outputs behavior that strongly conflicts with (the extension of) one’s own values. I think 1 comes closest to what’s traditionally meant by “evil”, with 2 being more “misguided” and 3 being more “Lovecraftian”. As I understand it, CEV is problematic if most people are “Lovecraftian” but less so if they’re merely “evil” or “misguided”, and I think you may in general be too quick to assume Lovecraftianity. (ETA: one main reason why I think this is that I don’t see many people actually retaining values associated with wrong belief systems when they abandon those belief systems; do you know of many atheists who think atheists or even Christians should burn in hell?)
“One main reason why I think this is that I don’t see many people actually retaining values associated with wrong belief systems when they abandon those belief systems; do you know of many atheists who think atheists or even Christians should burn in hell?”
One main reason why you don’t see that happening is that the set of beliefs that you consider “right beliefs” is politically influenced, i.e. human beliefs come in certain patterns which are not connected in themselves, but are connected by the custom that people who hold one of the beliefs usually hold the others.
For example, I knew a woman (an agnostic) who favored animal rights, and some group on this basis sent her literature asking for her help with pro-abortion activities, presumably because this is a typical pattern: people favoring animal rights are more likely to be pro-abortion. But she responded, “Just because I’m against torturing animals doesn’t mean I’m in favor of killing babies,” which is evidently quite a logical response, but not in accordance with the usual pattern.
In other words, your own values are partly determined by political patterns, and if they weren’t (which they wouldn’t be under CEV) you might well see people retaining values you dislike when they extrapolate.
Most people may or may not be “Lovecraftian”, but why take that risk?
There are gains from cooperating with as many others as possible. Maybe these and other factors outweigh the risk or maybe they don’t; the lower the probability and extent of Lovecraftianity, the more likely it is that they do.
Anyway, I’m not making any claims about what to do, I’m just saying people probably aren’t as Lovecraftian as Roko thinks, which I conclude both from introspection and from the statistics of what moral change we actually see in humans.
I agree that “probability and extent of Lovecraftianity” would be an important consideration if it were a matter of cooperation, and of deciding how many others to cooperate with, but Eliezer’s motivation in giving everyone equal weighting in CEV is altruism rather than cooperation. If it were cooperation, then the weights would be adjusted to account for contribution or bargaining power, instead of being equal.
To reiterate, “how Lovecraftian” isn’t really the issue. Just by positing the possibility that most humans might turn out to be Lovecraftian, you’re operating in a meta-ethical framework at odds with Eliezer’s, and in which it doesn’t make sense to give everyone equal weight in CEV (or at least you’ll need a whole other set of arguments to justify that).
That aside, the statistics you mention might also be skewed by an anthropic selection effect.
Alternately: They’re evil. They have a very different ‘should function’ to me.
Consider the distinction between whether the output of a preference-aggregation algorithm will be very different for the Angolan Christian, and whether it should be very different. Some preference-aggregation algorithms may just be confused into giving diverging results because of inconsequential distinctions, which would be bad news for everyone, even the “enlightened” westerners.
(To be precise, the relevant factual statement is about whether any two same-culture people get preferences visibly closer to each other than any two culturally distant people. It’s like the relatively small genetic relevance of skin color, where within-race variation is greater than between-race variation.)
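As a toy illustration of that comparison (the preference vectors and groups below are entirely made up), the factual question is just whether the second number comes out much larger than the first:

```python
import itertools
import statistics

# Hypothetical "preference vectors" for two cultural groups (all numbers invented).
group_a = [(0.9, 0.1, 0.5), (0.8, 0.2, 0.6), (0.85, 0.15, 0.55)]
group_b = [(0.7, 0.3, 0.5), (0.6, 0.4, 0.45), (0.65, 0.35, 0.5)]

def distance(p, q):
    """Euclidean distance between two preference vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5

within = [distance(p, q)
          for group in (group_a, group_b)
          for p, q in itertools.combinations(group, 2)]
between = [distance(p, q) for p in group_a for q in group_b]

print("mean within-group distance: ", statistics.mean(within))
print("mean between-group distance:", statistics.mean(between))
# The empirical claim is that, for real human preferences, the second number
# is not dramatically larger than the first.
```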
I think we agree about this actually—several people’s picture of someone with alien values was an Islamic fundamentalist, and they were the “evildoers” I have in mind...
The right thing for me to do is to run CEV on myself, almost by definition. The CEV oracle that I am using to work out my CEV can dereference the dependencies to other CEVs better than I can.
If truly, really wildly different? Obviously, I’d just disassemble them to useful matter via nanobots.
No, not obviously; I can’t say I’ve ever seen anyone else claim to completely condition their concern for other people on the possession of similar reflective preferences.
(Or is your point that they probably wouldn’t stay people for very long, if given the means to act on their reflective preferences? That wouldn’t make it OK to kill them before then, and it would probably constitute undesirable True PD defection to do so afterwards.)
Well, my above reply was a bit tongue-in-cheek. My concern for other things in general is just as complex as my morality, and it contains many meta elements such as “I’m willing to modify my preference X in order to conform to your preference Y because I currently care about your utility to a certain extent”. On the simplest level, I care for things on a sliding scale that ranges from myself to rocks or Clippy AIs with no functional analogues of human psychology (pain, etc.). Somebody with a literally wildly differing reflective preference would not be a person and, as you say, would preferably be dealt with in True PD fashion rather than through ordinary, altruism-contaminated human-human interactions.
This is a very nonstandard usage; personhood is almost universally defined in terms of consciousness and cognitive capacities, and even plausibly relevant desire-like properties like boredom don’t have much to do with reflective preference/volition.
“If we knew better” is an ambiguous phrase, I probably should have used Eliezer’s original: “if we knew more, thought faster, were more the people we wished we were, had grown up farther together”. That carries a lot of baggage, at least for me.
I don’t experience (significant) desires for revenge, so I can only extrapolate from fictional evidence. Say the “someone” in question killed a loved one, and I wanted to hurt them for that. Suppose further that they were no longer able to kill anyone else. Given the time and the means to think about it clearly, I could see that hurting them would not improve the state of the world for me, or for anyone else, and would only impose further unnecessary suffering.
The (possibly flawed) assumption of CEV, as I understood it, is that if I could reason flawlessly, non-pathologically about all of my desires and preferences, I would no longer cleave to the self-undermining ones, and what remains would be compatible with the non-self-undermining desires and preferences of the rest of humanity.
Caveat: I have read the original CEV document but not quite as carefully as maybe I should have, mainly because it carried a “Warning: obsolete” label and I was expecting to come across more recent insights here.
http://plato.stanford.edu/entries/reflective-equilibrium/
I am only part way through but I really recommend that link. So far it’s really helped me think about this.
The rest of Rawls’ Theory of Justice is good too. I’m trying to figure out for myself (before I finally break down and ask) how CEV compares to the veil of ignorance.
I wish you had written this a few weeks earlier, because it’s perfect as a link for the “their associated difficulties and dangers” phrase in my “Complexity of Value != Complexity of Outcome” post.
Please consider upgrading this comment to a post, perhaps with some links and additional explanations. For example, what is the ontology problem in ethics?
The ontology problem is the following: your values are defined in terms of a set of concepts. These concepts are essentially predictively useful categorizations in your model of the world. When you do science, you find that your model of the world is wrong, and you build a new model that has a different set of parts. But what do you do with your values?
In practice, I find that this is never a problem. You usually rest your values on some intuitively obvious part of whatever originally caused you to create the concepts in question.
Subjective anticipation is a concept that a lot of people rest their axiology on. But it looks like subjective anticipation is an artefact of our cognitive algorithms, and all kinds of Big World theories break it. For example, MW QM means that subjective anticipation is nonsense.
Personally, I find this extremely problematic, and in practice, I think that I am just ignoring it.
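To make the definition concrete, here is a toy sketch (every concept name is invented for illustration): values keyed to the concepts of an old world-model simply have nothing to say about states described in the new one.

```python
# Values defined over the concepts of an old ontology (names are illustrative).
old_ontology_value = {
    "soul_saved": 10.0,    # concept from a pre-scientific world-model
    "soul_damned": -10.0,
}

def evaluate_old(state):
    """Score a world-state described in old-ontology terms."""
    return sum(old_ontology_value[concept] for concept in state)

print(evaluate_old(["soul_saved"]))  # works: 10.0

# After a scientific update, states are described in new terms and the old
# value function simply doesn't apply to them:
new_state = ["neural_pattern_X", "branch_weight_0.3"]
try:
    evaluate_old(new_state)
except KeyError as missing:
    print("ontology problem:", missing, "has no value assigned under the old concepts")
```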
I think mind copying technology may be a better illustration of the subjective anticipation problem than MW QM, but I agree that it’s a good example of the ontology problem. BTW, do you have a reference for where the ontology problem was first stated, in case I need to reference it in the future?
I mentioned it on my blog in August 2008, in the post “ontologies, approximations and fundamentalists”.
Peter de Blanc invented it independently, and I think that one of Eliezer and Marcello probably did too.
I invented it sometime around the dawn of time, don’t know if Marcello did in advance or not.
Actually, I don’t know if I could have claimed to invent it, there may be science fiction priors.
Thanks for the pointer, but I think the argument you gave in that post is wrong. You argued that an agent smaller than the universe has to represent its goals using an approximate ontology (and therefore would have to later re-phrase its goals relative to more accurate ontologies). But such an agent can represent its goals/preferences in compressed form, instead of using an approximate ontology. With such compressed preferences, it may not have the computational resources to determine with certainty which course of action best satisfies its preferences, but that is just a standard logical uncertainty problem.
I think the ontology problem is a real problem, but it may just be a one-time problem, where we or an AI have to translate our fuzzy human preferences into some well-defined form, instead of a problem that all agents must face over and over again.
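A minimal sketch of the “compressed preferences” idea (illustrative only; the scoring rule and budget mechanism below are made up): instead of storing values keyed to a fixed approximate ontology, store a single rule that can score any complete world description, and accept that evaluating it exactly may exceed your computational budget.

```python
# Toy "compressed preference": a single rule that scores any world description,
# rather than a table of values over a fixed, approximate ontology.

def compressed_preference(world_description: str) -> float:
    """Score an arbitrary world description. In reality this could be an
    arbitrarily expensive program; the agent may only be able to estimate
    its value (a logical-uncertainty problem, not an ontology problem)."""
    return world_description.count("flourishing") - world_description.count("suffering")

def estimate(world_description: str, budget: int) -> float:
    """Resource-bounded estimate: only inspect the first `budget` characters."""
    return compressed_preference(world_description[:budget])

world = "quantum fields arranged so that flourishing ... suffering ... flourishing"
print("bounded estimate:", estimate(world, budget=40),
      "| exact:", compressed_preference(world))
```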
Yes, if it has compressible preferences, which in reality is the case for e.g. humans and many plausible AIs.
In reality, the cases where this really bites are problems of the form where you discover that your preferences are stated in terms of an incorrect ontology, e.g. souls, or anticipated future experience.
I think that depends upon the structure of reality. Maybe there will be a series of philosophical shocks as severe as the physicality of mental states, Big Worlds, quantum MWI, etc. Suspicion should definitely be directed at what horrors will be unleashed upon a human or AI that discovers a correct theory of quantum gravity.
Just as Big World cosmology can erode aggregative consequentialism, maybe the ultimate nature of quantum gravity will entirely erode any rational decision-making; perhaps some kind of ultimate ensemble theory already has.
On the other hand, the idea of a one-time shock is also plausible.
The reason I think it can just be a one-time shock is that we can extend our preferences to cover all possible mathematical structures. (I talked about this in Towards a New Decision Theory.) Then, no matter what kind of universe we turn out to live in, whichever theory of quantum gravity turns out to be correct, the structure of the universe will correspond to some mathematical structure which we will have well-defined preferences over.
I addressed this issue a bit in that post as well. Are you not convinced that rational decision-making is possible in Tegmark’s Level IV Multiverse?
The next few posts on my blog are going to be basically about approaching this problem (and given the occasion, I may as well commit to writing the first post today).
You should read [*] to get a better idea of why I see “preference over all mathematical structures” as a bad call. We can’t say what “all mathematical structures” is; any given foundation only covers a portion of what we could invent. As with the real world, the mathematics that we might someday encounter can only be completely defined by the process of discovery (but if you capture this process, you may need nothing else).
--
[*] S. Awodey (2004). “An Answer to Hellman’s Question: ‘Does Category Theory Provide a Framework for Mathematical Structuralism?’”. Philosophia Mathematica 12(1):54–64.
The idea that ethics depends upon one’s philosophy of mathematics is intriguing.
By the way, I see no post about this on the causality relay!
Hope to finish it today… Though I won’t talk about philosophy of mathematics in this sub-series, I’m just going to reduce the ontological confusion about preference and laws of physics to a (still somewhat philosophical, but taking place in a comfortably formal setting) question of static analysis of computer programs.
Great to hear. Looking forward to reading it.
Yes, talking about “preference over all mathematical structures” does gloss over some problems in the philosophy of mathematics, and I am sympathetic to anti-foundationalist views like Awodey’s.
Also, in general I agree with Roko on the need for an AI that can do philosophy better than any human, so in this thread I was mostly picking a nit with a specific argument that he had.
(I was going to remind you about the missing post, but I see Roko already did. :)
I define the following structure: if you take action a, all logically possible consequences will follow, i.e. all computable sensory I/O functions, generated by all possible computable changes in the objective physical universe. This holds for all a. This is facilitated by the universe creating infinitely many copies of you every time you take an action, and there being literally no fact of the matter about which one is you.
Now if you have already extended your preferences over all possible mathematical structures, you presumably have a preferred action in this case. But the preferred action is really rather unrelated to your life before you made this unsettling discovery. Beings that had different evolved desires (such as seeking status versus maximizing offspring) wouldn’t produce systematically different preferences; they’d essentially have to choose at random.
If Tegmark Level 4 is, in some sense “true”, this hypothetical example is not really so hypothetical—it is very similar to the situation that we are in, with the caveat that you can argue about weightings/priors over mathematical structures, so some consequences get a lower weighting than others, given the prior you chose.
My intuition tells me that Level 4 is a mistake, and that there is such a thing as the consequence of my actions. However, mere MW quantum mechanics casts doubt on the idea of anticipated subjective experience, so I am suspicious of my anti-multiverse intuition. Perhaps what we need is the equivalent of a theory of Born probabilities for Tegmark Level 4: something in the region of what Nick Bostrom tried to do in his book on anthropic reasoning (though it looks like Nick simply added more arbitrariness into the mix in the form of reference classes).
I disagree on the first part, and agree on the second part.
Yes, and that’s enough for rational decision making. I’m not really sure why you’re not seeing that...
I agree that you can turn the handle on a particular piece of mathematics that resembles decision-making, but some part of me says that you’re just playing a game with yourself: you decide that everything exists, then you put a prior over everything, then you act to maximize your utility, weighted by that prior. It is certainly a blow to one’s intuition that one can only salvage the ability to act by playing a game of make-believe that some sections of “everything” are “less real” than others, where your real-ness prior is something you had to make up anyway.
Others also think that I am just slow on the uptake of this idea. But to me the idea that reality is not fixed but relative to what real-ness prior you decide to pick is extremely ugly. It would mean that the utility of technology to achieve things is merely a shared delusion, that if a theist chose a real-ness prior that assigned high real-ness only to universes where a theistic god existed then he would be correct to pray, etc. Effectively you’re saying that the postmodernists were right after all.
Now, the fact that I have a negative emotional reaction to this proposal doesn’t make it less true, of course.
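To make the game concrete, here is the structure being complained about as a toy calculation (the worlds, weights and utilities are all invented): it is just expected utility, with the “real-ness prior” supplying the weights.

```python
# Toy version of "put a prior over everything, then maximize utility weighted
# by that prior". Worlds, weights, and utilities are all invented.
realness_prior = {"world_where_physics_holds": 0.98, "world_where_prayer_works": 0.02}

def utility(world, action):
    table = {
        ("world_where_physics_holds", "build_technology"): 10,
        ("world_where_physics_holds", "pray"): 0,
        ("world_where_prayer_works", "build_technology"): 2,
        ("world_where_prayer_works", "pray"): 10,
    }
    return table[(world, action)]

def expected_utility(action):
    return sum(weight * utility(world, action)
               for world, weight in realness_prior.items())

for action in ("build_technology", "pray"):
    print(action, expected_utility(action))
# A different choice of real-ness prior would reverse the ranking, which is
# exactly the arbitrariness being objected to above.
```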
There is a deep analogy between how you can’t change the laws of physics (contents of reality, apart from lawfully acting) and how you can’t change your own program. It’s not a delusion unless it can be reached by mistake. The theist can’t be right to act as if a deity exists unless his program (brain) is such that it is the correct way to act, and he can’t change his mind for it to become right, because it’s impossible to change one’s program, only act according to it.
The problem is that this point of view means that in a debate with someone who is firmly religious, not only is the religious person right, but you regret the fact that you are “rational”; you lament “if only I had been brought up with religious indoctrination, I would correctly believe that I am going to heaven”.
Any rational theory that leaves you lamenting your own rationality deserves some serious scepticism.
Following the same analogy, you can translate it as “if only God did in fact exist, …”. The difference doesn’t seem particularly significant—both “what ifs” are equally impossible. “Regretting rationality” is on a different level—rationality in the relevant sense is a matter of choice. The program that defines your decision-making algorithm isn’t.
I still fear that you are reading in my words something very different from what I intend, as I don’t see the possibility of a religious person’s mind actually acting as if God is real. A religious person may have a free-floating network of beliefs about God, but it doesn’t survive under reflection. A true god-impressed mind would actually act as if God is real, no matter what, it won’t be deconvertable, and indeed under reflection an atheist god-impressed mind will correctly discard atheism.
Not all beliefs are equal: a human atheist is correct not just according to an atheist’s standard, and a human theist is incorrect not just according to an atheist’s standard. The standard is in the world, or, under this analogy, in the mind. (The mind is a better place for ontology, because preference is also there, and a human mind can be completely formalized, unlike the unknown laws of physics. By the way, the first post is up.)
So your argument is that the reason that the theists are wrong is because they only sorta-kinda believe in God anyway, but if they really believed, then they’d be just as right as we are?
But only in the sense that their calculation could be correct according to a particularly weird prior. The difference between a normal theist and a “god-impressed mind” who both believe in God is one of rationality: the former makes mistakes in updating beliefs, the latter probably doesn’t. The same goes for an atheist god-impressed mind and a human atheist. You can’t expect to find that weird a prior in a human. And of course, you should say that the god-impressed are wrong about their beliefs, though they correctly follow the evidence according to their prior. If you value their success in the real world more than the autonomy of their preference, you may want to reach into their minds and make appropriate changes.
I should say again: the program that defines the decision-making algorithm can’t normally be changed, which means that one can’t really be “converted” to a different preference, though one can be converted to different beliefs and feelings. Observations don’t change the algorithm; they are processed according to that algorithm. This means that if you care about reflective consistency (and everyone does, in the sense of preservation of preference), you’d try to counteract the unwanted effects of the environment on yourself, including the self-promoting effects where you start liking the new situation. The extent to which you like the new situation, the “level of conviction”, is pretty much irrelevant, just as the presence of a losing psychological drive is. It would take great integrity (not “strength of conviction”) in the change for significantly different values to really sink in, in the sense that the new preference-on-reflection would resemble the new beliefs and feelings similarly to how the native preference-on-reflection resembles native (sane, secular, etc.) beliefs and feelings.
I doubt that you can define a way to choose an algorithm out of a human brain that makes that sentence true.
Yes, that wasn’t careful. In this context, I mean “no large shift of preference”. Tiny changes occur all the time (and are actually very important if you scale them up by giving the preference with/without these changes to an FAI). You can model the extent of reversibility (as compared to a formal computer program) by roughly what can be inferred about the person’s past, which doesn’t necessarily all have to come from the person’s brain. (By an algorithm in a human brain I mean all of the human brain, basically a program that would run an upload implementation, together with the data.)
I agree that it’s ugly to think of the weights as a pretense on how real certain parts of reality are. That’s why I think it may be better to think of them as representing how much you care about various parts of reality. (For the benefit of other readers, I talked about this in What Are Probabilities, Anyway?.)
Actually, I haven’t completely given up the idea that there is some objective notion of how real, or how important, various parts of reality are. It’s hard to escape the intuition that some parts of math are just easier to reach or find than others, in a way that is not dependent on how human minds work.
I believe even personal identity falls under this category. A lot of moral intuitions work with the-me-in-the-future object, as marked in the map. To follow these intuitions, it is very important for us to have a good idea of where the-me-in-the-future is, to have a good map of this thing. When you get to weird thought experiments with copying, this epistemic step breaks down, because if there are multiple future copies, the-me-in-the-future is a pattern that is absent. As a result, the moral intuitions, which indirectly work through this mark on the map, get confused and start giving the wrong answers as well. This can be readily observed, for example, in the preferential inconsistency over time expected in such thought experiments (you precommit to teleporting-with-delay, but then your copy that is to be destroyed starts complaining).
Personal identity is (in general) a wrong epistemic question asked by our moral intuition. Only if preference is expressed in terms of the territory (or rather in a form flexible enough to follow all possible developments), including the parts currently represented in moral intuition in terms of the-me-in-the-future object in the territory, will the confusion with expectations and anthropic thought experiments go away.
thanks, I’ll consider it
Isn’t this one of the problems you can let the FAI solve?
Actually, the repugnant conclusion yes; preference aggregation no, because you have to aggregate individual humans’ preferences.
And what if preferences cannot be measured by a common “ruler”? What then?
I agree that preference aggregation is hard. Wei Dai and Nick Bostrom have both made proposals based upon agents negotiating with some deadline or constraint.
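Not either of those proposals, but a minimal sketch of why even the naive version is awkward: before you can sum preferences you have to pick a normalization, and that choice already smuggles in a substantive decision about interpersonal comparison (the agents, outcomes and weights below are invented).

```python
# Minimal sketch of naive preference aggregation (not anyone's actual proposal).
# Each agent assigns utilities to outcomes on their own, incomparable scale.
agents = {
    "agent_1": {"outcome_a": 100.0, "outcome_b": 0.0},
    "agent_2": {"outcome_a": 0.0, "outcome_b": 1.0},
}

def normalize(utilities):
    """Rescale an agent's utilities to [0, 1]. This step is where the
    interpersonal-comparison problem gets quietly swept under the rug."""
    lo, hi = min(utilities.values()), max(utilities.values())
    return {outcome: (u - lo) / (hi - lo) for outcome, u in utilities.items()}

def aggregate(agents, weights=None):
    weights = weights or {name: 1.0 for name in agents}
    totals = {}
    for name, utilities in agents.items():
        for outcome, u in normalize(utilities).items():
            totals[outcome] = totals.get(outcome, 0.0) + weights[name] * u
    return totals

print(aggregate(agents))  # a tie, purely as an artefact of the normalization chosen
```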
Maybe I’m crazy but all that doesn’t sound so hard.
More precisely, there’s one part, the solution to which should require nothing more than steady hard work, and another part which is so nebulous that even the problems are still fuzzy.
The first part—requiring just steady hard work—is everything that can be reduced to existing physics and mathematics. We’re supposed to take the human brain as input and get a human-friendly AI as output. The human brain is a decision-making system; it’s a genetically encoded decision architecture or decision architecture schema, with the parameters of the schema being set in the individual by genetic or environmental contingencies. CEV is all about answering the question: If a superintelligence appeared in our midst, what would the human race want its decision architecture to be, if we had time enough to think things through and arrive at a stable answer? So it boils down to asking, if you had a number of instances of the specific decision architecture human brain, and they were asked to choose a decision architecture for an entity of arbitrarily high intelligence that was to be introduced into their environment, what would be their asymptotically stable preference? That just doesn’t sound like a mindbogglingly difficult problem. It’s certainly a question that should be answerable for much simpler classes of decision architecture.
So it seems to me that the main challenge is simply to understand what the human decision architecture is. And again, that shouldn’t be beyond us at all. The human genome is completely sequenced, we know the physics of the brain down to nucleons, there’s only a finite number of cell types in the body—yes it’s complicated, but it’s really just a matter of sticking with the problem. (Or would be, if there was no time factor. But how to do all this quickly is a separate problem.)
So to sum up, all we need to do is to solve the decision theory problem ‘if agents X, Y, Z… get to determine the value system and cognitive architecture of a new, superintelligent agent A which will be introduced into their environment, what would their asymptotic preference be?’; correctly identify the human decision architecture; and then substitute this for X, Y, Z… in the preceding problem.
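As a toy rendering of “asymptotically stable preference”, one can think of it as a fixed point of an extrapolation step; the damped-averaging step below is only a placeholder, since the real extrapolation step is of course the entire unsolved problem.

```python
# Toy fixed-point framing of "asymptotically stable preference" (illustrative only;
# the extrapolate() step is a stand-in for the actual hard problem).

def extrapolate(preferences):
    """One step of 'knowing more / thinking longer': here, just a damped
    pull of each weight toward the group mean, as a placeholder dynamic."""
    mean = sum(preferences.values()) / len(preferences)
    return {option: value + 0.5 * (mean - value) for option, value in preferences.items()}

prefs = {"option_a": 0.9, "option_b": 0.1, "option_c": 0.4}
for step in range(50):
    new_prefs = extrapolate(prefs)
    if max(abs(new_prefs[k] - prefs[k]) for k in prefs) < 1e-9:
        break  # preferences have stopped moving: a stable answer
    prefs = new_prefs

print("stable preference reached after", step, "steps:", prefs)
```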
That’s the first part, the ‘easy’ part. What’s the second part, the hard but nebulous part? Everything to do with consciousness, inconceivable future philosophy problems, and so forth. Now what’s peculiar about this situation is that the existence of nebulous hard problems suggests that the thinker is missing something big about the nature of reality, and yet the easy part of the problem seems almost completely specified. How can the easy part appear closed, an exactly specified problem simply awaiting solution, and yet at the same time, other aspects of the overall task seem so beyond understanding? This contradiction is itself something of a nebulous hard problem.
Anyway, achieving the CEV agenda seems to require a combination of steady work on a well-defined problem where we do already have everything we need to solve it, and rumination on nebulous imponderables in the hope of achieving clarity—including clarity about the relationship between the imponderables and the well-defined problem. I think that is very doable—the combination of steady work and contemplation, that is. And the contemplation is itself another form of steady work—steadily thinking about the nebulous problems, until they resolve themselves.
So long as there are still enigmas in the existential equation we can’t be sure of the outcome, but I think we can know, right now, that it’s possible to work on the problem (easy and hard aspects alike) in a systematic and logical way.
Could you clarify for me what you mean by requiring that a human consciousness be instantiated? Is it that you don’t believe it is possible to elicit a CEV from a human if instantiation is involved, or that you object to the consequences of simulating human consciousnesses in potentially undesirable situations?
In the case of the latter, I observe that this is only a problem under certain CEVs, and so is somewhat different in nature from the other requirements. Some people’s CEVs could then be extracted more easily than others’.
Other approaches seem likely to get there first.
...and what have you got against testing?
No reply. Just so you know, the collective position on testing here is bizarre.
How you can think that superintelligent agents are often dangerous AND that a good way of dealing with this is to release an untested one on the world is really quite puzzling.
Hardly anyone ever addresses the issue. When they do, it is by pointing to AI box experiments, which purport to show that a superintelligence can defeat a lesser intelligence, even if well strapped down.
That seems irrelevant to me. To build a jail for the smartest agent in the world, you do not use vastly less powerful agents as guards; you use slightly less powerful ones. If necessary, you can dope the restrained agent up a bit. There are in fact all manner of approaches to this problem—I recommend thinking about them some more before discarding the whole idea of testing superintelligences.
No-one’s against testing, that precaution should be taken, but it’s not the most pressing concern at this stage.
See “All of the above working first time, without testing the entire superintelligence”, upthread. This is not the first time.
I am nowhere near caught up on FAI readings, but here is a humble thought.
What I have read so far seems to assume a single-jump FAI. That is, once the FAI is set, it must take us to where we ultimately want to go without further human input. Please correct me if I am wrong.
What about a multistage approach?
The problem that people might immediately bring up is that a multistage approach might lead elevating subgoals to goals. We say, “take us to mastery of nanotech” and the AI decides to rip us apart and organize all existing ribosomes under a coherent command.
However, perhaps what we need to do is verify that any intermediate goal state is better than the current state.
So what if we have the AI guess a goal state, then simulate that goal state and expose some subset of humans to that simulation? The AI then asks, “Proceed to this stage or not?” The humans answer.
Once in the next stage we can reassess.
To give a sense of motivation: it seems that verifying the goodness of a future state is easier than trying to construct the basic rules of good-state-ness.
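A rough sketch of the loop being proposed (every function here is a toy placeholder standing in for an enormous unsolved subproblem, and the names are made up):

```python
import random

def propose_goal_state(current_state):
    """AI guesses a candidate intermediate state (toy: a small random change)."""
    return current_state + random.uniform(-0.2, 0.5)

def simulate(goal_state):
    """Render the candidate state for human inspection (toy: pass it through)."""
    return goal_state

def humans_approve(current_state, simulated_goal):
    """Some subset of humans answers 'proceed or not' (toy: approve only if
    the simulated state looks strictly better than where we are now)."""
    return simulated_goal > current_state

def multistage_cev(current_state, stages=5, proposals_per_stage=20):
    for _ in range(stages):
        for _ in range(proposals_per_stage):
            candidate = propose_goal_state(current_state)
            if humans_approve(current_state, simulate(candidate)):
                current_state = candidate  # advance one stage, then reassess
                break
    return current_state

print(multistage_cev(current_state=0.0))
```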
Powerful machine intelligences can be expected to have natural drives to eliminate competing goal-based systems.
So, unless there are safeguards against it, a machine intelligence is likely to assassinate potential designers of other machine intelligences which may have subtly different goals. IMO, assassinating your competitors is not an acceptable business practice.
CEV doesn’t seem to have much in the way of safeguards against this. It isn’t even constrained to follow the law of the land. I think as it stands, it has clear criminal tendencies—and so should not be built.
People aren’t constrained to follow the law of the land either.
Fortunately for the rest of us, most criminals are relatively impotent.
Have you ever heard of the last president of the US? He’s a particularly extreme example of criminality for a president, but I’m pretty sure that all or nearly all presidents would count as extremely criminal compared to what you are used to from day-to-day life. Congressmen likewise.
Hence the qualifier “most”.
Also, does driving faster than the speed limit make you technically a criminal? How about downloading pirate MP3s?
What sense of ‘criminal’ are you using here? Presumably not ‘convicted of a crime by a court’ since that is relatively rare for politicians. Do you mean ‘have committed acts that are against the law but have not been prosecuted’ or do you mean ‘have committed acts that in my view are/should-be-viewed-as criminal but have not actually broken the law technically’?
He has publicly admitted to ordering violations of the FISA statute, a felony, so certainly “have committed acts that are against the law but have not been prosecuted”.
US politics is not my area—but I don’t think there has ever been a criminal prosecution of an incumbent president.
However, sometimes criminals do get some influence and cause significant damage. It seems like a good reason to do what you can to prevent such things from happening.