That’s why I prefer the ‘would it satisfy everyone who ever lived?’ strategy over CEV. Humanity’s future doesn’t have to be coherent. Coherence is something that happens at evolutionary choke-points, when some species dies back to within an order of magnitude of the minimum sustainable population. When some revolutionary development allows unprecedented surpluses, the more typical response is diversification.
Consider the trilobites. If there had been a trilobite-Friendly AI using CEV, invincible articulated shells would comb carpets of wet muck with the highest nutrient density possible within the laws of physics, across worlds orbiting every star in the sky. If there had been a trilobite-engineered AI going by 100% satisfaction of all historical trilobites, then trilobites would live long, healthy lives in a safe environment of adequate size, and the Cambrian explosion (or something like it) would have proceeded without them.
Most people don’t know what they want until you show it to them, and most of what they really want is personal. Food, shelter, maybe a rival tribe that’s competent enough to be interesting but always loses when something’s really at stake. The option of exploring a larger world, seldom exercised. It doesn’t take a whole galaxy’s resources to provide that, even if we’re talking trillions of people.
I realized a pithy way of stating my objection to that strategy: given how unlikely I think it is that the test could be passed fairly by a Friendly AI, an AI passing the test is stronger evidence that the AI is cheating somehow than that the AI is Friendly.
If the AI is programmed so that it genuinely wants to pass the test (or the closest feasible approximation of the test) fairly, cheating isn’t an issue. This isn’t a matter of fast-talking its way out of a box. A properly-designed AI would be horrified at the prospect of ‘cheating,’ the way a loving mother is horrified at the prospect of having her child stolen by fairies and replaced with a near-indistinguishable simulacrum made from sticks and snow.
It is probably possible to pass that test by exploiting human psychology.
It is probably impossible to do well on that test by trying to convince humans that your viewpoint is right.
You’re talking past orthonormal. You’re assuming a properly-designed AI. He’s saying that accomplishing the task would be strong evidence of unfriendliness.
What Phil said, and also:
Taboo “fairly”—this is another word the specification of which requires the whole of human values. Proving that the AI understands what we mean by fairness and wants to pass the test fairly is no easier than proving it Friendly in the first place.
“Fairly” was the wrong word in this context. Better might be ‘honest’ or ‘truthful.’ A truthful piece of information is one which increases the recipient’s ability to make accurate predictions; an honest speaker is one whose statements contain only truthful information.
About what? Anything? That sounds very easy.
Remember Goodhart’s Law—what we want is G, Good, not any particular G* normally correlated with Good.
Walking from Helsinki to Saigon sounds easy, too, depending on how it’s phrased. Just one foot in front of the other, right?
Humans make predictions all the time. Any time you perceive anything and are less than completely surprised by it, that’s because you made a prediction which was at least partly successful. If, after receiving and assimilating the information in question, any of your predictions is reduced in accuracy (if any part of your map becomes less closely aligned with the territory), then the information was not perfectly honest. If you ignore or misinterpret it for whatever reason, even when it’s in some higher sense objectively accurate, that still fails the honesty test.
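A minimal sketch of how that honesty test could be operationalized (the scoring rule and the numbers here are my own illustration, not anything proposed in the thread): score the recipient’s probability for what actually happened, before and after the message, and count the message as honest only if the score does not drop.

```python
import math

def log_score(p_for_actual_outcome):
    # Higher is better; maximized by assigning probability 1 to what happens.
    return math.log(p_for_actual_outcome)

p_before, p_after = 0.5, 0.8   # recipient's probability that the event occurs
event_occurred = True

score_before = log_score(p_before if event_occurred else 1 - p_before)
score_after = log_score(p_after if event_occurred else 1 - p_after)

# The message passes this toy honesty test if predictions did not get worse.
print(score_after >= score_before)   # True
```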
A rationalist should win; an honest communicator should make the audience understand.
Given the option, I’d take personal survival even at the cost of accurate perception and ability to act, but it’s not a decision I expect to be in the position of needing to make: an entity motivated to provide me with information that improves my ability to make predictions would not want to kill me, since any incoming information that causes my death necessarily also reduces my ability to think.
What Robin is saying is, there’s a difference between
“metrics that correlate well enough with what you really want that you can make them the subject of contracts with other human beings”, and
“metrics that correlate well enough with what you really want that you can make them the subject of a transhuman intelligence’s goals”.
There are creative avenues of fulfilling the letter without fulfilling the spirit that would never occur to you but would almost certainly occur to a superintelligence, not because xe is malicious, but because they’re the optimal way to achieve the explicit goal set for xer. Your optimism, your belief that you can easily specify a goal (in computer code, not even English words) which admits of no undesirable creative shortcuts, is grossly misplaced once you bring smarter-than-human agents into the discussion. You cannot patch this problem; it has to be rigorously solved, or your AI wrecks the world.
Given the option, I’d take personal survival even at the cost of accurate perception and ability to act, but it’s not a decision I expect to be in the position of needing to make: an entity motivated to provide me with information that improves my ability to make predictions would not want to kill me, since any incoming information that causes my death necessarily also reduces my ability to think.
Sure, but I don’t want to be locked in a box watching a light blink very predictably on and off.
Building the box reduces your ability to predict anything taking place outside the box. Even if the box can be sealed perfectly until the end of time without killing you (which would in itself be a surprise to anyone who knows thermodynamics), cutting off access to compilations of medical research reduces your ability to predict your own physiological reactions. Same goes for screwing with your brain functions.
I do not think you should be as confident as you are that your system is bulletproof. You have already had to elaborate and clarify and correct numerous times to rule out various kinds of paperclipping failures—all it takes is one elaboration or clarification or correction forgotten to allow for a new one, attacking the problem this way.
That’s our problem right there: you’re trying to persuade me to abandon a position I don’t actually hold. I agree that an AI based strictly on a survey of all historical humans would have negligible chance of success, simply because a literal survey is infeasible and any straightforward approximation of it would introduce unacceptable errors.
For everyone else, it was a chance to identify flaws in a proposition. No such thing as too much practice there. For me, it was a chance to experience firsthand the thought processes involved in defending a flawed proposition, necessary practice for recognizing other such flawed beliefs I might be holding; I had no religious upbringing to escape, so that common reference point is missing.
Furthermore, I knew from the outset that such a survey wouldn’t be practical, but I’ve been suspicious of CEV for a while now. It seems like it would be too hard to formalize, and at the same time, even if successful, too far removed from what people spend most of their time caring about. I couldn’t be satisfied that there wasn’t a better way to do it until I’d tried to find such a way myself.
It’s polite to give some signal that you’re playing devil’s advocate if you know you’re making weak arguments.
I couldn’t be satisfied that there wasn’t a better way to do it until I’d tried to find such a way myself.
This is not a sufficient condition for establishing the optimality of CEV. Indeed, I’m not sure there isn’t a better way (nor even that CEV is workable), just that I have at present no candidates for one.
I apologize. I thought I had discharged the devil’s-advocacy-signaling obligation by ending my original post on the subject with a request to be proved wrong.
I agree that personal satisfaction with CEV isn’t a sufficient condition for it being safe. For that matter, having proposed and briefly defended this one alternative isn’t really sufficient for my personal satisfaction in either CEV’s adequacy or the lack of a better option. But we have to start somewhere, and if someone did come up with a better alternative to CEV, I’d want to make sure that it got fair consideration.
Your trilobite example is at odds with your everyone-who-lived strategy. The impact of the trilobite example is to show that CEV is fundamentally wrong, because trilobite cognition, no matter how far you extrapolate it, would never lead to love, or value it if it arose by chance.
Some degree of randomness is necessary to allow exploration of the landscape of possible worlds. CEV is designed to prevent exploration of that landscape.
Some degree of randomness is necessary to allow exploration of the landscape of possible worlds. CEV is designed to prevent exploration of that landscape.
You have not yet learned that a certain argumentative strategy against CEV is doomed to self-referential failure. You have just argued that “exploring the landscape of possible worlds” is a good thing, something that you value. I agree, and I think it’s a reflectively consistent value, which others generally share at some level and which they might share more completely if they knew more, thought faster, had grown up farther together, etc.
You then assume, without justification, that “exploring the landscape of possible worlds” will not be expressed as a part of CEV, and criticize it on these grounds.
Huh? What friggin’ definition of CEV are you using?!?
EDIT: I realized there was an insult in my original formulation. I apologize for being a dick on the Internet.
You then assume, without justification, that “exploring the landscape of possible worlds” will not be expressed as a part of CEV, and criticize it on these grounds.
Because EY has specifically said that that must be avoided, when he describes evolution as something dangerous. I don’t think there’s any coherent way of saying both that CEV will constrain future development (which is its purpose), and that it will not prevent us from reaching some of the best optimums.
Most likely, all the best optimums lie in places that CEV is designed to keep us away from, just as trilobite CEV would keep us away from human values. So CEV is worse than random.
Most likely, all the best optimums lie in places that CEV is designed to keep us away from, just as trilobite CEV would keep us away from human values.
That a “trilobite CEV” would never lead to human values is hardly a criticism of CEV’s effectiveness. The world we have now is not “trilobite friendly”; trilobites are extinct!
CEV, as I understand it, is very weakly specified. All it says is that a developing seed AI chooses its value system after somehow taking into account what everyone would wish for, if they had a lot more time, knowledge, and cognitive power than they do have. It doesn’t necessarily mean, for example, that every human being alive is simulated, given superintelligence, and made to debate the future of the cosmos in a virtual parliament. The combination of better knowledge of reality and better knowledge of how the human mind actually works may make it extremely clear that the essence of human values, extrapolated, is XYZ, without any need for a virtual referendum, or even a single human simulation.
It is a mistake to suppose, for example, that a human-based CEV process will necessarily give rise to a civilizational value system which attaches intrinsic value to such complexities as food, sex, or sleep, and which will therefore be prejudiced against modes of being which involve none of these things. You can have a value system which attributes positive value to human beings getting those things, not because they are regarded as intrinsically good, but because entities getting what they like is regarded as intrinsically good.
If a human being is capable of proposing a value system which makes no explicit mention of human particularities at all (e.g. Ben Goertzel’s “growth, choice, and joy”), then so is the CEV process. So if the worry is that the future will be kept unnecessarily anthropomorphic, that is not a valid critique. (It might happen if something goes wrong, but we’re talking about the basic idea here, not the ways we might screw it up.)
You could say, even a non-anthropomorphic CEV might keep us away from “the best optimums”. But let’s consider what that would mean. The proposition would be that even in a civilization making the best, wisest, most informed, most open-minded choices it could make, it still might fall short of the best possible worlds. For that to be true, must it not be the case that those best possible worlds are extremely hard to “find”? And if you propose to find them by just being random, must there not be some risk of instead ending up in very bad futures? This criticism may be comparable to the criticism that rational investment is a bad idea, because you’d make much more money if you won the lottery. If these distant optima are so hard to find, even when you’re trying to find good outcomes, I don’t see how luck can be relied upon to get you there.
This issue of randomness is not absolute. One might expect a civilization with an agreed-upon value system to nonetheless conduct fundamental experiments from time to time. But if there were experiments whose outcomes might be dangerous as well as rewarding, it would be very foolish to just go ahead and do them because if we get lucky, the consequences would be good. Therefore, I do not think that unconstrained evolution can be favored over the outcomes of non-anthropomorphic CEV.
Because EY has specifically said that that must be avoided, when he describes evolution as something dangerous.
That doesn’t mean that you can’t examine possible trajectories of evolution for good things you wouldn’t have thought of yourself, just that you shouldn’t allow evolution to determine the actual future.
I don’t think there’s any coherent way of saying both that CEV will constrain future development (which is its purpose), and that it will not prevent us from reaching some of the best optimums.
I’m not sure what you mean by “constrain” here. A process that reliably reaches an optimum (I’m not saying CEV is such a process) constrains future development to reach an optimum. Any nontrivial (and non-self-undermining, I suppose; one could value the nonexistence of optimization processes or something) value system, whether “provincially human” or not, prefers the world to be constrained into more valuable states.
Most likely, all the best optimums lie in places that CEV is designed to keep us away from
I don’t see where you’ve responded to the point that CEV would incorporate whatever reasoning leads you to be concerned about this.
It seems that you think there are two tiers of values, one consisting of provincial human values, and another consisting of the true universal values like “exploring the landscape of possible worlds”. You worry that CEV will catch only the first group of values.
From where I stand, this is just a mistaken question; the values you worry will be lost are provincial human values too! There’s no dividing line to miss.
I understand what you’re saying, and I’ve heard that answer before, repeatedly; and I don’t buy it.
Suppose we were arguing about the theory of evolution in the 19th century, and I said, “Look, this theory just doesn’t work, because our calculations indicate that selection doesn’t have the power necessary.” That was the state of things around the turn of the century, when genetic inheritance was assumed to be analog rather than discrete.
An acceptable answer would be to discover that genes were discrete things that an organism had just 2 copies of, and that one was often dominant, so that the equations did in fact show that selection had the necessary power.
An unacceptable answer would be to say, “What definition of evolution are you using? Evolution makes organisms evolve! If what you’re talking about doesn’t lead to more complex organisms, then it isn’t evolution.”
Just saying “Organisms become more complex over time” is not a theory of evolution. It’s more like an observation of evolution. A theory means you provide a mechanism and argue convincingly that it works. To get to a theory of CEV, you need to define what it’s supposed to accomplish, propose a mechanism, and show that the mechanism might accomplish the purpose.
You don’t have to get very far into this analysis to see why the answer you’ve given doesn’t, IMHO, work. I’ll try to post something later this afternoon on why.
I won’t get around to posting that today, but I’ll just add that I know that the intent of CEV is to solve the problems I’m complaining about. I know there are bullet points in the CEV document that say, “Renormalizing the dynamic”, “Caring about volition,” and, “Avoid hijacking the destiny of humankind.”
But I also know that the CEV document says,
Since the output of the CEV is one of the major forces shaping the future, I’m still pondering the order-of-evaluation problem to prevent this from becoming an infinite recursion.
and
It may be hard to get CEV right—come up with an AI dynamic such that our volition, as defined, is what we intuitively want. The technical challenge may be too hard; the problems I’m still working out may be impossible or ill-defined. I don’t intend to trust any design until I see that it works, and only to the extent I see that it works. Intentions are not always realized.
I think there is what you could call an order-of-execution problem, and I think there’s a problem with things being ill-defined, and I think the desired outcome is logically impossible. I could be wrong. But since Eliezer worries that this could be the case, I find it strange that Eliezer’s bulldogs are so sure that there are no such problems, and so quick to shoot down discussion of them.
This is one of the things I don’t understand: If you think everything is just a provincial human value, then why do you care? Why not play video games or watch YouTube videos instead of arguing about CEV? Is it just more fun?
(There’s a longish section trying to answer this question in the CEV document, but I can’t make sense of it.)
There’s a distinction that hasn’t been made on LW yet, between personal values and evangelical values. Western thought traditionally blurs the distinction between them, and assumes that, if you have personal values, you value other people having your values, and must go on a crusade to get everybody else to adopt your personal values.
The CEVer position is, as far as I can tell, that they follow their values because that’s what they are programmed to do. It’s a weird sort of double-think that can only arise when you act on the supposition that you have no free will with which to act. They’re talking themselves into being evangelists for values that they don’t really believe in. It’s like taking the ability to follow a moral code that you know has no outside justification from Nietzsche’s “master morality”, and combining it with the prohibition against value-creation from his “slave morality”.
There’s a distinction that hasn’t been made on LW yet, between personal values and evangelical values. Western thought traditionally blurs the distinction between them, and assumes that, if you have personal values, you value other people having your values, and must go on a crusade to get everybody else to adopt your personal values.
That’s how most values work. In general, I value human life. If someone does not share this value, and they decide to commit murder, then I would stop them if possible. If someone does not share this value, but is merely apathetic about murder rather than a potential murderer themselves, then I would cause them to share this value if possible, so there will be more people to help me stop actual murderers. So yes, at least in this case, I would act to get other people to adopt my values, or inhibit them from acting on their own values. Is this overly evangelical? What is bad about it?
In any case, history seems to indicate that “evangelizing your values” is a “universal human value”.
Groups that didn’t/don’t value evangelizing their values:
The Romans. They don’t care what you think; they just want you to pay your taxes.
The Jews. Because God didn’t choose you.
Nietzscheans. Those are their values, dammit! Create your own!
Goths. (Angst-goths, not Visi-goths.) Because if everyone were a goth, they’d be just like everyone else.
We get into one sort of confusion by using particular values as examples. You talk about valuing human life. How about valuing the taste of avocados? Do you want to evangelize that? That’s kind of evangelism-neutral. How about the preferences you have that make one particular private place, or one particular person, or other limited resource, special to you? You don’t want to evangelize those preferences, or you’d have more competition. Is the first sort of value the only one CEV works with? How does it make that distinction?
We get into another sort of confusion by not distinguishing between the values we hold as individuals, the values we encourage our society to hold, and the values we want God to hold. The kind of values you want your God to hold are very different from the kind of values you want people to hold, in the same way that you want the referee to have different desires than the players. CEV mushes these two very different things together.
Good points. I haven’t thoroughly read the CEV document yet, so I don’t know if there is any discussion of this, but it does seem that it should make a distinction between those different types of values and preferences.
Some degree of randomness is necessary to allow exploration of the landscape of possible worlds. CEV is designed to prevent exploration of that landscape.
Folks. Vladimir’s response is not acceptable in a rational debate. The fact that it currently has 3 points is an indictment of the Less Wrong community.
That post is about a different issue. It’s about whether introducing noise can help an optimization algorithm. Sounds similar; isn’t. The difference is that the optimization algorithm already knows the function that it’s trying to optimize.
The basic problem with CEV is that it requires reifying values in a strange way so that there are atomic “values” that can be isolated from an agent’s physical and cognitive architecture; and that (I think) it assumes that we have already evolved to the point where we have discovered all of these values. You can make very general value statements, such as that you value diversity, or complexity. But a trilobite can’t make any of those value statements. I think it’s likely that there are even more important fundamental value statements to be made that we have not yet conceptualized; and CEV is designed from the ground up specifically to prevent such new values from being incorporated into the utility function.
The need for randomness is not because random is good; it’s because, for the purpose of discovering better primitives (values) to create better utility functions, any utility function you can currently state is necessarily worse than random.
Since when is randomness required to explore the “landscape of possible worlds”? Or the possible values that we haven’t considered? A methodical search would be better. How did you miss that lesson from Worse Than Random, when it included an example (the pushbutton combination lock) of exploring a space of potential solutions?
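For concreteness, a minimal sketch of a methodical (non-random) exploration of a small solution space; the four-button lock here is a toy stand-in of my own, not the example from the linked post.

```python
import itertools

def methodical_search(opens_lock):
    # Enumerate every on/off setting of a 4-button lock: nothing is skipped,
    # nothing is revisited, and no random numbers are needed.
    for combo in itertools.product([0, 1], repeat=4):
        if opens_lock(combo):
            return combo
    return None

print(methodical_search(lambda combo: combo == (1, 0, 1, 1)))   # (1, 0, 1, 1)
```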
Okay, you don’t actually need randomness, if you can work out a way of doing a methodical variation of all possible parameters.
(For problems of this nature, using random processes allows you to specify the statistical properties that you want the solution to have, which is often much simpler than specifying a deterministic process that has those properties. That’s one reason randomness is useful.)
The point I’m trying to make is that you need not to limit yourself to “searching”, meaning trying to optimize a function. You can only search when you know what you’re looking for. A value system can’t be evaluated from the outside. You have to try it on. Rationally, where “rational” means optimizing existing values, you wouldn’t do that. So randomness (or a rationally-ordered but irrationally-pursued exploration of parameter space) will lead to places no rational agent would go.
[EDIT: Wow, the parent comment completely changed since I responded to it. WTF?]
I have a bad habit of re-editing a comment for several minutes after first posting it.
How do you plan to map a random number into a search a space that you could not explore systematically?
Suppose you want to test a program whose input variables are distributed normally. You can write a big complicated equation to sample at uniform intervals from the cumulative distribution function for the Gaussian distribution. Or you can say “x = mean; for i=1 to 10 { x += rnd(2)-1 }”.
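A runnable sketch of that pseudocode, assuming “rnd(2)” means a uniform draw on [0, 2): summing ten Uniform(-1, 1) terms gives an approximately normal variable centered on the chosen mean, with variance 10/3.

```python
import random
import statistics

def approx_gaussian(mean, terms=10):
    # Central-limit-theorem trick: a sum of independent uniform deviates is
    # approximately normal; each Uniform(-1, 1) term contributes variance 1/3.
    x = mean
    for _ in range(terms):
        x += random.uniform(-1.0, 1.0)
    return x

samples = [approx_gaussian(0.0) for _ in range(100_000)]
print(statistics.mean(samples))      # close to 0
print(statistics.variance(samples))  # close to 10/3
```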
Very often, the only data you know about your space is randomly-sampled data. So you look at that randomly-sampled data, and come up with some simple random model that would generate data with similar properties. The nature of the statistics you’ve gathered, such as the mean, variance, and correlations between observed variables, make it very hard to construct a deterministic model that would reproduce those statistics, but very easy to build a random model that does.
Some people really do have the kinds of misconceptions Eliezer was talking about; but the idea that there are hordes of scientists who attribute magical properties to randomness just isn’t true. This is not a fight you need to fight. And railing against all use of randomness in the simulation or study of complex processes just puts a big sticker on your head that says “I have no experience with what I’m talking about!”
We’re having 2 separate arguments here. I hope you realize that my comment that you originally responded to was not claiming that randomness has some magical power. It was about the need, when considering the future of the universe, for trying things out not just because your current utility function suggests they will have high utility. I used “random” as shorthand for “not directed by a utility function”.
According to which utility function?
According to the utility function that your current utility function doesn’t like, but that you will be delighted with once you try it out.
Suppose you want to test a program whose input variables are distributed normally. You can write a big complicated equation to sample at uniform intervals from the cumulative distribution function for the Gaussian distribution. Or you can say “x = mean; for i=1 to 10 { x += rnd(2)-1 }”.
Yes, I understand you can use randomness as an approximate substitute for actually understanding the implications of your probability distributions. That does not really address my point: the randomness does not grant you access to a search space you could not otherwise explore.
Very often, the only data you know about your space is randomly-sampled data. So you look at that randomly-sampled data, and come up with some simple random model that would generate data with similar properties.
If you analyze randomly-sampled data by considering the probability distribution of results for a random sampling, instead of for the specific sampling you actually used, you are vulnerable to the mistake described here.
The nature of the statistics you’ve gathered, such as the mean, variance, and correlations between observed variables, make it very hard to construct a deterministic model that would reproduce those statistics, but very easy to build a random model that does.
You can deterministically build a model that accounts for your uncertainty. Having a probability distribution is not the same thing as randomly choosing results from that distribution.
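A toy contrast of the two things being distinguished here (the biased coin and the numbers are mine): computing with the distribution itself versus randomly drawing results from it.

```python
import random
import statistics

p_heads = 0.3   # a probability distribution over a coin that pays 1 on heads

# Deterministic: work with the distribution directly.
expected_value = p_heads * 1 + (1 - p_heads) * 0       # exactly 0.3

# Random: choose results from that distribution and average them.
draws = [1 if random.random() < p_heads else 0 for _ in range(10_000)]
estimate = statistics.mean(draws)                       # near 0.3, with noise

print(expected_value, estimate)
```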
And railing against all use of randomness in the simulation or study of complex processes just puts a big sticker on your head that says “I have no experience with what I’m talking about!”
First of all, I am not “railing against all use of randomness in the simulation or study of complex processes”. I am objecting to your claim that “randomness is required” in an epistemological process. Second, you should not presume to warn me about stickers on my head.
I hope you realize that my comment that you originally responded to was not claiming that randomness has some magical power.
You should realize that “randomness is required” does sound very much like “claiming that randomness has some magical power”, and if you misspoke, the correct response to the objection would be to admit that you made a mistake and apologize for the miscommunication, not to try to defend the wrong claim.
According to which utility function?
According to the utility function that your current utility function doesn’t like, but that you will be delighted with once you try it out.
It appears that you don’t understand the purpose of utility functions. I do not want to have a utility function U that maximizes U(U), that assigns to itself higher utility than any other utility function assigns to itself. I want to achieve states of the world that maximize my current utility function.
You should realize that “randomness is required” does sound very much like “claiming that randomness has some magical power”, and if you misspoke, the correct response to the objection would be to admit that you made a mistake and apologize for the miscommunication, not to try to defend the wrong claim.
You mean, for instance, by saying,
Okay, you don’t actually need randomness, if you can work out a way of doing a methodical variation of all possible parameters.
I’m not defending the previous wrong claim about “needing randomness”. I’m arguing against your wrong claim, which appears to be that one should never use randomness in your models.
It appears that you don’t understand the purpose of utility functions. I do not want to have a utility function U that maximizes U(U), that assigns to itself higher utility than any other utility function assigns to itself. I want to achieve states of the world that maximize my current utility function.
It appears that you still don’t understand what my basic point is. You can’t improve your utility function by a search using your utility function. We have better utility functions than trilobites did. We could not have found them using trilobite utility functions. Trilobite CEV would, if performing optimally, have ruled them out. Extrapolate.
Okay, you don’t actually need randomness, if you can work out a way of doing a methodical variation of all possible parameters.
Wow, you are actually compounding the rudeness of abusing the edit feature to completely rewrite your comment by then analyzing my response to the original version as if it were responding to the edited version.
I’m arguing against your wrong claim, which appears to be that one should never use randomness in your models.
How did you get from “randomness is never required” to “randomness is never useful”? I acknowledge that sometimes randomness can be a good enough approximate substitute for the much harder strategy of actually understanding the implications of a probability distribution.
It appears that you still don’t understand what the argument we’re having is about.
I understand your argument. It is wrong. You have not actually responded to my objection. To refute my objection, you would have to explain why I should want to give up my current utility function U0 in favor of some other utility function U such that
(1) U(U) > U0(U0)
even though
(2) U0(U0) > U0(U)
Since U0 is my current utility function, and therefore (2) describes my current wants, you will not be able to convince me that I should be persuaded by (1), which is a meaningless comparison. Adopting U as my utility function does not help me maximize U0.
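A toy recasting of that point in terms of outcomes rather than utility functions evaluating themselves (all names and numbers are invented for illustration): when deciding whether to swap, the agent consults only its current function, so inequality (1) never enters into the choice.

```python
def u0(world):
    # What I value now.
    return world.get("things_u0_cares_about", 0)

def u_new(world):
    # What the proposed replacement utility function values.
    return world.get("things_u_cares_about", 0)

world_if_keep = {"things_u0_cares_about": 10, "things_u_cares_about": 0}
world_if_adopt = {"things_u0_cares_about": 0, "things_u_cares_about": 100}

# Analogue of (1): the new function scores its own favored world higher...
print(u_new(world_if_adopt) > u0(world_if_keep))   # True, but never consulted
# Analogue of (2): the choice is made by maximizing the *current* function u0.
options = {"keep U0": world_if_keep, "adopt U": world_if_adopt}
print(max(options, key=lambda name: u0(options[name])))   # keep U0
```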
To the extent that trilobites can even be considered to have utility functions, my utility function is better than the trilobite utility function according to my values. The trilobites would disagree. An optimal human CEV would be a human SUCCESS and a trilobite FAIL. Likewise, an optimal trilobite CEV would be a trilobite SUCCESS and a human FAIL. There is no absolute universal utility function that says one of these is better than the others. It is my human values that cause me to say that the human SUCCESS is better.
An optimal human CEV would be a human SUCCESS and a trilobite FAIL.
Unless, of course, it turns out that humans really like trilobites and would be willing to devote significant resources to keeping them alive, understanding their preferences, and carrying out those preferences (without compromising other human values). In that case, it’s mutual success.
I’m breaking this out into a separate reply, because it’s its own sub-thread:
If no utility function, and hence no world state, is objectively better than any other, then all utility functions are wireheading. Because the only distinction between wireheading, and not wireheading, is that the wirehead only cares about his/her own qualia, not about states of the world. If the only reason you care about states of the world is because of how your utility function evaluates them—that is to say, what qualia they generate in you—you are a wirehead.
If the only reason you care about states of the world is because of how your utility function evaluates them—that is to say, what qualia they generate in you—you are a wirehead.
You have it backwards. I do not care about things because of how my utility function evaluates them. Rather, my utility function evaluates things the way it does because of how I care about them. My utility function is a description of my preferences, not the source of them.
I don’t think the order of execution matters here. If there’s no objective preference over states of the world, then there’s no objective reason to prefer “not wireheading” (caring about states of the world) over “wireheading” (caring only about your percepts).
There is no “objective” reason to do anything. Knowing that, what are you going to do anyways? Myself, I am still going to do things for my subjective reasons.
You appear to have an overexpansive definition of wireheading. Having an arbitrary utility function is not the same as wireheading. Wireheading is a very specific sort of alteration of utility functions that we (i.e. most humans, with our current, subjective utility functions, nearly universally) see as very dangerous, because it throws away what we currently care about. Wireheading is a “parochial” definition, not universal. But that’s OK.
What else can the utility function as implemented by your hardware depend on besides your qualia, and computations derived from your qualia?
Calling utility functions “wireheading” is a category error. Wireheading is either:
1. Directly acting on the machinery that implements one’s utility function to trivially satisfy this hardware, i.e. by directly injecting qualia rather than providing the qualia via what they are normally correlated with.
2. More broadly, altering one’s utility function to one that is trivial to broadly satisfy, such as by reinforcement via 1.
Calling utility functions “wireheading” is a category error.
If you read my original comment, it’s clear that I meant wireheading is having a utility function that depends only on your qualia. Or maybe “choosing to have”.
What else can the utility function as implemented by your hardware depend on besides your qualia, and computations derived from your qualia?
Huh? So you think there’s nothing inside your head except qualia?
Beliefs aren’t qualia. Subconscious information isn’t qualia.
Directly acting on the machinery that implements one’s utility function to trivially satisfy this hardware, i.e. by directly injecting qualia rather than providing the qualia via what they are normally correlated with.
This sounds like a potentially good definition. But I’m unclear then why anyone using utility theory, and that definition, would object to wireheading. If you’ve got a utility function, and you can satisfy it, that’s the thing to do, right? Why does it matter how you satisfy it? You seem to be saying that the hardware implementation isn’t your real utility function, it’s just an implementation of it. As if the utility function stood somewhere outside you.
Huh? So you think there’s nothing inside your head except qualia?
Beliefs aren’t qualia. Subconscious information isn’t qualia.
Beliefs and subconscious information are derived from qualia and the information about the external world that they correlate with, no?
Utility functions are a convenient mathematical description to describe preferences of entities in game theory and some decision theories, when these preferences are consistent. It’s useful as a metaphor for “what we want”, but when used loosely like this, there are troubles.
As applied to humans, this flat-out doesn’t work. Empirically and as a general rule, we’re not consistent, and most of us can readily be money-pumped. We do not have a nice clean module that weighs outcomes and assigns real numbers to them. Nor do we feed outcome weights into a probability weighting module, and then choose the maximum utility. Our values change on reflection. Heck, we’re not even unitary entities. Our consciousness is multi-faceted. There are the left and right brains communicating and negotiating through the corpus callosum. The information immediately accessible to the consciousness, what we identify with, is rather different than the information our subconscious uses. We are a gigantic hack of an intelligence built upon the shifting sands of stimulus-response and reinforcement conditioning. These joints in our selves make it easier to wirehead, and essentially kill our current selves, leaving only animal-level instincts, if that.
But I’m unclear then why anyone using utility theory, and that definition, would object to wireheading. If you’ve got a utility function, and you can satisfy it, that’s the thing to do, right? Why does it matter how you satisfy it? You seem to be saying that the hardware implementation isn’t your real utility function, it’s just an implementation of it. As if the utility function stood somewhere outside you.
There are multiple utility functions running around here. The basic point was that what I consider important now matters to what choices I make now. The fact that I can make the future me have a new utility function, satisfied by wireheading, does not register positively on my current utility function. In fact, because it throws away almost everything I now care about, I am unlikely to do it now. My goals are “satisfy my current utility function”, and are always that, because that’s what we mean by the abstraction of utility function. My goals are not to satisfy what preferences I may later have. My goals are not to change my preferences to be easier to satisfy, because that means my current goals are less likely to be satisfied. If my goals change, then they will have changed, and only then will I choose differently.

It’s not that my utility function stands outside of me: my utility function is part of me. Changing it changes me. It so happens that my utility function would be easily changed if I started directly stimulating my reward center. The reward center is not my utility function, though it is part of the implementation of my decision function (which, if it were coherent, could be summarized in a utility function and sets of probabilities). If we wish to identify the reward circuitry of my brain with a utility function, we’ve also got to put a few other utility functions in, and entities having these utility functions that are in a non-zero-sum game with the reward circuitry.
Beliefs and subconscious information are derived from qualia and the information about the external world that they correlate with, no?
Not as far as I know, no. You may be equating “qualia” with “percepts”. That’s not right.
The fact that I can make the future me have a new utility function, satisfied by wireheading, does not register positively on my current utility function. In fact, because it throws away almost everything I now care about, I am unlikely to do it now. My goals are “satisfy my current utility function”, and are always that, because that’s what we mean by the abstraction of utility function. My goals are not to satisfy what preferences I may later have.
If that analysis were correct, there would be no difficulty about wireheading. It would simply be an error.
There is a difficulty about wireheading, and I’m trying to talk about it. I’m looking at static situations: Is there something objectively wrong with a person plugged into themselves giving themselves orgasms forever?
The LW community has a consensus that there is something wrong with that. Yet they also have a consensus that there are no objective values. These are inconsistent.
You’re trying to say that wireheading is an error not because the final wirehead state reached is wrong, but because the path from here to there involved an error. That’s not a valid objection, for the reasons you gave in your comment: Humans are messy, and random variation is a natural part of the human hardware and software. And humans have been messy for some time. So if you can become a wirehead by a simple error, many people must already have made that error. And CEV has to incorporate their wirehead preferences equally with everyone else’s.
There’s something inconsistent about saying that human values are good, but the process generating those values is bad.
Not as far as I know, no. You may be equating “qualia” with “percepts”. That’s not right.
Well, I’m still not convinced there is a useful difference, though I see why philosophers would separate the concepts.
There is a difficulty about wireheading, and I’m trying to talk about it. I’m looking at static situations: Is there something objectively wrong with a person plugged into themselves giving themselves orgasms forever?
There is nothing objectively wrong with that, no.
The LW community has a consensus that there is something wrong with that. Yet they also have a consensus that there are no objective values. These are inconsistent.
The LW community has a consensus that there is something wrong with that judged by our current parochial values that we want to maintain. Not objectively wrong, but widely held inter-subjective agreement that lets us cooperate in trying to steer the future away from a course where everyone gets wireheaded.
You’re trying to say that wireheading is an error not because the final wirehead state reached is wrong,
No, I’m saying that the final state is wrong according to my current values. That’s what I mean by wrong: against my current values. Because it is wrong, any path reaching it must have an error in it somewhere.
And humans have been messy for some time. So if you can become a wirehead by a simple error, many people must already have made that error.
We haven’t had the technology to truly wirehead until quite recently, though various addictions can be approximations.
many people must already have made that error. And CEV has to incorporate their wirehead preferences equally with everyone else’s.
Currently, there aren’t enough wireheads, or addicts for that matter, to make much of a difference. Those that are wireheads want nothing more than to be wireheads, so I’m not sure that they would affect anything else under CEV. That’s one of the horrors of wireheading—all other values become lost. What we would have to worry about is a proselytizing wirehead, who wishes everyone else would convert. That seems an even harder end-state to reach than a simple wirehead.
Personally, I don’t want CEV applied to the whole human race. I think large swathes of the human race hold values that conflict badly with mine, and still would after perfect reflection. Wireheads would just be a small subset of that.
Personally, I don’t want CEV applied to the whole human race. I think large swathes of the human race hold values that conflict badly with mine, and still would after perfect reflection. Wireheads would just be a small subset of that.
One of my intuitions about human value is that it is highly diverse, and any extrapolation will be unable to find consensus / coherence in the way desired by CEV. As such, I’ve always thought that the most likely outcome of augmenting human value through the means of successful FAI would be highly diverse subpopulations all continuing to diverge, with a sort of evolutionary pressure for who receives the most resources. Wireheads should be easy to contain under such a scenario, and would leave expansion to the more active groups.
We haven’t had the technology to truly wirehead until quite recently, though various addictions can be approximations.
I was reverting to my meaning of “wireheading”. Sorry about that.
Personally, I don’t want CEV applied to the whole human race. I think large swathes of the human race hold values that conflict badly with mine, and still would after perfect reflection. Wireheads would just be a small subset of that.
We agree on that.
I think one problem with CEV is that, to buy into CEV, you have to buy into this idea you’re pushing that values are completely subjective. This brings up the question of why anyone implementing CEV would want to include anybody else in the subset whose values are being extrapolated. That would be an error.
You could argue that it’s purely pragmatic—the CEVer needs to compromise with the rest of the world to avoid being crushed like a bug. But, hey, the CEVer has an AI on its side.
You could argue that the CEVer’s values include wanting to make other people happy, and believes it can do this by incorporating their values. There are 2 problems with this:
1. They would be sacrificing a near-infinite expected utility from propagating their values over all time and space, for a relatively infinitesimal one-time gain of happiness on the part of those currently alive here on Earth. So these have to be CEVers with high discounting of the future. Which makes me wonder why they’re interested in CEV.
2. Choosing the subset of people who manage to develop a friendly AI and set up CEV strongly selects for people who have the perpetuation of values as their dominant value. If someone claims that he will incorporate other peoples’ values in his CEV at the expense of perpetuating his own values because he’s a nice guy, you should expect that he has to date put more effort into being a nice guy than into CEV.
If you’ve got a utility function, and you can satisfy it, that’s the thing to do, right? Why does it matter how you satisfy it? You seem to be saying that the hardware implementation isn’t your real utility function, it’s just an implementation of it. As if the utility function stood somewhere outside you.
I think I see your point: a wireheading utility function would value (1) for providing the reward with less effort, while a nonwireheading utility function would disvalue (1) for providing the reward without the desideratum.
If you think that the notion of “qualia” requires them to be causally isolated from the universe (which is my guess at why you even bring the idea up), then the burden is on you to explain why everyone who discusses consciousness except Daniel Dennett is silly.
In that case, nothing can be said to depend only on the qualia, because anything that depends on them is also indirectly influenced by whatever the qualia themselves depend on.
Are there any independent variables in the real world? Variables are “independent” given a particular analysis.
When you say a function depends only on a set of variables, you mean that you can compute the function given the value of those variables. It doesn’t matter whether those variables are dependent on other variables.
Wow, you are actually compounding the rudeness of abusing the edit feature to completely rewrite your comment by then analyzing my response to the original version as if it were responding to the edited version.
No. That statement is three comments above the comment in which you said I should acknowledge my error. It was already there when you wrote that comment. And I also acknowledged my misstatement in the comment you were replying to, and elaborated on what I had meant when I made the comment.
I acknowledge that sometimes randomness can be a good enough approximate substitute for the much harder strategy of actually understanding the implications of a probability distribution.
Good! We agree.
Since U0 is my current utility function, and therefore (2) describes my current wants, you will not be able to convince me that I should be persuaded by (1), which is a meaningless comparison. Adopting U as my utility function does not help me maximize U0.
Good! We agree again.
To the extent that trilobites can even be considered to have utility functions, my utility function is better than the trilobite utility function according to my values. The trilobites would disagree.
And we agree yet again!
Likewise, an optimal trilobite CEV would be a trilobite SUCCESS and a human FAIL. There is no absolute universal utility function that says one of these is better than the others. It is my human values that cause me to say that the human SUCCESS is better.
And here is where we part ways.
Maybe there is no universal utility function. That’s a… I won’t say it’s a reasonable position, but I understand its appeal. I would call it an over-reasoned position, like when a philosopher announces that he has proved that he doesn’t exist. It’s time to go back to the drawing board when you come up with that conclusion. Or at least to take your own advice, and stop trying to change the world when you’ve already said it doesn’t matter how it changes.
But to believe that your utility function is nothing special, and still try to take over the universe and force your utility function on it for all time, is insane.
(Yes, yes, I know Eliezer has all sorts of disclaimers in the CEV document about how CEV should not try to take over the universe. I don’t believe that it’s logically possible; and I believe that his discussions of Friendly AI make it even clearer that his plans require complete control. Perhaps the theory is still vague enough that just maybe there’s a way around this; but I believe the burden of proof is on those who say there is a way around it.)
It would be consistent with the theory of utility functions if, in promoting CEV, you were acting on an inner drive that said, “Ooh, baby, I’m ensuring the survival of my utility function. Oh, God, yes! Yes! YES!” But that’s not what I see. I see people scribbling equations, studying the answers, and saying, “Hmm, it appears that my utility function is directing me to propagate itself. Oh, dear, I suppose I must, then.”
That’s just faking your utility function.
I think it’s key that the people I’m speaking of who believe utility functions are arbitrary, also believe they have no free will. And it’s probably also key that they assume their utility function must assign value to its own reproduction. They then use these two beliefs as an excuse to justify not following through on their belief about the arbitrariness of their utility function, because they think to do so would be logically impossible. “We can’t help ourselves! Our utility functions made us do it!” I don’t have a clean analysis, but there’s something circular, something wrong with this picture.
No. That statement is three comments above the comment in which you said I should acknowledge my error.
Let’s recap. You made a wrong claim. I responded to the wrong claim. You disputed my response. I refuted your disputation. You attempted to defend your claim. I responded to your defense. You edited your defense by replacing it with the acknowledgment of your mistake. You responded to my response still sort of defending your wrong claim, and attacking me for refuting your wrong claim. I defended my refutation, pointing out that you really did make the wrong claim and continued to defend it. And now you attack my defense, claiming that you did in fact acknowledge your mistake, and this should somehow negate your continued defense after the acknowledgement. Do you see how you are wrong here? When you acknowledge your claim is wrong, you should not at the same time criticize me for refuting your point.
But to believe that your utility function is nothing special, and still try to take over the universe and force your utility function on it for all time, is insane.
I do believe my utility function is special. I don’t expect the universe (outside of me, my fellow humans, and any optimizing processes we spawn off) to agree with me. But, like Eliezer says, “We’ll see which one of us is still standing when this is over.”
Let’s recap. You made a wrong claim. I responded to the wrong claim. You disputed my response. I refuted your disputation. You attempted to defend your claim. I responded to your defense. You edited your defense by replacing it with the acknowledgment of your mistake.
No, that isn’t what happened. I’m not sure which comment the last sentence is supposed to refer to, but I’m p > .8 it didn’t happen that way. If it’s referring to the statement, “Okay, you don’t actually need randomness,” I wrote that before I ever saw your first response to that comment. But that doesn’t match up with what you just described; there weren’t that many exchanges before that comment. It also doesn’t match up with anything after that comment, since I still don’t acknowledge any such mistake made after that comment.
When you acknowledge your claim is wrong, you should not at the same time criticize me for refuting your point.
We’re talking about 2 separate claims. The wrong claim that I made was in an early statement where I said that you “needed randomness” to explore the space of possible utility functions. The right claim that I made, at length, was that randomness is a useful tool. You are conflating my defense of that claim, with defending the initial wrong claim. You’ve also said that you agree that randomness is a useful tool, which suggests that what is happening is that you made a whole series of comments that I say were attacking claim 2, and that you believe were attacking claim 1.
I’m not planning to tile the universe with myself, I just want myself or something closely isomorphic to me to continue to exist. The two most obvious ways to ensure my own continued existence are avoidance of things that would destroy me, particularly intelligent agents which could devote significant resources to destroying me personally, and making redundant copies. My own ability to copy myself is limited, and an imperfect copy might compete with me for the same scarce resources, so option two is curtailed by option one. Actual destruction of enemies is just an extension of avoidance; that which no longer exists within my light-cone can no longer pose a threat.
Your characterization of my utility function as arbitrary is, itself, arbitrary. Deal with it.
According to the utility function that your current utility function doesn’t like, but that you will be delighted with once you try it out.
That description could apply to an overwhelming majority of the possible self-consistent utility functions (which are, last I checked, infinite in number), including all of those which lead to wireheading. Please be more specific.
Utility function #311289755230920891423. Try it. You’ll like it.
I have no solution to wireheading. I think a little wireheading might even be necessary. Maybe “wireheading” is a necessary component of “consciousness”, or “value”. Maybe all of the good places lie on a continuum between “wireheading” and “emotionless nihilism”.
Fallacy of moderation. Besides, wireheading and self-destructive nihilism aren’t opposite extremes on a spectrum, they’re just failure states within the solution space of possible value systems.
#311289755230920891423.
A string of random numbers is not an explanation.
I have a simple solution to wireheading… simple for me, anyway. I don’t like it, so I won’t seek it out, nor modify myself in any way that might reasonably cause me to like it or want to seek it out.
The fallacy of moderation is only a fallacy when someone posits a continuum between two things that aren’t actually on a continuum. (If they are on a continuum, it’s only a fallacy if you have independent means for finding a correct answer to the problem that the arguing groups have made errors on, rather than simply combining their utility functions.) The question I’m raising is whether wireheading is in fact just an endpoint on the same continuum that our favored states lie on.
How do you define wireheading?
I define it as valuing your qualia instead of valuing states of the world. But could something that didn’t value its qualia be conscious? Could it have any fun? Would we like to be it? Isn’t valuing your qualia part of the definition of what a qualia is?
That’s why I prefer the ‘would it satisfy everyone who ever lived?’ strategy over CEV. Humanity’s future doesn’t have to be coherent. Coherence is something that happens at evolutionary choke-points, when some species dies back to within an order of magnitude of the minimum sustainable population. When some revolutionary development allows unprecedented surpluses, the more typical response is diversification.
Consider the trilobites. If there had been a trilobite-Friendly AI using CEV, invincible articulated shells would comb carpets of wet muck with the highest nutrient density possible within the laws of physics, across worlds orbiting every star in the sky. If there had been a trilobite-engineered AI going by 100% satisfaction of all historical trilobites, then trilobites would live long, healthy lives in a safe environment of adequate size, and the cambrian explosion (or something like it) would have proceeded without them.
Most people don’t know what they want until you show it to them, and most of what they really want is personal. Food, shelter, maybe a rival tribe that’s competent enough to be interesting but always loses when something’s really at stake. The option of exploring a larger world, seldom exercised. It doesn’t take a whole galaxy’s resources to provide that, even if we’re talking trillions of people.
I realized a pithy way of stating my objection to that strategy: given how unlikely I think it is that the test could be passed fairly by a Friendly AI, an AI passing the test is stronger evidence that the AI is cheating somehow than that the AI is Friendly.
If the AI is programmed so that it genuinely wants to pass the test (or the closest feasible approximation of the test) fairly, cheating isn’t an issue. This isn’t a matter of fast-talking it’s way out of a box. A properly-designed AI would be horrified at the prospect of ‘cheating,’ the way a loving mother is horrified at the prospect of having her child stolen by fairies and replaced with a near-indistinguishable simulacrum made from sticks and snow.
It is probably possible to pass that test by exploiting human psychology. It is probably impossible to do well on that test by trying to convince humans that your viewpoint is right.
You’re talking past orthonormal. You’re assuming a properly-designed AI. He’s saying that accomplishing the task would be strong evidence of unfriendliness.
What Phil said, and also:
Taboo “fairly”— this is another word the specification of which requires the whole of human values. Proving that the AI understands what we mean by fairness and wants to pass the test fairly is no easier than proving it Friendly in the first place.
“Fairly” was the wrong word in this context. Better might be ‘honest’ or ‘truthful.’ A truthful piece of information is one which increases the recipient’s ability to make accurate predictions; an honest speaker is one whose statements contain only truthful information.
About what? Anything? That sounds very easy.
Remember Goodhart’s Law—what we want is G, Good, not any particular G* normally correlated with Good.
Walking from Helsinki to Saigon sounds easy, too, depending on how it’s phrased. Just one foot in front of the other, right?
Humans make predictions all the time. Any time you perceive anything and are less than completely surprised by it, that’s because you made a prediction which was at least partly successful. If, after receiving and assimilating the information in question, any of your predictions is reduced in accuracy, any part of that map becomes less closely aligned with the territory, then the information was not perfectly honest. If you ignore or misinterpret it for whatever reason, even when it’s in some higher sense objectively accurate, that still fails the honesty test.
A rationalist should win; an honest communicator should make the audience understand.
Given the option, I’d take personal survival even at the cost of accurate perception and ability to act, but it’s not a decision I expect to be in the position of needing to make: an entity motivated to provide me with information that improves my ability to make predictions would not want to kill me, since any incoming information that causes my death necessarily also reduces my ability to think.
What Robin is saying is, there’s a difference between
“metrics that correlate well enough with what you really want that you can make them the subject of contracts with other human beings”, and
“metrics that correlate well enough with what you really want that you can make them the subject of a transhuman intelligence’s goals”.
There are creative avenues of fulfilling the letter without fulfilling the spirit that would never occur to you but would almost certainly occur to a superintelligence, not because xe is malicious, but because they’re the optimal way to achieve the explicit goal set for xer. Your optimism, your belief that you can easily specify a goal (in computer code, not even English words) which admits of no undesirable creative shortcuts, is grossly misplaced once you bring smarter-than-human agents into the discussion. You cannot patch this problem; it has to be rigorously solved, or your AI wrecks the world.
Sure, but I don’t want to be locked in a box watching a light blink very predictably on and off.
Building the box reduces your ability to predict anything taking place outside the box. Even if the box can be sealed perfectly until the end of time without killing you (which would in itself be a surprise to anyone who knows thermodynamics), cutting off access to compilations of medical research reduces your ability to predict your own physiological reactions. Same goes for screwing with your brain functions.
I do not think you should be as confident as you are that your system is bulletproof. You have already had to elaborate and clarify and correct numerous times to rule out various kinds of paperclipping failures—all it takes is one elaboration or clarification or correction forgotten to allow for a new one, attacking the problem this way.
How confident do you think I am that my plan is bulletproof?
Given that you asked me the question, I reckon you give it somewhere between 1:100 and 2:1 odds of succeeding. I reckon the odds are negligible.
That’s our problem right there: you’re trying to persuade me to abandon a position I don’t actually hold. I agree that an AI based strictly on a survey of all historical humans would have negligible chance of success, simply because a literal survey is infeasible and any straightforward approximation of it would introduce unacceptable errors.
...why are you defending it, then? I don’t even see that thinking along those lines is helpful.
For everyone else, it was a chance to identify flaws in a proposition. No such thing as too much practice there. For me, it was a chance to experience firsthand the thought processes involved in defending a flawed proposition, necessary practice for recognizing other such flawed beliefs I might be holding; I had no religious upbringing to escape, so that common reference point is missing.
Furthermore, I knew from the outset that such a survey wouldn’t be practical, but I’ve been suspicious of CEV for a while now. It seems like it would be too hard to formalize, and at the same time, even if successful, too far removed from what people spend most of their time caring about. I couldn’t be satisfied that there wasn’t a better way to do it until I’d tried to find such a way myself.
It’s polite to give some signal that you’re playing devil’s advocate if you know you’re making weak arguments.
This is not a sufficient condition for establishing the optimality of CEV. Indeed, I’m not sure there isn’t a better way (nor even that CEV is workable), just that I have at present no candidates for one.
I apologize. I thought I had discharged the devil’s-advocacy-signaling obligation by ending my original post on the subject with a request to be proved wrong.
I agree that personal satisfaction with CEV isn’t a sufficient condition for it being safe. For that matter, having proposed and briefly defended this one alternative isn’t really sufficient for my personal satisfaction in either CEV’s adequacy or the lack of a better option. But we have to start somewhere, and if someone did come up with a better alternative to CEV, I’d want to make sure that it got fair consideration.
Your trilobite example is at odds with your everyone-who-lived strategy. The impact of the trilobite example is to show that CEV is fundamentally wrong, because trilobite cognition, no matter how far you extrapolate it, would never lead to love, or value it if it arose by chance.
Some degree of randomness is necessary to allow exploration of the landscape of possible worlds. CEV is designed to prevent exploration of that landscape.
Let me expand upon Vladimir’s comment:
You have not yet learned that a certain argumentative strategy against CEV is doomed to self-referential failure. You have just argued that “exploring the landscape of possible worlds” is a good thing, something that you value. I agree, and I think it’s a reflectively consistent value, which others generally share at some level and which they might share more completely if they knew more, thought faster, had grown up farther together, etc.
You then assume, without justification, that “exploring the landscape of possible worlds” will not be expressed as a part of CEV, and criticize it on these grounds.
Huh? What friggin’ definition of CEV are you using?!?
EDIT: I realized there was an insult in my original formulation. I apologize for being a dick on the Internet.
Because EY has specifically said that that must be avoided, when he describes evolution as something dangerous. I don’t think there’s any coherent way of saying both that CEV will constrain future development (which is its purpose), and that it will not prevent us from reaching some of the best optimums.
Most likely, all the best optimums lie in places that CEV is designed to keep us away from, just as trilobite CEV would keep us away from human values. So CEV is worse than random.
That a “trilobite CEV” would never lead to human values is hardly a criticism of CEV’s effectiveness. The world we have now is not “trilobite friendly”; trilobites are extinct!
CEV, as I understand it, is very weakly specified. All it says is that a developing seed AI chooses its value system after somehow taking into account what everyone would wish for, if they had a lot more time, knowledge, and cognitive power than they do have. It doesn’t necessarily mean, for example, that every human being alive is simulated, given superintelligence, and made to debate the future of the cosmos in a virtual parliament. The combination of better knowledge of reality and better knowledge of how the human mind actually works may make it extremely clear that the essence of human values, extrapolated, is XYZ, without any need for a virtual referendum, or even a single human simulation.
It is a mistake to suppose, for example, that a human-based CEV process will necessarily give rise to a civilizational value system which attaches intrinsic value to such complexities as food, sex, or sleep, and which will therefore be prejudiced against modes of being which involve none of these things. You can have a value system which attributes positive value to human beings getting those things, not because they are regarded as intrinsically good, but because entities getting what they like is regarded as intrinsically good.
If a human being is capable of proposing a value system which makes no explicit mention of human particularities at all (e.g. Ben Goertzel’s “growth, choice, and joy”), then so is the CEV process. So if the worry is that the future will be kept unnecessarily anthropomorphic, that is not a valid critique. (It might happen if something goes wrong, but we’re talking about the basic idea here, not the ways we might screw it up.)
You could say, even a non-anthropomorphic CEV might keep us away from “the best optimums”. But let’s consider what that would mean. The proposition would be that even in a civilization making the best, wisest, most informed, most open-minded choices it could make, it still might fall short of the best possible worlds. For that to be true, must it not be the case that those best possible worlds are extremely hard to “find”? And if you propose to find them by just being random, must there not be some risk of instead ending up in very bad futures? This criticism may be comparable to the criticism that rational investment is a bad idea, because you’d make much more money if you won the lottery. If these distant optima are so hard to find, even when you’re trying to find good outcomes, I don’t see how luck can be relied upon to get you there.
This issue of randomness is not absolute. One might expect a civilization with an agreed-upon value system to nonetheless conduct fundamental experiments from time to time. But if there were experiments whose outcomes might be dangerous as well as rewarding, it would be very foolish to just go ahead and do them because if we get lucky, the consequences would be good. Therefore, I do not think that unconstrained evolution can be favored over the outcomes of non-anthropomorphic CEV.
That doesn’t mean that you can’t examine possible trajectories of evolution for good things you wouldn’t have thought of yourself, just that you shouldn’t allow evolution to determine the actual future.
I’m not sure what you mean by “constrain” here. A process that reliably reaches an optimum (I’m not saying CEV is such a process) constrains future development to reach an optimum. Any nontrivial (and non-self-undermining, I suppose; one could value the nonexistence of optimization processes or something) value system, whether “provincially human” or not, prefers the world to be constrained into more valuable states.
I don’t see where you’ve responded to the point that CEV would incorporate whatever reasoning leads you to be concerned about this.
Or to take one step back:
It seems that you think there are two tiers of values, one consisting of provincial human values, and another consisting of the true universal values like “exploring the landscape of possible worlds”. You worry that CEV will catch only the first group of values.
From where I stand, this is just a mistaken question; the values you worry will be lost are provincial human values too! There’s no dividing line to miss.
I understand what you’re saying, and I’ve heard that answer before, repeatedly; and I don’t buy it.
Suppose we were arguing about the theory of evolution in the 19th century, and I said, “Look, this theory just doesn’t work, because our calculations indicate that selection doesn’t have the power necessary.” That was the state of things around the turn of the century, when genetic inheritance was assumed to be analog rather than discrete.
An acceptable answer would be to discover that genes were discrete things that an organism had just 2 copies of, and that one was often dominant, so that the equations did in fact show that selection had the necessary power.
An unacceptable answer would be to say, “What definition of evolution are you using? Evolution makes organisms evolve! If what you’re talking about doesn’t lead to more complex organisms, then it isn’t evolution.”
Just saying “Organisms become more complex over time” is not a theory of evolution. It’s more like an observation of evolution. A theory means you provide a mechanism and argue convincingly that it works. To get to a theory of CEV, you need to define what it’s supposed to accomplish, propose a mechanism, and show that the mechanism might accomplish the purpose.
You don’t have to get very far into this analysis to see why the answer you’ve given doesn’t, IMHO, work. I’ll try to post something later this afternoon on why.
I won’t get around to posting that today, but I’ll just add that I know that the intent of CEV is to solve the problems I’m complaining about. I know there are bullet points in the CEV document that say, “Renormalizing the dynamic”, “Caring about volition,” and, “Avoid hijacking the destiny of humankind.”
But I also know that the CEV document says,
and
I think there is what you could call an order-of-execution problem, and I think there’s a problem with things being ill-defined, and I think the desired outcome is logically impossible. I could be wrong. But since Eliezer worries that this could be the case, I find it strange that Eliezer’s bulldogs are so sure that there are no such problems, and so quick to shoot down discussion of them.
This is one of the things I don’t understand: If you think everything is just a provincial human value, then why do you care? Why not play video games or watch YouTube videos instead of arguing about CEV? Is it just more fun?
(There’s a longish section trying to answer this question in the CEV document, but I can’t make sense of it.)
There’s a distinction that hasn’t been made on LW yet, between personal values and evangelical values. Western thought traditionally blurs the distinction between them, and assumes that, if you have personal values, you value other people having your values, and must go on a crusade to get everybody else to adopt your personal values.
The CEVer position is, as far as I can tell, that they follow their values because that’s what they are programmed to do. It’s a weird sort of double-think that can only arise when you act on the supposition that you have no free will with which to act. They’re talking themselves into being evangelists for values that they don’t really believe in. It’s like taking the ability to follow a moral code that you know has no outside justification from Nietzsche’s “master morality”, and combining it with the prohibition against value-creation from his “slave morality”.
That’s how most values work. In general, I value human life. If someone does not share this value, and they decide to commit murder, then I would stop them if possible. If someone does not share this value, but is merely apathetic about murder rather than a potential murderer themselves, then I would cause them to share this value if possible, so there will be more people to help me stop actual murderers. So yes, at least in this case, I would act to get other people to adopt my values, or inhibit them from acting on their own values. Is this overly evangelical? What is bad about it?
In any case, history seems to indicate that “evangelizing your values” is a “universal human value”.
Groups that didn’t/don’t value evangelizing their values:
The Romans. They don’t care what you think; they just want you to pay your taxes.
The Jews. Because God didn’t choose you.
Nietzscheans. Those are their values, dammit! Create your own!
Goths. (Angst-goths, not Visi-goths.) Because if everyone were a goth, they’d be just like everyone else.
We get into one sort of confusion by using particular values as examples. You talk about valuing human life. How about valuing the taste of avocados? Do you want to evangelize that? That’s kind of evangelism-neutral. How about the preferences you have that make one particular private place, or one particular person, or other limited resource, special to you? You don’t want to evangelize those preferences, or you’d have more competition. Is the first sort of value the only one CEV works with? How does it make that distinction?
We get into another sort of confusion by not distinguishing between the values we hold as individuals, the values we encourage our society to hold, and the values we want God to hold. The kind of values you want your God to hold are very different from the kind of values you want people to hold, in the same way that you want the referee to have different desires than the players. CEV mushes these two very different things together.
Good points. I haven’t thoroughly read the CEV document yet, so I don’t know if there is any discussion of this, but it does seem that it should make a distinction between those different types of values and preferences.
You never learn.
Folks. Vladimir’s response is not acceptable in a rational debate. The fact that it currently has 3 points is an indictment of the Less Wrong community.
Normally I would agree, but he was responding to “Some degree of randomness is necessary”. Seriously, you should know that isn’t right.
That post is about a different issue. It’s about whether introducing noise can help an optimization algorithm. Sounds similar; isn’t. The difference is that the optimization algorithm already knows the function that it’s trying to optimize.
The basic problem with CEV is that it requires reifying values in a strange way so that there are atomic “values” that can be isolated from an agent’s physical and cognitive architecture; and that (I think) it assumes that we have already evolved to the point where we have discovered all of these values. You can make very general value statements, such as that you value diversity, or complexity. But a trilobite can’t make any of those value statements. I think it’s likely that there are even more important fundamental value statements to be made that we have not yet conceptualized; and CEV is designed from the ground up specifically to prevent such new values from being incorporated into the utility function.
The need for randomness is not because random is good; it’s because, for the purpose of discovering better primitives (values) to create better utility functions, any utility function you can currently state is necessarily worse than random.
Since when is randomness required to explore the “landscape of possible worlds”? Or the possible values that we haven’t considered? A methodical search would be better. How did you miss that lesson from Worse Than Random, when it included an example (the pushbutton combination lock) of exploring a space of potential solutions?
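For concreteness, a small Python sketch of the methodical-search point (purely illustrative; the four-digit lock stands in for any finite parameter space): deterministic enumeration visits every combination exactly once, while memoryless random guessing revisits combinations and has no worst-case guarantee.

    import itertools
    import random

    DIGITS = range(10)
    LENGTH = 4
    target = (3, 1, 4, 1)

    # Methodical: walk the whole space in a fixed order; at most 10**4 tries,
    # and no combination is ever tried twice.
    for tries, combo in enumerate(itertools.product(DIGITS, repeat=LENGTH), start=1):
        if combo == target:
            break
    print("methodical:", tries)

    # Memoryless random guessing: expected ~10**4 tries, but combinations get
    # retried and there is no fixed bound on how long it can take.
    tries = 0
    while True:
        tries += 1
        if tuple(random.choice(DIGITS) for _ in range(LENGTH)) == target:
            break
    print("random:", tries)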
Okay, you don’t actually need randomness, if you can work out a way of doing a methodical variation of all possible parameters.
(For problems of this nature, using random processes allows you to specify the statistical properties that you want the solution to have, which is often much simpler than specifying a deterministic process that has those properties. That’s one reason randomness is useful.)
The point I’m trying to make is that you need not limit yourself to “searching”, meaning trying to optimize a function. You can only search when you know what you’re looking for. A value system can’t be evaluated from the outside. You have to try it on. Rationally, where “rational” means optimizing existing values, you wouldn’t do that. So randomness (or a rationally-ordered but irrationally-pursued exploration of parameter space) will lead to places no rational agent would go.
[EDIT: Wow, the parent comment completely changed since I responded to it. WTF?]
How do you plan to map a random number into a search space that you could not explore systematically?
According to which utility function?
I have a bad habit of re-editing a comment for several minutes after first posting it.
Suppose you want to test a program whose input variables are distributed normally. You can write a big complicated equation to sample at uniform intervals from the cumulative distribution function for the gaussian distribution. Or you can say “x = mean; for i=1 to 10 { x += rnd(2)-1 }”.
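To make that concrete, here is a minimal Python version of the same trick (illustrative only, not the commenter’s code): summing twelve uniform draws and subtracting six gives an approximately standard-normal variable, which you can then shift and scale.

    import random

    def approx_gaussian(mean=0.0, stddev=1.0):
        """Approximate a Gaussian draw by summing uniform random numbers.

        The sum of 12 independent U(0,1) draws has mean 6 and variance 1, so
        subtracting 6 gives a cheap approximation to a standard normal
        (reasonable except far out in the tails, which are truncated at +/-6).
        """
        return mean + stddev * (sum(random.random() for _ in range(12)) - 6.0)

    # The deterministic alternative the comment alludes to would be inverting
    # the normal CDF at chosen points; the random-sum version is what is easy
    # to write down from memory.
    samples = [approx_gaussian(mean=10.0, stddev=2.0) for _ in range(100_000)]
    print(sum(samples) / len(samples))  # close to 10.0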
Very often, the only data you know about your space is randomly-sampled data. So you look at that randomly-sampled data, and come up with some simple random model that would generate data with similar properties. The nature of the statistics you’ve gathered, such as the mean, variance, and correlations between observed variables, make it very hard to construct a deterministic model that would reproduce those statistics, but very easy to build a random model that does.
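A small sketch of that kind of move (illustrative; the choice of a multivariate normal as the “simple random model” is mine): estimate the mean and covariance of the sampled data, then let a random generator reproduce those statistics.

    import numpy as np

    rng = np.random.default_rng(0)

    # Stand-in for the randomly-sampled data we actually have:
    # 500 observations of two correlated variables.
    data = rng.multivariate_normal(mean=[3.0, -1.0],
                                   cov=[[1.0, 0.6], [0.6, 2.0]],
                                   size=500)

    # Gather the statistics the comment mentions: means, variances, correlations.
    mean_hat = data.mean(axis=0)
    cov_hat = np.cov(data, rowvar=False)

    # The "simple random model": synthetic data with the same observed
    # statistics, which would be awkward to arrange with a deterministic
    # generator but is one line with a random one.
    synthetic = rng.multivariate_normal(mean=mean_hat, cov=cov_hat, size=500)
    print(np.corrcoef(synthetic, rowvar=False))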
Some people really do have the kinds of misconceptions Eliezer was talking about; but the idea that there are hordes of scientists who attribute magical properties to randomness just isn’t true. This is not a fight you need to fight. And railing against all use of randomness in the simulation or study of complex processes just puts a big sticker on your head that says “I have no experience with what I’m talking about!”
We’re having 2 separate arguments here. I hope you realize that my comment that you originally responded to was not claiming that randomness has some magical power. It was about the need, when considering the future of the universe, for trying things out not just because your current utility function suggests they will have high utility. I used “random” as shorthand for “not directed by a utility function”.
According to the utility function that your current utility function doesn’t like, but that you will be delighted with once you try it out.
Yes, I understand you can use randomness as an approximate substitute for actually understanding the implications of your probability distributions. That does not really address my point: randomness does not grant you access to a search space you could not otherwise explore.
If you analyze randomly-sampled data by considering the probability distribution of results for a random sampling, instead of for the specific sampling you actually used, you are vulnerable to the mistake described here.
You can deterministically build a model that accounts for your uncertainty. Having a probability distribution is not the same thing as randomly choosing results from that distribution.
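A minimal sketch of the distinction being drawn (illustrative): the same expectation handled deterministically, by integrating over the distribution, and then by randomly sampling from it.

    import math
    import random

    def f(x):
        return x * x

    # E[f(X)] for X ~ N(0,1), which is exactly 1.

    # Deterministic: carry the distribution through the computation by
    # numerical integration over a grid. No random numbers involved, and the
    # answer is exactly reproducible.
    def expectation_by_quadrature(f, lo=-8.0, hi=8.0, steps=100_000):
        dx = (hi - lo) / steps
        total = 0.0
        for i in range(steps):
            x = lo + (i + 0.5) * dx
            pdf = math.exp(-x * x / 2) / math.sqrt(2 * math.pi)
            total += f(x) * pdf * dx
        return total

    # Monte Carlo: approximate the same quantity by drawing from the
    # distribution; the answer wobbles from run to run.
    def expectation_by_sampling(f, n=100_000):
        return sum(f(random.gauss(0.0, 1.0)) for _ in range(n)) / n

    print(expectation_by_quadrature(f))  # ~1.0
    print(expectation_by_sampling(f))    # ~1.0, give or take sampling noise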
First of all, I am not “railing against all use of randomness in the simulation or study of complex processes”. I am objecting to your claim that “randomness is required” in an epistemological process. Second, you should not presume to warn me about stickers on my head.
You should realize that “randomness is required” does sound very much like “claiming that randomness has some magical power”, and if you misspoke, the correct response to the objection would be to admit that you made a mistake and apologize for the miscommunication, not to try to defend the wrong claim.
It appears that you don’t understand the purpose of utility functions. I do not want to have a utility function U that maximizes U(U), that assigns to itself higher utility than any other utility function assigns to itself. I want to achieve states of the world that maximize my current utility function.
You mean, for instance, by saying,
I’m not defending the previous wrong claim about “needing randomness”. I’m arguing against your wrong claim, which appears to be that one should never use randomness in your models.
It appears that you still don’t understand what my basic point is. You can’t improve your utility function by a search using your utility function. We have better utility functions than trilobites did. We could not have found them using trilobite utility functions. Trilobite CEV would, if performing optimally, have ruled them out. Extrapolate.
Wow, you are actually compounding the rudeness of abusing the edit feature to completely rewrite your comment by then analyzing my response to the original version as if it were responding to the edited version.
How did you get from “randomness is never required” to “randomness is never useful”? I acknowledge that sometimes randomness can be a good enough approximate substitute for the much harder strategy of actually understanding the implications of a probability distribution.
I understand your argument. It is wrong. You have not actually responded to my objection. To refute my objection, you would have to explain why I should want to give up my current utility function U0 in favor of some other utility function U such that
(1) U(the world where I adopt U) > U(the world where I keep U0),
even though
(2) U0(the world where I adopt U) < U0(the world where I keep U0).
Since U0 is my current utility function, and therefore (2) describes my current wants, you will not be able to convince me that I should be persuaded by (1), which is a meaningless comparison. Adopting U as my utility function does not help me maximize U0.
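A toy sketch of that point (the paperclip numbers are invented purely for illustration): whatever U says about the world it would produce, the decision about whether to adopt it is scored with the current U0, so (1) carries no weight in it.

    def U0(world):
        """Current utility function: wants exactly ten paperclips, no more."""
        return -abs(world["paperclips"] - 10)

    def U(world):
        """Candidate replacement: the more paperclips, the better."""
        return world["paperclips"]

    def outcome(adopt_U):
        """Hypothetical consequence of each choice of utility function."""
        return {"paperclips": 10**6 if adopt_U else 10}

    # The agent deciding whether to self-modify evaluates the outcomes with
    # the utility function it has *now*:
    choice = max([False, True], key=lambda adopt: U0(outcome(adopt)))
    print(choice)              # False: keep U0
    print(U(outcome(True)))    # 1000000 -- impressive by U's lights, and irrelevant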
To the extent that trilobites can even be considered to have utility functions, my utility function is better than the trilobite utility function according to my values. The trilobites would disagree. An optimal human CEV would be a human SUCCESS and a trilobite FAIL. Likewise, an optimal trilobite CEV would be a trilobite SUCCESS and a human FAIL. There is no absolute universal utility function that says one of these is better than the others. It is my human values that cause me to say that the human SUCCESS is better.
Unless, of course, it turns out that humans really like trilobites and would be willing to devote significant resources to keeping them alive, understanding their preferences, and carrying out those preferences (without compromising other human values). In that case, it’s mutual success.
You’re thinking of tribbles.
Tribbles, while cute, directly compete with humans for food. In the long view, trilobites might have an easier time finding their niche.
I’m breaking this out into a separate reply, because it’s its own sub-thread:
If no utility function, and hence no world state, is objectively better than any other, then all utility functions are wireheading. Because the only distinction between wireheading, and not wireheading, is that the wirehead only cares about his/her own qualia, not about states of the world. If the only reason you care about states of the world is because of how your utility function evaluates them—that is to say, what qualia they generate in you—you are a wirehead.
You have it backwards. I do not care about things because of how my utility function evaluates them. Rather, my utility function evaluates things the way it does because of how I care about them. My utility function is a description of my preferences, not the source of them.
I don’t think the order of execution matters here. If there’s no objective preference over states of the world, then there’s no objective reason to prefer “not wireheading” (caring about states of the world) over “wireheading” (caring only about your percepts).
There is no “objective” reason to do anything. Knowing that, what are you going to do anyways? Myself, I am still going to things for my subjective reasons.
Okay; but then don’t diss wireheading.
You appear to have an overexpansive definition of wireheading. Having an arbitrary utility function is not the same as wireheading. Wireheading is a very specific sort of alteration of utility functions that we (i.e. most humans, with our current, subjective utility functions, nearly universally) see as very dangerous, because it throws away what we currently care about. Wireheading is a “parochial” definition, not universal. But that’s OK.
What’s your definition of wireheading?
I didn’t define it as having an arbitrary utility function. I defined it as a utility function that depends only on your qualia.
What else can the utility function as implemented by your hardware depend on besides your qualia, and computations derived from your qualia?
Calling utility functions “wireheading” is a category error. Wireheading is either:
1. Directly acting on the machinery that implements one’s utility function to trivially satisfy this hardware, i.e. by directly injecting qualia rather than providing the qualia via what they are normally correlated with.
2. More broadly, altering one’s utility function to one that is trivial to broadly satisfy, such as by reinforcement via 1.
If you read my original comment, it’s clear that I meant wireheading is having a utility function that depends only on your qualia. Or maybe “choosing to have”.
Huh? So you think there’s nothing inside your head except qualia?
Beliefs aren’t qualia. Subconscious information isn’t qualia.
This sounds like a potentially good definition. But I’m unclear then why anyone using utility theory, and that definition, would object to wireheading. If you’ve got a utility function, and you can satisfy it, that’s the thing to do, right? Why does it matter how you satisfy it? You seem to be saying that the hardware implementation isn’t your real utility function, it’s just an implementation of it. As if the utility function stood somewhere outside you.
Beliefs and subconscious information are derived from qualia and the information about the external world that they correlate with, no?
Utility functions are a convenient mathematical description of the preferences of entities in game theory and some decision theories, when those preferences are consistent. The notion is useful as a metaphor for “what we want”, but when used loosely like this, there are troubles.
As applied to humans, this flat-out doesn’t work. Empirically and as a general rule, we’re not consistent, and most of us can readily be money-pumped. We do not have a nice clean module that weighs outcomes and assigns real numbers to them. Nor do we feed outcome weights into a probability weighting module, and then choose the maximum utility. Our values change on reflection. Heck, we’re not even unitary entities. Our consciousness is multi-faceted. There are the left and right brains communicating and negotiating through the corpus callosum. The information immediately accessible to the consciousness, what we identify with, is rather different than the information our subconscious uses. We are a gigantic hack of an intelligence built upon the shifting sands of stimulus-response and reinforcement conditioning. These joints in our selves make it easier to wirehead, and essentially kill our current selves, leaving only animal-level instincts, if that.
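For anyone who hasn’t met the term, a toy sketch of what “money-pumped” means here (illustrative): an agent with cyclic preferences will pay a small fee for each swap to something it prefers, and a trader who cycles the offers can drain it indefinitely.

    # The agent prefers A to B, B to C, and C to A (cyclic, hence inconsistent),
    # and will pay a small fee to trade what it holds for something it prefers.
    prefers = {("A", "B"), ("B", "C"), ("C", "A")}   # (better, worse) pairs
    fee = 0.01

    holding, wallet = "C", 100.00
    for offered in ["A", "B", "C"] * 5:              # a trader cycles the offers
        if (offered, holding) in prefers:            # agent prefers the offer...
            holding, wallet = offered, wallet - fee  # ...so it pays to swap
    print(holding, round(wallet, 2))  # still shuffling the same three goods, steadily poorer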
There are multiple utility functions running around here. The basic point was that what I consider important now matters to what choices I make now. The fact that I can make the future me have a new utility function, satisfied by wireheading, does not register positively on my current utility function. In fact, because it throws away almost everything I now care about, I am unlikely to do it now. My goals are “satisfy my current utility function”, and are always that, because that’s what we mean by the abstraction of utility function. My goals are not to satisfy what preferences I may later have. My goals are not to change my preferences to be easier to satisfy, because that means my current goals are less likely to be satisfied. If my goals change, then they will have changed, and only then will I choose differently. It’s not that my utility function stands outside of me: my utility function is part of me. Changing it changes me. It so happens that my utility function would be easily changed if I started directly stimulating my reward center. The reward center is not my utility function, though it is part of the implementation of my decision function (which if it were coherent, could be summarized in a utility function, and sets of probabilities). If we wish to identify the reward circuitry of my brain with a utility function, we’ve also got to put a few other utility functions in, and entities having these utility functions that are in a non-zero sum game with the reward circuitry.
Not as far as I know, no. You may be equating “qualia” with “percepts”. That’s not right.
If that analysis were correct, there would be no difficulty about wireheading. It would simply be an error.
There is a difficulty about wireheading, and I’m trying to talk about it. I’m looking at static situations: Is there something objectively wrong with a person plugged into themselves giving themselves orgasms forever?
The LW community has a consensus that there is something wrong with that. Yet they also have a consensus that there are no objective values. These are inconsistent.
You’re trying to say that wireheading is an error not because the final wirehead state reached is wrong, but because the path from here to there involved an error. That’s not a valid objection, for the reasons you gave in your comment: Humans are messy, and random variation is a natural part of the human hardware and software. And humans have been messy for some time. So if you can become a wirehead by a simple error, many people must already have made that error. And CEV has to incorporate their wirehead preferences equally with everyone else’s.
There’s something inconsistent about saying that human values are good, but the process generating those values is bad.
Well, I’m still not convinced there is a useful difference, though I see why philosophers would separate the concepts.
There is nothing objectively wrong with that, no.
The LW community has a consensus that there is something wrong with that judged by our current parochial values that we want to maintain. Not objectively wrong, but widely held inter-subjective agreement that lets us cooperate in trying to steer the future away from a course where everyone gets wireheaded.
No, I’m saying that the final state is wrong according to my current values. That’s what I mean by wrong: against my current values. Because it is wrong, any path reaching it must have an error in it somewhere.
We haven’t had the technology to truly wirehead until quite recently, though various addictions can be approximations.
Currently, there aren’t enough wireheads, or addicts for that matter, to make much of a difference. Those that are wireheads want nothing more than to be wireheads, so I’m not sure that they would affect anything else under CEV. That’s one of the horrors of wireheading—all other values become lost. What we would have to worry about is a proselytizing wirehead, who wishes everyone else would convert. That seems an even harder end-state to reach than a simple wirehead.
Personally, I don’t want CEV applied to the whole human race. I think large swathes of the human race hold values that conflict badly with mine, and still would after perfect reflection. Wireheads would just be a small subset of that.
One of my intuitions about human value is that it is highly diverse, and any extrapolation will be unable to find consensus / coherence in the way desired by CEV. As such, I’ve always thought that the most likely outcome of augmenting human value through the means of successful FAI would be highly diverse subpopulations all continuing to diverge, with a sort of evolutionary pressure for who receives the most resources. Wireheads should be easy to contain under such a scenario, and would leave expansion to the more active groups.
I was reverting to my meaning of “wireheading”. Sorry about that.
We agree on that.
I think one problem with CEV is that, to buy into CEV, you have to buy into this idea you’re pushing that values are completely subjective. This brings up the question of why anyone implementing CEV would want to include anybody else in the subset whose values are being extrapolated. That would be an error.
You could argue that it’s purely pragmatic—the CEVer needs to compromise with the rest of the world to avoid being crushed like a bug. But, hey, the CEVer has an AI on its side.
You could argue that the CEVer’s values include wanting to make other people happy, and believes it can do this by incorporating their values. There are 2 problems with this:
1. They would be sacrificing a near-infinite expected utility from propagating their values over all time and space, for a relatively infinitesimal one-time gain of happiness on the part of those currently alive here on Earth. So these have to be CEVers with high discounting of the future. Which makes me wonder why they’re interested in CEV.
2. Choosing the subset of people who manage to develop a friendly AI and set up CEV strongly selects for people who have the perpetuation of values as their dominant value. If someone claims that he will incorporate other people’s values in his CEV at the expense of perpetuating his own values because he’s a nice guy, you should expect that he has to date put more effort into being a nice guy than into CEV.
I think I see your point: a wireheading utility function would value (1) for providing the reward with less effort, while a nonwireheading utility function would disvalue (1) for providing the reward without the desideratum.
You should define ‘qualia,’ then, in such a way that makes it clear how they’re causally isolated from the rest of the universe.
I didn’t say they were causally isolated.
If you think that the notion of “qualia” requires them to be causally isolated from the universe (which is my guess at why you even bring the idea up), then the burden is on you to explain why everyone who discusses consciousness except Daniel Dennett is silly.
In that case, nothing can be said to depend only on the qualia, because anything that depends on them is also indirectly influenced by whatever the qualia themselves depend on.
When you say a function depends only on a set of variables, you mean that you can compute the function given the value of those variables.
Emotional responses aren’t independent variables, they’re functions of past and present sensory input.
Are there any independent variables in the real world? Variables are “independent” given a particular analysis.
When you say a function depends only on a set of variables, you mean that you can compute the function given the value of those variables. It doesn’t matter whether those variables are dependent on other variables.
No. That statement is three comments above the comment in which you said I should acknowledge my error. It was already there when you wrote that comment. And I also acknowledged my misstatement in the comment you were replying to, and elaborated on what I had meant when I made the comment.
Good! We agree.
Good! We agree again.
And we agree yet again!
And here is where we part ways.
Maybe there is no universal utility function. That’s a… I won’t say it’s a reasonable position, but I understand its appeal. I would call it an over-reasoned position, like when a philosopher announces that he has proved that he doesn’t exist. It’s time to go back to the drawing board when you come up with that conclusion. Or at least to take your own advice, and stop trying to change the world when you’ve already said it doesn’t matter how it changes.
But to believe that your utility function is nothing special, and still try to take over the universe and force your utility function on it for all time, is insane.
(Yes, yes, I know Eliezer has all sorts of disclaimers in the CEV document about how CEV should not try to take over the universe. I don’t believe that it’s logically possible; and I believe that his discussions of Friendly AI make it even clearer that his plans require complete control. Perhaps the theory is still vague enough that just maybe there’s a way around this; but I believe the burden of proof is on those who say there is a way around it.)
It would be consistent with the theory of utility functions if, in promoting CEV, you were acting on an inner drive that said, “Ooh, baby, I’m ensuring the survival of my utility function. Oh, God, yes! Yes! YES!” But that’s not what I see. I see people scribbling equations, studying the answers, and saying, “Hmm, it appears that my utility function is directing me to propagate itself. Oh, dear, I suppose I must, then.”
That’s just faking your utility function.
I think it’s key that the people I’m speaking of who believe utility functions are arbitrary also believe they have no free will. And it’s probably also key that they assume their utility function must assign value to its own reproduction. They then use these two beliefs as an excuse to justify not following through on their belief about the arbitrariness of their utility function, because they think to do so would be logically impossible. “We can’t help ourselves! Our utility functions made us do it!” I don’t have a clean analysis, but there’s something circular, something wrong with this picture.
Let’s recap. You made a wrong claim. I responded to the wrong claim. You disputed my response. I refuted your disputation. You attempted to defend your claim. I responded to your defense. You edited your defense by replacing it with the acknowledgment of your mistake. You responded to my response still sort of defending your wrong claim, and attacking me for refuting your wrong claim. I defended my refutation, pointing out that you really did make the wrong claim and continued to defend it. And now you attack my defense, claiming that you did in fact acknowledge your mistake, and this should somehow negate your continued defense after the acknowledgement. Do you see how you are wrong here? When you acknowledge your claim is wrong, you should not at the same time criticize me for refuting your point.
I do believe my utility function is special. I don’t expect the universe (outside of me, my fellow humans, and any optimizing processes we spawn off) to agree with me. But, like Eliezer says, “We’ll see which one of us is still standing when this is over.”
Your mom’s an indictment of the Less Wrong community.