I love your turn of phrase, it has a Cold War ring to it.
The question why anyone would ever sincerely want to build an AI which extrapolates anything other than their personal volition is still unclear to me. It hinges on the definition of “sincerely want”. If Eliezer can task the AI with looking at humanity and inferring its best wishes, why can’t he task it with looking at himself and inferring his best idea of how to infer humanity’s wishes? How do we determine, in general, which things a document like CEV must spell out and which things can/should be left to the mysterious magic of “intelligence”?
The question why anyone would ever sincerely want to build an AI which extrapolates anything other than their personal volition is still unclear to me. It hinges on the definition of “sincerely want”. If Eliezer can task the AI with looking at humanity and inferring its best wishes, why can’t he task it with looking at himself and inferring his best idea of how to infer humanity’s wishes?
This has been my thought exactly. Barring all but the most explicit convolution any given person would prefer their own personal volition to be extrapolated. If by happenstance I should be altruistically and perfectly infatuated by, say Sally, then that’s the FAI’s problem. It will turn out that extrapolating my volition will then entail extrapolating Sally’s volition. The same applies to caring about ‘humanity’, whatever that fuzzy concept means when taken in the context of unbounded future potential.
I am also not sure how to handle those who profess an ultimate preference for a possible AI that extrapolates other than their own volition. I mean, clearly they are either lying, crazy or naive. It seems safer to trust someone who says “I would ultimately prefer an FAI that extrapolates my own volition, but I am creating an FAI that extrapolates humanity’s, for the purpose of effective cooperation.”
Similarly, if someone wanted to credibly signal altruism to me, it would be better to try to convince me that their own extrapolated volition has a lot of similarities with humanity’s CEV that arise due to altruistic desires, rather than saying that they truly, sincerely prefer humanity’s CEV. Because the latter is clearly bullshit of some sort.
How do we determine, in general, which things a document like CEV must spell out, and which things can/should be left to the mysterious magic of “intelligence”?
I have no idea, I’m afraid.
Eliezer appears to be asserting that CEV is equal for all humans. His arguments leave something to be desired. In particular, this is an assertion about human psychology, and requires evidence that is entangled with reality.
Leaving aside the question of whether even a single human’s volition can be extrapolated into a unique coherent utility function, this assertion has two major components:
1) humans are sufficiently altruistic that, say, Alice’s CEV doesn’t in any way favor Alice over Bob.
2) humans are sufficiently similar that any apparent moral disagreement between Alice and Bob is caused by one or both having false beliefs about the physical world.
I find both these statements dubious, especially the first, since I see no reason why evolution would make us that altruistic.
Eliezer appears to be asserting that CEV is equal for all humans.
The “C” in “CEV” stands for “Coherent”. The concept refers to techniques for combining the wills of a bunch of agents. The idea is not normally applied to a population consisting of a single human. That would just be EV. I am not aware of any evidence that Yu-El thinks that EV is independent of the individual.
Eliezer appears to be asserting that CEV is equal for all humans.
The phrase “is equal for all humans” is ambiguous. Even if all humans had identical psychologies, they could still all be selfish. The scare-quoted “source code” for Values(Eliezer) and Values(Archimedes) might be identical, but I think that both will involve self “pointers” resolving to Eliezer in one case and to Archimedes in the other.
We can say that two persons’ values are “parametrically identical” if they can be expressed in the same “source code”, but the code contains one or more parameters which are interpreted differently for different persons. A self pointer is one obvious parameter that we might be prepared to permit in “coherent” human values. That people are somewhat selfish does not necessarily conflict with our goal of determining a fair composite CEV of mankind—there are obvious ways of combining selfish values into composite values by giving “equal weight” (more scare quotes) to the values of each person.
The question then arises: are there other parameters we should expect besides self? I believe there are. One of them can be called the now pointer—it designates the current point in time. The now pointer in Values(Archimedes) resolves to ~150 BC whereas the one in Values(Eliezer) resolves to ~2010 AD. Both are allowed to be more interested in the present and immediate future than in the distant future. (Whether they should be interested at all in the recent past is an interesting question, but somewhat orthogonal to the present topic.)
How do we combine the now pointers of different persons when constructing a CEV for mankind? Do we do it by assigning “equal weights” to the now of each person as we did for the self pointers? I believe this would be a mistake. What we really want, I believe, is a weighting scheme which changes over time—a system of exponential discounting. Actions taken by an FAI in the year 2100 should mostly be for the satisfaction of the desires of people alive in 2100. The FAI will give some consideration in 2100 to the situation in 2110 because the people around in 2100 will also be interested in 2110 to some extent. It will (in 2100) give less consideration to the prospects in 2200, because people in 2100 will be not that interested in 2200. “After all”, they will rationally say to themselves, “we will be paying the year 2200 its due attention in 2180, and 2190, and especially 2199. Let the future care for itself. It certainly isn’t going to care for us!”
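A minimal sketch of the weighting scheme being proposed here, with every name and number invented for illustration (the per-person value functions, the 3% rate and the equal self-weights are assumptions, not anything taken from the CEV document):

    import math

    # Toy model: each person's values share the same "source code" with two
    # parameters: a self pointer and a now pointer (their own time anchor).
    def person_value(self_id, now_year, outcome, year):
        # Value an outcome occurring at `year`, as seen from that person's `now`.
        # Outcomes are toy dicts mapping person ids to raw satisfaction scores.
        raw = outcome.get(self_id, 0.0)
        horizon = max(year - now_year, 0)   # how far into that person's future
        discount_rate = 0.03                # illustrative assumption only
        return raw * math.exp(-discount_rate * horizon)

    def composite_value(people, outcome, year, decision_year):
        # "Equal weight" for each person's self pointer, but the now pointers are
        # all re-anchored to the decision year, so the weighting scheme slides
        # forward in time as the FAI acts (the scheme argued for above).
        return sum(person_value(p, decision_year, outcome, year) for p in people) / len(people)

    # Example: an outcome benefiting Alice in 2110, evaluated by an FAI acting in 2100.
    outcome = {"alice": 1.0, "bob": 0.0}
    print(composite_value(["alice", "bob"], outcome, year=2110, decision_year=2100))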
There are various other parameters that may appear in the idealized common “source code” for Values. For example, there may be different preferences regarding the discount rate used in the previous paragraph, and there may be different preferences regarding the “Malthusian factor”—how many biological descendants or clones one accumulates and how fast. It is not obvious to me whether we need to come up with rules for combining these into a CEV or whether the composite versions of these parameters fall out automatically from the rules for combining self and now parameters.
Sorry for the long response, but your comment inspired me.
What we really want, I believe, is a weighting scheme which changes over time—a system of exponential discounting. Actions taken by an FAI in the year 2100 should mostly be for the satisfaction of the desires of people alive in 2100. The FAI will give some consideration in 2100 to the situation in 2110 because the people around in 2100 will also be interested in 2110 to some extent. It will (in 2100) give less consideration to the prospects in 2200, because people in 2100 will be not that interested in 2200. “After all”, they will rationally say to themselves, “we will be paying the year 2200 its due attention in 2180, and 2190, and especially 2199.
I don’t think you need a “discounting” scheme. Or at least, you would get what is needed there “automatically”—if you just maximise expected utility. The same way Deep Blue doesn’t waste its time worrying about promoting pawns on the first move of the game—even if you give it the very long term (and not remotely “discounted”) goal of winning the whole game.
The same way Deep Blue doesn’t waste its time worrying about promoting pawns on the first move of the game—even if you give it the very long term (and not remotely “discounted”) goal of winning the whole game.
Is this really true? My understanding is that Deep Blue’s position evaluation function was determined by an analysis of hundreds of thousands of games. Presumably it ranked openings which had a tendency to produce more promotion opportunities higher than openings which tended to produce fewer promotion opportunities (all else being equal and assuming promoting pawns correlates with wins).
I wasn’t talking about that—I meant it doesn’t evaluate board positions with promoted pawns at the start of the game—even though these are common positions in complete chess games. Anyway, forget that example if you don’t like it, the point it illustrates is unchanged.
I don’t think you need a “discounting” scheme. Or at least, you would get what is needed there “automatically”—if you just maximise expected utility.
Could you explain why you say that? I can imagine two possible reasons why you might, but they are both wrong. Your “Deep Blue” example suggests that you are laboring under some profound misconceptions about utility theory and the nature of instrumental values.
This is this one again. You don’t yet seem to agree with it—and it isn’t clear to me why not.
Nor is it clear to me why you did not respond to my question / request for clarification.
I did respond. I didn’t have an essay on the topic prepared—but Yu-El did, so I linked to that.
If you want to hear it in my own words:
Wiring in temporal discounting is usually bad—since the machine can usually figure out what temporal discounting is appropriate for its current circumstances and abilities much better than you can. It is the same as with any other type of proximate goal.
Instead you are usually best off just telling the machine your preferences about the possible states of the universe.
If you are thinking you want the machine to mirror your own preferences, then I recommend that you consider carefully whether your ultimate preferences include temporal discounting—or whether all that is just instrumental.
I don’t see how. My question was:
Referring to this that you said:
Or at least, you would get what is needed there [instead of discounting] “automatically”—if you just maximise expected utility.
You have still not explained why you said this. The question that discounting answers is, “Which is better: saving 3 lives today or saving 4 lives in 50 years?” Which is the same question as “Which of the two has the higher expected utility in current utilons?”
We want to maximize expected current utility regardless of what we decide regarding discounting.
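To make the contrast concrete, here is the arithmetic that question turns on, with a 3% annual rate chosen purely for illustration:

    import math

    def present_value(lives, years_from_now, annual_discount_rate):
        # Continuously compounded exponential discounting of future lives saved.
        return lives * math.exp(-annual_discount_rate * years_from_now)

    # With no discounting, 4 lives in 50 years beats 3 lives today.
    print(present_value(4, 50, 0.00))   # 4.0   > 3
    # With a 3% rate, the 4 future lives are worth less than 3 current lives.
    print(present_value(4, 50, 0.03))   # ~0.89 < 3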
However, since you do bring up the idea of maximizing expected utility, I am very curious how you can simultaneously claim (elsewhere on this thread) that utilities are figures of merit attached to actions rather than outcomes. Are you suggesting that we should be assessing our probability distribution over actions and then adding together the products of those probabilities with the utility of each action?
Regarding utility, utilities are just measures of satisfaction. They can be associated with anything.
It is a matter of fact that utilities are associated with actions in most agents—since agents have evolved to calculate utilities in order to allow them to choose between their possible actions.
I am not claiming that utilities are not frequently associated with outcomes. Utilities are frequently linked to outcomes—since most evolved agents are made so in such a way that they like to derive satisfaction by manipulating the external world.
However, nowhere in the definition of utility does it say that utilities are necessarily associated with external-world outcomes. Indeed, in the well-known phenomena of “wireheading” and “drug-taking” utility is divorced from external-world outcomes—and deliberately manufactured.
utilities are just measures of satisfaction. They can be associated with anything.
True. But in most economic analysis, terminal utilities are associated with outcomes; the expected utilities that become associated with actions are usually instrumental utilities.
Nevertheless, I continue to agree with you that in some circumstances, it makes sense to attach terminal utilities to actions. This shows up, for example, in discussions of morality from a deontological viewpoint. For example, suppose you have a choice of lying or telling the truth. You assess the consequences of your actions, and are amused to discover that there is no difference in the consequences—you will not be believed in any case. A utilitarian would say that there is no moral difference in this case between lying and telling the truth. A Kant disciple would disagree. And the way he would explain this disagreement to the utilitarian would be to attach a negative moral utility to the action of speaking untruthfully.
Utilities are often associated with states of the world, yes. However, here you seemed to balk at utilities that were not so associated. I think such values can still be called “utilities”—and “utility functions” can be used to describe how they are generated—and the standard economic framework accommodates this just fine.
What this idea doesn’t fit into is the von Neumann–Morgenstern system—since it typically violates the independence axiom. However, that is not the end of the world. That axiom can simply be binned—and fairly often it is.
What this idea doesn’t fit into is the von Neumann–Morgenstern system—since it typically violates the independence axiom.
Unless you supply some restrictions, it is considerably more destructive than that. All axioms based on consequentialism are blown away. You said yourself that we can assign utilities so as to rationalize any set of actions that an agent might choose. I.e. there are no irrational actions. I.e. decision theory and utility theory are roughly as useful as theology.
No, no! That is like saying that a universal computer is useless to scientists—because it can be made to predict anything!
Universal action is a useful and interesting concept partly because it allows a compact, utility-based description of arbitrary computable agents. Once you have a utility function for an agent, you can then combine and compare its utility function with that of other agents, and generally use the existing toolbox of economics to help model and analyse the agent’s behaviour. This is all surely a Good Thing.
I’ve never seen the phrase universal action before. Googling didn’t help me. It certainly sounds like it might be an interesting concept. Can you provide a link to an explanation more coherent than the one you have attempted to give here?
As to whether a “utility-based” description of an agent that does not adhere to the standard axioms of utility is a “good thing”—well I am doubtful. Surely it does not enable use of the standard toolbox of economics, because that toolbox takes for granted that the participants in the economy are (approximately) rational agents.
You have an alternative model of arbitrary computable agents to propose? You don’t think the ability to model an arbitrary computable agent is useful? What is the problem here? Surely a simple utility-based framework for modelling the computable agent of your choice is an obvious Good Thing.
I see no problem modeling computable agents without even mentioning “utility”. I don’t yet see how modeling them as irrational utility maximizers is useful, since a non-utility-based approach will probably be simpler.
Part of the case for using a utility maximization framework is that we can see that many agents naturally use an internal representation of utility. This is true for companies, and other “economic” actors. It is true to some extent for animal brains—and it is true for many of the synthetic artificial agents that have been constructed. Since so many agents are naturally utility-based, that makes the framework an obvious modelling medium for intelligent agents.
Similarly, you can model serial computers without mentioning Turing machines and parallel computers without mentioning cellular automata. Yet in those cases, the general abstraction turns out to be a useful and important concept. I think this is just the same.
Universal action is named after universal computation and universal construction.
Universal construction and universal action have some caveats about being compatible with constraints imposed by things like physical law. “Doing anything” means something like: being able to feed arbitrary computable sequences in parallel to your motor outputs. Sequences that fail due to severing your own head don’t violate the spirit of the idea, though. As with universal computation, universal action is subject to resource limitations in practice. My coinage—AFAIK. Attribution: unpublished manuscript ;-)
Well, I’ll just ignore the fact that universal construction means to me something very different than it apparently means to you. Your claim seems to be that we can ‘program’ a machine (which is already known to maximize utility) so as to output any sequence of symbols we wish it to output; program it by the clever technique of assigning a numeric utility to each possible infinite output string, in such a way that we attach the largest numeric utility to the specific string that we want.
And you are claiming this in the same thread in which you disparage all forms of discounting the future. What am I missing here?
For my usage, see:
http://carg2.epfl.ch/Publications/2004/PhysicaD04-Mange.pdf
According to von Neumann [18], a constructor is endowed with universal construction if it is able to construct every other automaton, i.e. an automaton of any dimensions.
The term has subsequently become overloaded, it is true.
If I understand it correctly, the rest of your comment is a quibble about infinity.
I don’t “get” that. Why not just take things one output symbol at a time?
Wow. I didn’t see that one coming. Self-reproducing cellular automata. Brings back memories.
If I understand it correctly, the rest of your comment is a quibble about infinity. I don’t “get” that. Why not just take things one output symbol at a time?
Well, it wasn’t just a quibble about infinity. There was also the dig about discount rates. ;)
But I really am mystified. Is a ‘step’ in this kind of computation to output a symbol and switch to a different state? Are there formulas for calculating utilities? What data go into the calculation?
Exactly how does computation work here? Perhaps I need an example. How would you use this ‘utility maximization as a programming language’ scheme to program the machine to compute the square root of 2? I really don’t understand how this is related to either lambda calculus or Turing machines. Why don’t you take some time, work out the details, and then produce one of your essays?
I didn’t (and still don’t) understand how discount rates were relevant—if not via considering the comment about infinite output strings.
What data go into the calculation of utilities? The available history of sense data, memories, and any current inputs. The agent’s internal state, IOW.
Exactly how does computation work here?
Just like it normally does? You just write the utility function in a Turing-complete language—which you have to do anyway if you want any generality. The only minor complication is how to get a (single-valued) “function” to output a collection of motor outputs in parallel—but serialisation provides a standard solution to this “problem”.
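As I read this proposal, something like the following toy sketch is what is meant: the utility function is ordinary code, and a greedy maximiser driven by it emits any computable output you like, one symbol at a time. Everything here (the sqrt(2) target, the symbol alphabet, the greedy choice rule) is an illustrative assumption rather than Tim’s actual formalism.

    # Any computable target behaviour will do; here, successive digits of sqrt(2).
    def target_symbol(step):
        return str(int((2 ** 0.5) * 10 ** step) % 10)

    # The "utility function" is just a program: it scores a proposed next output
    # given the history of outputs so far (the agent's internal state).
    def utility(history, proposed):
        return 1.0 if proposed == target_symbol(len(history)) else 0.0

    def act(history, alphabet="0123456789"):
        # A maximiser: pick whichever next symbol has the highest utility.
        return max(alphabet, key=lambda s: utility(history, s))

    history = []
    for _ in range(6):
        history.append(act(history))
    print("".join(history))   # prints 141421, the leading digits of sqrt(2)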
Universal action might get an essay one day.
...and yes, if I hear too many more times that humans don’t have utility functions (we are better than that!) - or that utility maximisation is a bad implementation plan - I might polish up a page that debunks those—ISTM—terribly-flawed concepts—so I can just refer people to that.
What is it that the agent acts so as to maximize?
1) The utility of the next action (ignoring the utility of expected future actions);
2) The utility of the next action plus a discounted expectation of future utilities;
3) The simple sum of all future expected utilities.
To me, only the first two options make mathematical sense, but the first doesn’t really make sense as a model of human motivation.
I would usually answer this with a measure of inclusive fitness. However, it appears here that we are just talking about the agent’s brain—so in this context what that maximises is just utility—since that is the conventional term for such a maximand.
Your options seem to be exploring how agents calculate utilities. Are those all the options? An agent usually calculates utilities associated with its possible actions—and then chooses the action associated with the highest utility. That option doesn’t seem to be on the list. It looks a bit like 1—but that seems to specify no lookahead—or no lookahead of a particular kind. Future actions are usually very important influences when choosing the current action. Their utilities are usually pretty important too.
If you are trying to make sense of my views in this area, perhaps see the bits about pragmatic and ideal utility functions—here:
http://timtyler.org/expected_utility_maximisers/
Yes. In fact, 2 strictly contains both 1 and 3, by virtue of setting the discount factor to either 0 or 1.
Future actions are usually very important influences when choosing the current action.
But not strictly as important as the utility of the outcome of the current action. The amount by which future actions are less important than the outcome of the current action, and the methods by which we determine that, are what we mean when we say discount rates.
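A minimal sketch of that containment claim, numbering the options as in the list above; the utility stream is made up for illustration:

    def discounted_value(utilities, gamma):
        # Option 2: utility of the next action plus a discounted expectation
        # of future utilities, with discount factor gamma.
        return sum(u * gamma ** k for k, u in enumerate(utilities))

    stream = [5.0, 1.0, 1.0, 1.0]
    print(discounted_value(stream, 0.0))   # 5.0   -> option 1: next action only
    print(discounted_value(stream, 1.0))   # 8.0   -> option 3: simple sum
    print(discounted_value(stream, 0.5))   # 5.875 -> something in between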
Yes. In fact, 2 strictly contains both 1 and 3, by virtue of setting the discount factor to either 0 or 1.
That helps me understand the options. I am not sure I had enough info to figure out what you meant before.
1 corresponds to eating chocolate gateau all day and not brushing your teeth—not very realistic as you say. 3 looks like an option containing infinite numbers—and 2 is what all practical agents actually do.
However, I don’t think this captures what we were talking about. Pragmatic utility functions are necessarily temporally discounted—due to resource limitations and other effects. The issue is more whether ideal utility functions can be expected to be so discounted. I can’t think why they should be—and can think of several reasons why they shouldn’t be—which we have already covered.
Infinity is surely not a problem—you can just maximise utility over T years and let T increase in an unbounded fashion. The uncertainty principle limits the predictions of embedded agents in practice—so T won’t ever become too large to deal with.
However, I don’t think this captures what we were talking about. Pragmatic utility functions are necessarily temporally discounted—due to resource limitations and other effects.
My understanding is that “pragmatic utility functions” are supposed to be approximations to “ideal utility functions”—preferable only because the “pragmatic” are effectively computable whereas the ideal are not.
Our argument is that we see nothing constraining ideal utility functions to be finite unless you allow discounting at the ideal level. And if ideal utilities are infinite, then pragmatic utilities that approximate them must be infinite too. And comparison of infinite utilities in the hope of detecting finite differences cannot usefully guide choice.
Hence, we believe that discounting at the ideal level is inevitable. Particularly if we are talking about potentially immortal agents (or mortal agents who care about an infinite future).
Your last paragraph made no sense. Are you claiming that the consequence of actions made today must inevitably have negligible effect upon the distant future? A rather fatalistic stance to find in a forum dealing with existential risk. And not particularly realistic, either.
You seem obsessed with infinity :-( What about the universal heat death? Forget about infinity—just consider whether we want to discount on a scale of 1 year, 10 years, 100 years, 1,000 years, 10,000 years—or whatever.
I think “ideal” short-term discounting is potentially problematical. Once we are out to discounting on a billion year timescale, that is well into the “how many angels dance on the head of a pin” territory—from my perspective.
Some of the causes of instrumental discounting look very difficult to overcome—even for a superintelligence. The future naturally gets discounted to the extent that you can’t predict and control it—and many phenomena (e.g. the weather) are very challenging to predict very far into the future—unless you can bring them actively under your control.
Are you claiming that the consequence of actions made today must inevitably have negligible effect upon the distant future?
No. The idea was that predicting those consequences is often hard—and it gets harder the further out you go. Long-term predictions thus often don’t add much to what short-term ones give you.
Flippantly: we’re going to have billions of years to find a solution to that problem.
Many factors “automatically” lead to temporal discounting if you don’t wire it in. The list includes:
Agents are mortal—they might die before the future utility arrives;
Agents exhibit senescence—the present is more valuable to them than the future, because they are younger and more vital;
The future is uncertain—agents have limited capacities to predict the future;
The future is hard to predictably influence by actions taken now.
I think considerations such as the ones listed above adequately account for most temporal discounting in biology—though it is true that some of it may be the result of adaptations to deal with resource-limited cognition, or just plain stupidity.
Note that the list is dominated by items that are a function of the capabilities and limitations of the agent in question. If the agent conquers senescence, becomes immortal, or improves its ability to predict or predictably influence the future, then the factors all change around. This naturally results in a different temporal discounting scheme—so long as it has not previously been wired into the agent by myopic forces.
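A toy illustration of how one item on the list, mortality, manufactures an effective exponential discount without anything being wired in (the 1% annual hazard is invented for the example):

    def effective_weight(years_ahead, annual_death_risk):
        # An agent that values future utility at face value, but might not survive
        # to collect it, weights that utility by its probability of surviving.
        # (1 - p)^t = exp(t * ln(1 - p)), so this is already exponential discounting.
        return (1 - annual_death_risk) ** years_ahead

    for years in (1, 10, 50):
        print(years, effective_weight(years, annual_death_risk=0.01))
    # Shrink the death risk (conquer senescence, become immortal) and the effective
    # discounting flattens out by itself; no change to terminal values is needed.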
Basically, temporal discounting can often usefully be regarded as instrumental. Like energy, or gold, or warmth. You could specify how much each of these things is valued as well—but if you don’t they will be assigned instrumental value anyway. Unless you think you know their practical value better than a future superintelligent agent, perhaps you are better off leaving such issues to it. Tell the agent what state of affairs you actually want—and let it figure out the details of how best to get it for you.
Temporal discounting contrasts with risk aversion in this respect.
Basically, temporal discounting can often usefully be regarded as instrumental.
Quite true. I’m glad you included that word “often”. Now we can discuss the real issue: whether that word “often” should be changed to “always” as EY and yourself seem to claim. Or whether utility functions can and should incorporate the discounting of the value of temporally distant outcomes and pleasure-flows for reasons over and above considerations of instrumentality.
Temporal discounting contrasts with risk aversion in this respect.
A useful contrast/analogy. You seem to be claiming that risk aversion is not purely instrumental; that it can be fundamental; that we need to ask agents about their preferences among risky alternatives, rather than simply axiomatizing that a rational agent will be risk neutral.
But I disagree that this is in contrast to the situation with temporal discounting. We need to allow that rational and moral agents may discount the value of future outcomes and flows for fundamental, non-instrumental reasons. We need to ask them. This is particularly the case when we consider questions like the moral value of a human life.
The question before us is whether I should place the same moral value now on a human life next year and a human life 101 years from now. I say ‘no’; EY (and you?) say yes. What is EY’s justification for his position? Well, he might invent a moral principle that he might call “time invariance of moral value” and assert that this principle absolutely forces me to accept the equality:
value@t(life@t+1) = value@t(life@t+101).
I would counter that EY is using the invalid “strong principle of time invariance”. If one uses the valid “weak principle of time invariance” then all that we can prove is that:
value@t(life@t+1) = value@t+100(life@t+101)
So, we need another moral principle to get to where EY wants to go. EY postulates that the moral discount rate must be zero. I simply reject this postulate (as would the bulk of mankind, if asked). EY and I can both agree to a weaker postulate, “time invariance of moral preference”. But this only shows that the discounting must be exponential in time; it doesn’t show that the rate must be zero.
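For what it is worth, the step from the weaker postulate to the exponential form is the standard stationarity argument, sketched here in notation of my own choosing rather than EY's:

    % Time-invariance of moral preference: the weight attached to a delay cannot
    % depend on when the delay starts, so delay weights compose multiplicatively:
    D(s + t) = D(s)\,D(t), \qquad D(0) = 1 .
    % The only measurable solutions are exponentials,
    D(t) = e^{-\rho t} ,
    % which pins down the form of the discounting but leaves the rate \rho free:
    % \rho = 0 is one admissible choice among many, not a consequence of the postulate.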
Neither EY nor you has provided any reason (beyond bare assertion) why the moral discount rate should be set to zero. Admittedly, I have yet to give any reason why it should be set elsewhere. This is not the place to do that. But I will point out that a finite discount rate permits us to avoid the mathematical absurdities arising from undiscounted utilities with an unbounded time horizon. EY says “So come up with better math!”—a response worth taking seriously. But until we have that better math in hand, I am pretty sure EY is wearing the crackpot hat here, not me.
Now we can discuss the real issue: whether that word “often” should be changed to “always” as EY and yourself seem to claim.
You can specify a method of temporal discounting if you really want to. Just as you can specify a value for collecting gold atoms if you really want to. However, there are side effects and problems associated with introducing unnecessary constraints.
We need to allow that rational and moral agents may discount the value of future outcomes and flows for fundamental, non-instrumental reasons. We need to ask them.
If we think that such creatures are common and if we are trying to faithfully mirror and perpetuate their limitations, you mean.
Neither EY nor you has provided any reason (beyond bare assertion) why the moral discount rate should be set to zero.
I don’t really see this as a “should” question. However, there are consequences to wiring in instrumental values. You typically wind up with a handicapped superintelligence. I thought I already gave this as my reasoning, with comments such as “unless you think you know their practical value better than a future superintelligent agent, perhaps you are better off leaving such issues to it.”
I will point out that a finite discount rate permits us to avoid the mathematical absurdities arising from undiscounted utilities with an unbounded time horizon.
Not a practical issue—IMO. We are resource-limited creatures, who can barely see 10 years into the future. Instrumental temporal discounting protects us from infinite maths with great effectiveness.
This is the same as in biology. Organisms act as though they want to become ancestors—not just parents or grandparents. That is the optimisation target, anyway. However, instrumental temporal discounting protects them from far-future considerations with great effectiveness.
there are consequences to wiring in instrumental values. You typically wind up with a handicapped superintelligence. I thought I already gave this as my reasoning …
You did indeed. I noticed it, and meant to clarify that I am not advocating any kind of “wiring in”. Unfortunately, I failed to do so.
My position would be that human beings often have discount factors “wired in” by evolution. It is true, of course, that like every other moral instinct analyzed by EvoPsych, the ultimate adaptationist evolutionary explanation of this moral instinct is somewhat instrumental, but this doesn’t make it any less fundamental from the standpoint of the person born with this instinct.
As for moral values that we insert into AIs, these too are instrumental in terms of their final cause—we want the AIs to have particular values for our own instrumental reasons. But, for the AI, they are fundamental. But not necessarily ‘wired in’. If we, as I believe we should, give the AI a fundamental meta-value that it should construct its own fundamental values by empirically constructing some kind of CEV of mankind—if we do this then the AI will end up with a discount factor, because its human models have discount factors. But it won’t be a wired-in or constant discount factor, because the discount factors of mankind may well change over time as the expected lifespan of humans changes, as people upload and choose to run at various rates, as people are born or as they die.
I’m saying that we need to allow for an AI discount factor or factors which are not strictly instrumental, but which are not ‘wired in’ either. And especially not a wired-in discount factor of exactly zero!
I think we want a minimally myopic superintelligence—and fairly quickly. We should not aspire to program human limitations into machines—in a foolish attempt to mirror their values. If the Met Office computer is handling orders asking it to look three months out—and an ethics graduate says that it is too future-oriented for a typical human, and it should be made to look less far out, so it better reflects human values—he should be told what an idiot he is being.
We use machines to complement human capabilities, not just to copy them. When it comes to discounting the future, machines will be able to see and influence further—and we would be well-advised to let them.
Much harm is done today due to temporal discounting. Governments look no further than election day. Machines can help put a stop to such stupidity and negligence—but we have to know enough to let them.
As Eliezer says, he doesn’t propose doing much temporal discounting—except instrumentally. That kind of thing can be expected to go up against the wall as part of the “smarter, faster, wiser, better” part of his CEV.
And so we are in disagreement. But I hope you now understand that the disagreement is because our values are different rather than because I don’t understand the concept of values. Ironically our values differ in that I prefer to preserve my values and those of my conspecifics beyond the Singularity, whereas you distrust those values and the flawed cognition behind them, and you wish to have those imperfect human things replaced by something less messy.
I don’t see myself as doing any non-instrumental temporal discounting in the first place. So, for me personally, losing my non-instrumental temporal discounting doesn’t seem like much of a loss.
However, I do think that our temporal myopia is going to fall by the wayside. We will stop screwing over the immediate future because we don’t care about it enough. Myopic temporal discounting represents a primitive form of value—which is destined to go the way of cannibalism and slavery.
A CEV optimizer is less likely to do horrific things while its ability to extrapolate volition is “weak”. If it can’t extrapolate far from the unwise preferences people have now with the resources it has, it will notice that the EV varies a lot among the population, and take no action. Or if the extrapolation system has a bug in it, this will hopefully show up as well. So coherence is a kind of “sanity test”.
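A toy version of the “sanity test” being described, with an invented disagreement measure and threshold; nothing here comes from the actual CEV document:

    import statistics

    def coherent_enough(extrapolated_volitions, max_disagreement=0.1):
        # extrapolated_volitions: one numeric score per person for a proposed action.
        # If the extrapolations vary too much across the population, the optimizer
        # treats its model as unreliable and takes no action on that proposal.
        return statistics.pstdev(extrapolated_volitions) <= max_disagreement

    print(coherent_enough([0.9, 0.91, 0.88]))        # True  -> act
    print(coherent_enough([0.9, -0.5, 0.2, 0.7]))    # False -> abstain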
That’s one reason that leaps to mind anyway.
Of course the other is that there is no evidence any single human is Friendly anyway, so cooperation would be impossible among EV maximizing AI researchers. As such, an AI that maximizes EV is out of the question already. CEV is the next best thing.
I love your turn of phrase, it has a Cold War ring to it.
The question why anyone would ever sincerely want to build an AI which extrapolates anything other than their personal volition is still unclear to me. It hinges on the definition of “sincerely want”. If Eliezer can task the AI with looking at humanity and inferring its best wishes, why can’t he task it with looking at himself and inferring his best idea of how to infer humanity’s wishes? How do we determine, in general, which things a document like CEV must spell out and which things can/should be left to the mysterious magic of “intelligence”?
This has been my thought exactly. Barring all but the most explicit convolution any given person would prefer their own personal volition to be extrapolated. If by happenstance I should be altruistically and perfectly infatuated by, say Sally, then that’s the FAI’s problem. It will turn out that extrapolating my volition will then entail extrapolating Sally’s volition. The same applies to caring about ‘humanity’, whatever that fuzzy concept means when taken in the context of unbounded future potential.
I am also not sure how to handle those who profess an ultimate preference for a possible AI that extrapolates other than their own volition. I mean, clearly they are either lying, crazy or naive. It seems safer to trust someone who says “I would ultimately prefer FAI but I am creating FAI for the purpose of effective cooperation.”
Similarly, if someone wanted to credibly signal altruism to me it would be better to try to convince me that CEV has a lot of similarities with CEV that arise due to altruistic desires rather than saying that they truly sincerely prefer CEV. Because the later is clearly bullshit of some sort.
I have no idea, I’m afraid.
Eliezer appears to be asserting that CEV is equal for all humans. His arguments leave something to be desired. In particular, this is an assertion about human psychology, and requires evidence that is entangled with reality.
Leaving aside the question of whether even a single human’s volition can be extrapolated into a unique coherent utility function, this assertion has two major components:
1) humans are sufficiently altruistic that say CEV doesn’t in any way favor Alice over Bob.
2) humans are sufficiently similar that any apparent moral disagreement between Alice and Bob is caused by one or both having false beliefs about the physical world.
I find both these statements dubious, especially the first, since I see on reason why evolution would make us that altruistic.
The “C” in “CEV” stands for “Coherent”. The concept refers to techniques of combining the wills of a bunch of agents. The idea is not normally applied to a population consisting of single human. That would just be EV. I am not aware of any evidence that Yu-El thinks that EV is independent of the .
The phrase “is equal for all humans” is ambiguous. Even if all humans had identical psychologies, that could still all be selfish. The scare-quoted “source code” for Values and Values might be identical, but I think that both will involve self “pointers” resolving to Eliezer in one case and to Archimedes in the other.
We can define that two persons values are “parametrically identical” if they can be expressed in the same “source code”, but the code contains one or more parameters which are interpreted differently for different persons. A self pointer is one obvious parameter that we might be prepared to permit in “coherent” human values. That people are somewhat selfish does not necessarily conflict with our goal of determining a fair composite CEV of mankind—there are obvious ways of combining selfish values into composite values by giving “equal weight” (more scare quotes) to the values of each person.
The question then arises, are there other parameters we should expect besides self? I believe there are. One of them can be called the now pointer—it designates the current point in time. The now pointer in Values resolves to ~150 BC whereas Values resolves to ~2010 AD. Both are allowed to be more interested in the present and immediate future than in the distant future. (Whether they should be interested at all in the recent past is an interesting question, but somewhat orthogonal to the present topic.)
How do we combine now pointers of different persons when constructing a CEV for mankind. Do we do it by assigning “equal weights” to the now of each person as we did for the self pointers? I believe this would be a mistake. What we really want, I believe, is a weighting scheme which changes over time—a system of exponential discounting. Actions taken by an FAI in the year 2100 should mostly be for the satisfaction of the desires of people alive in 2100. The FAI will give some consideration in 2100 to the situation in 2110 because the people around in 2100 will also be interested in 2110 to some extent. It will (in 2100) give less consideration to the prospects in 2200, because people in 2100 will be not that interested in 2200. “After all”, they will rationally say to themselves, “we will be paying the year 2200 its due attention in 2180, and 2190, and especially 2199. Let the future care for itself. It certainly isn’t going to care for us!”
There are various other parameters that may appear in the idealized common “source code” for Values. For example, there may be different preferences regarding the discount rate used in the previous paragraph, and there may be different preferences regarding the “Malthusian factor”—how many biological descendents or clones one accumulates and how fast. It is not obvious to me whether we need to come up with rules for combining these into a CEV or whether the composite versions of these parameters fall out automatically from the rules for combining self and now parameters.
Sorry for the long response, but your comment inspired me.
I don’t think you need a “discounting” scheme. Or at least, you would get what is needed there “automatically”—if you just maximise expected utility. The same way Deep Blue doesn’t waste its time worrying about promoting pawns on the first move of the game—even if you give it the very long term (and not remotely “discounted”) goal of winning the whole game.
Is this really true? My understanding is that Deep Blue’s position evaluation function was determined by an analysis of a hundreds of thousands of games. Presumably it ranked openings which had a tendency to produce more promotion opportunities higher than openings which tended to produce fewer promotion opportunities (all else being equal and assuming promoting pawns correlates with wins).
I wasn’t talking about that—I meant it doesn’t evaluate board positions with promoted pawns at the start of the game—even though these are common positions in complete chess games. Anyway, forget that example if you don’t like it, the point it illustrates is unchanged.
Could you explain why you say that? I can imagine two possible reasons why you might, but they are both wrong. Your “Deep Blue” example suggests that you are laboring under some profound misconceptions about utility theory and the nature of instrumental values.
This is this one again. You don’t yet seem to agree with it—and it isn’t clear to me why not.
Nor is it clear to me why you did not respond to my question / request for clarification.
I did respond. I didn’t have an essay on the topic prepared—but Yu-El did, so I linked to that.
If you want to hear it in my own words:
Wiring in temporal discounting is usually bad—since the machine can usually figure out what temporal discounting is appropriate for its current circumstances and abilities much better than you can. It is the same as with any other type of proximate goal.
Instead you are usually best off just telling the machine your preferences about the possible states of the universe.
If you are thinking you want the machine to mirror your own preferences, then I recommend that you consider carefully whether your ultimate preferences include temporal discounting—or whether all that is just instrumental.
I don’t see how. My question was:
Referring to this that you said:
You have still not explained why you said this. The question that discounting answers is, “Which is better: saving 3 lives today or saving 4 lives in 50 years?” Which is the same question as “Which of the two has the higher expected utility in current utilons?” We want to maximize expected current utility regardless of what we decide regarding discounting.
However, since you do bring up the idea of maximizing expected utility, I am very curious how you can simultaneously claim (elsewhere on this thread) that utilities are figures of merit attached to actions rather than outcomes. Are you suggesting that we should be assessing our probability distribution over actions and then adding together the products of those probabilities with the utility of each action?
Regarding utility, utilities are just measures of satisfaction. They can be associated with anything.
It is a matter of fact that utilities are associated with actions in most agents—since agents have evolved to calculate utilities in order to allow them to choose between their possible actions.
I am not claiming that utilities are not frequently associated with outcomes. Utilities are frequently linked to outcomes—since most evolved agents are made so in such a way that they like to derive satisfaction by manipulating the external world.
However, nowhere in the definition of utility does it say that utilities are necessarily associated with external-world outcomes. Indeed, in the well-known phenomena of “wireheading” and “drug-taking” utility is divorced from external-world outcomes—and deliberately manufactured.
True. But in most economic analysis, terminal utilities are associated with outcomes; the expected utilities that become associated with actions are usually instrumental utilities.
Nevertheless, I continue to agree with you that in some circumstances, it makes sense to attach terminal utilities to actions. This shows up, for example, in discussions of morality from a deontological viewpoint. For example, suppose you have a choice of lying or telling the truth. You assess the consequences of your actions, and are amused to discover that there is no difference in the consequences—you will not be believed in any case. A utilitarian would say that there is no moral difference in this case between lying and telling the truth. A Kant disciple would disagree. And the way he would explain this disagreement to the utilitarian would be to attach a negative moral utility to the action of speaking untruthfully.
Utilities are often associated with states of the world, yes. However, here you seemed to balk at utilities that were not so associated. I think such values can still be called “utilities”—and “utility functions” can be used to describe how they are generated—and the standard economic framework accommodates this just fine.
What this idea doesn’t fit into is the von Neumann–Morgenstern system—since it typically violates the independence axiom. However, that is not the end of the world. That axiom can simply be binned—and fairly often it is.
Unless you supply some restrictions, it is considerably more destructive than that. All axioms based on consequentialism are blown away. You said yourself that we can assign utilities so as to rationalize any set of actions that an agent might choose. I.e. there are no irrational actions. I.e. decision theory and utility theory are roughly as useful as theology.
No, no! That is like saying that a universal computer is useless to scientists—because it can be made to predict anything!
Universal action is a useful and interesting concept partly because it allows a compact, utility-based description of arbitrary computable agents. Once you have a utility function for an agent, you can then combine and compare its utility function with that of other agents, and generally use the existing toolbox of economics to help model and analyse the agent’s behaviour. This is all surely a Good Thing.
I’ve never seen the phrase universal action before. Googling didn’t help me. It certainly sounds like it might be an interesting concept. Can you provide a link to an explanation more coherent than the one you have attempted to give here?
As to whether a “utility-based” description of an agent that does not adhere to the standard axioms of utility is a “good thing”—well I am doubtful. Surely it does not enable use of the standard toolbox of economics, because that toolbox takes for granted that the participants in the economy are (approximately) rational agents.
You have an alternative model of arbitrary computable agents to propose?
You don’t think the ability to model an arbitrary computable agent is useful?
What is the problem here? Surely a simple utility-based framework for modelling the computable agent of your choice is an obvious Good Thing.
I see no problem modeling computable agents without even mentioning “utility”.
I don’t yet see how modeling them as irrational utility maximizers is useful, since a non-utility-based approach will probably be simpler.
Part of the case for using a utility maximization framework is that we can see that many agents naturally use an internal representation of utility. This is true for companies, and other “economic” actors. It is true to some extent for animal brains—and it is true for many of the synthetic artificial agents that have been constructed. Since so many agents are naturally utility-based, that makes the framework an obvious modelling medium for intelligent agents.
Similarly, you can model serial computers without mentioning Turing machines and parallel computers without mentioning cellular automata. Yet in those cases, the general abstraction turns out to be a useful and important concept. I think this is just the same.
Universal action is named after universal computation and universal construction.
Universal construction and universal action have some caveats about being compatible with constraints imposted by things like physical law. “Doing anything” means something like: being able to feed arbitrary computable sequences in parallel to your motor outputs. Sequences that fail due to severing your own head don’t violate the spirit of the idea, though. As with universal computation, universal action is subject to resource limitations in practice. My coinage—AFAIK. Attribution: unpublished manuscript ;-)
Well, I’ll just ignore the fact that universal construction means to me something very different than it apparently means to you. Your claim seems to be that we can ‘program’ a machine (which is already known to maximize utility) so as to output any sequence of symbols we wish it to output; program it by the clever technique of assigning a numeric utility to each possible infinite output string, in such a way that we attach the largest numeric utility to the specific string that we want.
And you are claiming this in the same thread in which you disparage all forms of discounting the future.
What am I missing here?
For my usage, see:
http://carg2.epfl.ch/Publications/2004/PhysicaD04-Mange.pdf
The term has subsequently become overloaded, it is true.
If I understand it correctly, the rest of your comment is a quibble about infinity. I don’t “get” that. Why not just take things one output symbol at a time?
Wow. I didn’t see that one coming. Self-reproducing cellular automata. Brings back memories.
Well, it wasn’t just a quibble about infinity. There was also the dig about discount rates. ;)
But I really am mystified. Is a ‘step’ in this kind of computation to output a symbol and switch to a different state? Are there formulas for calculating utilities? What data go into the calculation?
Exactly how does computation work here? Perhaps I need an example. How would you use this ‘utility maximization as a programming language’ scheme to program the machine to compute the square root of 2? I really don’t understand how this is related to either lambda calculus or Turing machines. Why don’t you take some time, work out the details, and then produce one of your essays?
I didn’t (and still don’t) understand how discount rates were relevant—if not via considering the comment about infinite output strings.
What data go into the calculation of utilities? The available history of sense data, memories, and any current inputs. The agent’s internal state, IOW.
Just like it normally does? You just write the utility function in a Turing-complete language—which you have to do anyway if you want any generality. The only minor complication is how to get a (single-valued) “function” to output a collection of motor outputs in parallel—but serialisation provides a standard solution to this “problem”.
Universal action might get an essay one day.
...and yes, if I hear too many more times that humans don’t have utility functions (we are better than that!) - or that utility maximisation is a bad implementation plan - I might polish up a page that debunks those—ISTM—terribly-flawed concepts—so I can just refer people to that.
What is it that the agent acts so as to maximize?
The utility of the next action (ignoring the utility of expected future actions)
The utility of the next action plus a discounted expectation of future utilities.
The simple sum of all future expected utilities.
To me, only the first two options make mathematical sense, but the first doesn’t really make sense as a model of human motivation.
I would usually answer this with a measure of inclusive fitness. However, it appears here that we are just talking about the agent’s brain—so in this context what that maximises is just utility—since that is the conventional term for such a maximand.
Your options seem to be exploring how agents calculate utilities. Are those all the options? An agent usually calculates utilities associated with its possible actions—and then chooses the action associated with the highest utility. That option doesn’t seem to be on the list. It looks a bit like 1 - but that seems to specifiy no lookahead—or no lookahead of a particular kind. Future actions are usually very important influences when choosing the current action. Their utilities are usually pretty important too.
If you are trying to make sense of my views in this area, perhaps see the bits about pragmatic and ideal utility functions—here:
http://timtyler.org/expected_utility_maximisers/
Yes. In fact, 2 strictly contains both 1 and 3, by virtue of setting the discount rate to either 0 or 1.
But not strictly as important as the utility of the outcome of the current action. The amount by which future actions are less important than the outcome of the current action, and the methods by which we determine that, are what we mean when we say discount rates.
That helps understand the options. I am not sure I had enough info to figure out what you meant before.
1 corresponds to eating chocolate gateau all day and not brushing your teeth—not very realistic as you say. 3 looks like an option containing infinite numbers—and 2 is what all practical agents actually do.
However, I don’t think this captures what we were talking about. Pragmatic utility functions are necessarily temporally discounted—due to resource limitations and other effects. The issue is more whether ideal utility functions can be expected to be so discounted. I can’t think why they should be—and can think of several reasons why they shouldn’t be—which we have already covered.
Infinity is surely not a problem—you can just maximise utility over T years and let T increase in an unbounded fashion. The uncertainty principle limits the predictions of embedded agents in practice—so T won’t ever become too large to deal with.
My understanding is that “pragmatic utility functions” are supposed to be approximations to “ideal utility functions”—preferable only because the “pragmatic” are effectively computable whereas the ideal are not.
Our argument is that we see nothing constraining ideal utility functions to be finite unless you allow discounting at the ideal level. And if ideal utilities are infinite, then pragmatic utilities that approximate them must be infinite too. And comparison of infinite utilities in the hope of detecting finite differences cannot usefully guide choice. Hence, we believe that discounting at the ideal level is inevitable. Particularly if we are talking about potentially immortal agents (or mortal agents who care about an infinite future).
Your last paragraph made no sense. Are you claiming that the consequence of actions made today must inevitably have negligible effect upon the distant future? A rather fatalistic stance to find in a forum dealing with existential risk. And not particularly realistic, either.
You seem obsessed with infinity :-( What about the universal heat death? Forget about infinity—just consider whether we want to discount on a scale of 1 year, 10 years, 100 years, 1,000 years, 10,000 years—or whatever.
I think “ideal” short-term discounting is potentially problematical. Once we are out to discounting on a billion year timescale, that is well into the “how many angels dance on the head of a pin” territory—from my perspective.
Some of the causes of instrumental discounting look very difficult to overcome—even for a superintelligence. The future naturally gets discounted to the extent that you can’t predict and control it—and many phenomena (e.g. the weather) are very challenging to predict very far into the future—unless you can bring them actively under your control.
No, The idea was that predicting those consequences is often hard—and it gets harder the further out you go. Long term predictions thus often don’t add much to what short-term ones give you.
Flippantly: we’re going to have billions of years to find a solution to that problem.
Many factors “automatically” lead to temporal discounting if you don’t wire it in. The list includes:
Agents are mortal—they might die before the future utility arrives
Agents exhibit senescence—the present is more valuable to them than the future, because they are younger and more vital;
The future is uncertain—agents have limited capacities to predict the future;
The future is hard to predicably influence by actions taken now;
I think considerations such as the ones listed above adequately account for most temporal discounting in biology—though it is true that some of it may be the result of adaptations to deal with resource-limited cognition, or just plain stupidity.
Note that the list is dominated by items that are a function of the capabilities and limitations of the agent in question. If the agent conquers senescence, becomes immortal, or improves its ability to predict or predictably influence the future, then the factors all change around. This naturally results in a different temporal discounting scheme—so long as it has not previously been wired into the agent by myopic forces.
Basically, temporal discounting can often usefully be regarded as instrumental. Like energy, or gold, or warmth. You could specify how much each of these things is valued as well—but if you don’t they will be assigned instrumental value anyway. Unless you think you know their practical value better than a future superintelligent agent, perhaps you are better off leaving such issues to it. Tell the agent what state of affairs you actually want—and let it figure out the details of how best to get it for you.
Temporal discounting contrasts with risk aversion in this respect.
Quite true. I’m glad you included that word “often”. Now we can discuss the real issue: whether that word “often” should be changed to “always” as EY and yourself seem to claim. Or whether utility functions can and should incorporate the discounting of the value of temporally distant outcomes and pleasure-flows for reasons over and above considerations of instrumentality.
A useful contrast/analogy. You seem to be claiming that risk aversion is not purely instrumental; that it can be fundamental; that we need to ask agents about their preferences among risky alternatives, rather than simply axiomatizing that a rational agent will be risk neutral.
But I disagree that this is in contrast to the situation with temporal discounting. We need to allow that rational and moral agents may discount the value of future outcomes and flows for fundamental, non-instrumental reasons. We need to ask them. This is particularly the case when we consider questions like the moral value of a human life.
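To put the analogy in symbols (standard discounted-expected-utility notation, mine rather than EY’s): the curvature of u carries the agent’s risk attitude, and the discount factor carries its pure time preference; the usual rationality axioms fix the form of the expression but pin down neither parameter, so both have to be elicited from the agent rather than legislated.

```latex
\[
U = \mathbb{E}\!\left[\sum_{t=0}^{\infty}\delta^{\,t}\,u(c_t)\right],
\qquad 0 < \delta \le 1 .
\]
% Risk attitude lives in the shape of u; time preference lives in \delta.
% Neither \delta = 1 (no discounting) nor a linear u (risk neutrality)
% follows from the axioms alone.
```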
The question before us is whether I should place the same moral value now on a human life next year and a human life 101 years from now. I say ‘no’; EY (and you?) say yes. What is EY’s justification for his position? Well, he might invent a moral principle that he might call “time invariance of moral value” and assert that this principle absolutely forces me to accept the equality:
value@t(life@t+1) = value@t(life@t+101).
I would counter that EY is using the invalid “strong principle of time invariance”. If one uses the valid “weak principle of time invariance” then all that we can prove is that:
value@t(life@t+1) = value@(t+100)(life@t+101)
So, we need another moral principle to get to where EY wants to go. EY postulates that the moral discount rate must be zero. I simply reject this postulate (as would the bulk of mankind, if asked). EY and I can both agree to a weaker postulate, “time invariance of moral preference”. But this only shows that the discounting must be exponential in time; it doesn’t show that the rate must be zero.
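The step from “time invariance of moral preference” to “exponential, with a free rate” can be made explicit (my own notation): if the weight attached to a delay depends only on the length of the delay, so that the ratio D(s+t)/D(s) is the same from every vantage point s, then D must be multiplicative, and the only continuous, non-increasing solutions are exponentials. The rate is left entirely free; zero is one choice among many, not a theorem.

```latex
% Weak (relative) time invariance, with D(0) = 1:
\[
\frac{D(s+t)}{D(s)} = \frac{D(t)}{D(0)}
\quad\Longrightarrow\quad
D(s+t) = D(s)\,D(t)
\quad\Longrightarrow\quad
D(t) = e^{-\rho t}, \qquad \rho \ge 0 ,
\]
% using continuity and monotonicity for the final step.
% Nothing in the invariance argument forces \rho = 0; that is a further postulate.
```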
Neither EY nor you has provided any reason (beyond bare assertion) why the moral discount rate should be set to zero. Admittedly, I have yet to give any reason why it should be set elsewhere. This is not the place to do that. But I will point out that a finite discount rate permits us to avoid the mathematical absurdities arising from undiscounted utilities with an unbounded time horizon. EY says “So come up with better math!”—a response worth taking seriously. But until we have that better math in hand, I am pretty sure EY is wearing the crackpot hat here, not me.
You can specify a method of temporal discounting if you really want to. Just as you can specify a value for collecting gold atoms if you really want to. However, there are side effects and problems associated with introducing unnecessary constraints.
If we think that such creatures are common and if we are trying to faithfully mirror and perpetuate their limitations, you mean.
I don’t really see this as a “should” question. However, there are consequences to wiring in instrumental values. You typically wind up with a handicapped superintelligence. I thought I already gave this as my reasoning, with comments such as “unless you think you know their practical value better than a future superintelligent agent, perhaps you are better off leaving such issues to it.”
Not a practical issue—IMO. We are resource-limited creatures, who can barely see 10 years into the future. Instrumental temporal discounting protects us from infinite maths with great effectiveness.
This is the same as in biology. Organisms act as though they want to become ancestors—not just parents or grandparents. That is the optimisation target, anyway. However, instrumental temporal discounting protects them from far-future considerations with great effectiveness.
You did indeed. I noticed it, and meant to clarify that I am not advocating any kind of “wiring in”. Unfortunately, I failed to do so.
My position would be that human beings often have discount factors “wired in” by evolution. It is true, of course, that like every other moral instinct analyzed by EvoPsych, the ultimate adaptationist evolutionary explanation of this moral instinct is somewhat instrumental, but this doesn’t make it any less fundamental from the standpoint of the person born with this instinct.
As for moral values that we insert into AIs, these too are instrumental in terms of their final cause: we want the AIs to have particular values for our own instrumental reasons. But, for the AI, they are fundamental. Not necessarily ‘wired in’, though. If we give the AI, as I believe we should, a fundamental meta-value telling it to construct its own fundamental values by empirically constructing some kind of CEV of mankind, then the AI will end up with a discount factor, because its human models have discount factors. But it won’t be a wired-in or constant discount factor, because the discount factors of mankind may well change over time: as the expected lifespan of humans changes, as people upload and choose to run at various rates, as people are born or as they die.
I’m saying that we need to allow for an AI discount factor or factors which are not strictly instrumental, but which are not ‘wired in’ either. And especially not a wired-in discount factor of exactly zero!
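A minimal sketch of what I have in mind, with entirely made-up model classes and numbers: the AI’s working discount rate is not a constant written into its code but something it recomputes from its current models of the people whose volition it is extrapolating, so it drifts as those models drift.

```python
# Toy sketch of a derived, non-constant discount rate (all figures invented).
from dataclasses import dataclass

@dataclass
class ModelledHuman:
    name: str
    discount_rate: float   # per-year rate the AI's model currently attributes to them
    weight: float = 1.0    # e.g. population share

def working_discount_rate(models):
    """Aggregate the currently modelled human rates (a simple weighted mean here;
    the real aggregation rule would itself be part of the extrapolation)."""
    total = sum(m.weight for m in models)
    return sum(m.discount_rate * m.weight for m in models) / total

population_now = [ModelledHuman("typical biological human", 0.03),
                  ModelledHuman("life-extended human", 0.01)]
population_later = [ModelledHuman("upload running fast", 0.002, weight=3.0),
                    ModelledHuman("biological holdout", 0.03)]

print(working_discount_rate(population_now))    # 0.02
print(working_discount_rate(population_later))  # 0.009, lower as the population changes
```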
I think we want a minimally myopic superintelligence, and fairly quickly. We should not aspire to program human limitations into machines in a foolish attempt to mirror their values. If the Met. Office computer is handling orders asking it to look three months out, and an ethics graduate says that it is too future-oriented for a typical human and should be made to look less far out so that it better reflects human values, he should be told what an idiot he is being.
We use machines to complement human capabilities, not just to copy them. When it comes to discounting the future, machines will be able to see and influence further, and we would be well advised to let them.
Much harm is done today due to temporal discounting. Governments look no further than election day. Machines can help put a stop to such stupidity and negligence—but we have to know enough to let them.
As Eliezer says, he doesn’t propose doing much temporal discounting, except instrumentally. That kind of thing can be expected to go up against the wall as part of the “smarter, faster, wiser, better” part of his CEV.
And so we are in disagreement. But I hope you now understand that the disagreement is because our values are different rather than because I don’t understand the concept of values. Ironically our values differ in that I prefer to preserve my values and those of my conspecifics beyond the Singularity, whereas you distrust those values and the flawed cognition behind them, and you wish to have those imperfect human things replaced by something less messy.
I don’t see myself as doing any non-instrumental temporal discounting in the first place. So, for me personally, losing my non-instrumental temporal discounting doesn’t seem like much of a loss.
However, I do think that our temporal myopia is going to fall by the wayside. We will stop screwing over the immediate future because we don’t care about it enough. Myopic temporal discounting represents a primitive form of value—which is destined to go the way of cannibalism and slavery.
A CEV optimizer is less likely to do horrific things while its ability to extrapolate volition is “weak”. If it can’t extrapolate far from the unwise preferences people have now with the resources it has, it will notice that the EV varies a lot among the population, and take no action. Or if the extrapolation system has a bug in it, this will hopefully show up as well. So coherence is a kind of “sanity test”.
That’s one reason that leaps to mind anyway.
Of course the other is that there is no evidence any single human is Friendly anyway, so cooperation would be impossible among EV maximizing AI researchers. As such, an AI that maximizes EV is out of the question already. CEV is the next best thing.
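To make the “sanity test” reading of coherence above a little more concrete, here is a minimal sketch (the numeric preference-vector representation and the spread threshold are toy assumptions of mine): if the individually extrapolated volitions disagree too much, the optimizer abstains rather than acting on a weak or buggy extrapolation.

```python
# Toy sketch of coherence as a sanity check, not of the real CEV machinery.
import statistics

def coherent_enough(extrapolated_volitions, max_spread=0.2):
    """True only if the per-dimension spread of the extrapolated volitions is
    small; otherwise the optimizer should take no action."""
    return all(statistics.pstdev(dim) <= max_spread
               for dim in zip(*extrapolated_volitions))

def act_or_abstain(extrapolated_volitions, proposed_action):
    if coherent_enough(extrapolated_volitions):
        return proposed_action
    return "abstain"   # too little coherence: weak extrapolation or a bug

# Widely scattered extrapolations -> abstain; tightly clustered -> proceed.
print(act_or_abstain([(0.9, 0.1), (0.1, 0.9)], "intervene"))    # abstain
print(act_or_abstain([(0.8, 0.2), (0.82, 0.21)], "intervene"))  # intervene
```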