You have failed to address my criticisms of your points, that you are seeking out only examples that support your desired conclusion, and that you are ignoring details that would allow you to construct a narrower, more relevant reference class for your outside view argument.
And what are you trying to tell me with this link?
I was telling you that “ruling out the possibility” is the wrong (in fact impossible) standard.
You have failed to address my criticisms of your points, that you are seeking out only examples that support your desired conclusion.
Only now I understand your criticism. I do not seek out examples to support my conclusion but to weaken your argument that one should trust Yudkowsky because of his previous output. I’m aware that Yudkowsky can very well be right about the idea but do in fact believe that the risk is worth taking. Have I done extensive research on how often people in similar situations have been wrong? Nope. No excuses here, but do you think there are comparable cases of predictions that proved to be reliable? And how much research have you done in this case and about the idea in general?
I was telling you that “ruling out the possibility” is the wrong (in fact impossible) standard.
I don’t, I actually stated a few times that I do not think that the idea is wrong.
Seeking out just examples that weaken my argument, when I never predicted that no such examples would exist, is the problem I am talking about.
My reason to weaken your argument is not that I want to be right but that I want feedback about my doubts. I said that 1.) people can be wrong, regardless of their previous reputation, 2.) that people can lie about their objectives and deceive by how they act in public (especially when the stakes are high), 3.) that Yudkowsky’s previous output and achievements are not remarkable enough to trust him about some extraordinary claim. You haven’t responded on why you tell people to believe Yudkowsky, in this case, regardless of my objections.
What made you think that supporting your conclusion and weakening my argument are different things?
I’m sorry if I made it appear as if I hold some particular belief. My epistemic state simply doesn’t allow me to arrive at your conclusion. To highlight this I argued in favor of what it would mean to not accept your argument, namely to stand by previously well-established concepts like free speech and transparency. Yes, you could say that there is no difference here, except that I do not care about who is right but what is the right thing to do.
people can be wrong, regardless of their previous reputation
Still, it’s incorrect to argue from the existence of examples. You have to argue from likelihood. You’d expect more correctness from a person with a reputation for being right than from a person with a reputation for being wrong.
People can also go crazy, regardless of their previous reputation, but it’s improbable, and not an adequate argument for their craziness.
And you need to know what fact you are trying to convince people about, not just search for soldier-arguments pointing in the preferred direction. If you believe that the fact is that a person is crazy, you too have to recognize that “people can be crazy” is an inadequate argument for this fact you wish to communicate, and that you shouldn’t name this argument in good faith.
(Craziness is introduced as a less-likely condition than wrongness to stress the structure of my argument, not to suggest that wrongness is as unlikely.)
I said that 1.) people can be wrong, regardless of their previous reputation, 2.) that people can lie about their objectives and deceive by how they act in public (especially when the stakes are high), 3.) that Yudkowsky’s previous output and achievements are not remarkable enough to trust him about some extraordinary claim.
I notice that Yudkowsky wasn’t always self-professed human-friendly. Consider this:
I must warn my reader that my first allegiance is to the Singularity, not humanity. I don’t know what the Singularity will do with us. I don’t know whether Singularities upgrade mortal races, or disassemble us for spare atoms. While possible, I will balance the interests of mortality and Singularity. But if it comes down to Us or Them, I’m with Them. You have been warned.
He’s changed his mind since. That makes it far, far less scary.
He has changed his mind about one technical point in meta-ethics. He now realizes that super-human intelligence does not automatically lead to super-human morality. He is now (IMHO) less wrong. But he retains a host of other (mis)conceptions about meta-ethics which make his intentions abhorrent to people with different (mis)conceptions. And he retains the arrogance that would make him dangerous to those he disagrees with, if he were powerful.
“… far, far less scary”? You are engaging in wishful thinking no less foolish than that for which Eliezer has now repented.
He is now (IMHO) less wrong. But he retains a host of other (mis)conceptions about meta-ethics which make his intentions abhorrent to people with different (mis)conceptions.
I’m not at all sure that I agree with Eliezer about most meta-ethics, and definitely disagree on some fairly important issues. But, that doesn’t make his views necessarily abhorrent. If Eliezer triggers a positive Singularity (positive in the sense that it reflects what he wants out of a Singularity, complete with CEV), I suspect that that will be a universe which I won’t mind living in. People can disagree about very basic issues and still not hate each others’ intentions. They can even disagree about long-term goals and not hate it if the other person’s goals are implemented.
If Eliezer triggers a positive Singularity (positive in the sense that it reflects what he wants out of a Singularity, complete with CEV), I suspect that that will be a universe which I won’t mind living in.
Have you ever had one of those arguments with your SO in which:
It is conceded that your intentions were good.
It is conceded that the results seem good.
The SO is still pissed because of the lack of consultation and/or presence of extrapolation?
I usually escape those confrontations by promising to consult and/or not extrapolate the next time. In your scenario, Eliezer won’t have that option.
When people point out that Eliezer’s math is broken because his undiscounted future utilities lead to unbounded utility, his response is something like “Find better math; discounted utility is morally wrong”.
When Eliezer suggests that there is no path to a positive singularity which allows for prior consultation with the bulk of mankind, my response is something like “Look harder. Find a path that allows people to feel that they have given their informed consent to both the project and the timetable—anything else is morally wrong.”
ETA: In fact, I would like to see it as a constraint on the meaning of the word “Friendly” that it must not only provide friendly consequences, but also, it must be brought into existence in a friendly way. I suspect that this is one of those problems in which the added constraint actually makes the solution easier to find.
Could you link to where Eliezer says that future utilities should not be discounted? I find that surprising, since uncertainty causes an effect roughly equivalent to discounting.
I would also like to point out that achieving public consensus about whether to launch an AI would take months or years, and that during that time, not only is there a high risk of unfriendly AIs, it is also guaranteed that millions of people will die. Making people feel like they were involved in the decision is emphatically not worth the cost.
Could you link to where Eliezer says that future utilities should not be discounted?
He makes the case in this posting. It is a pretty good posting, by the way, in which he also points out some kinds of discounting which he believes are justified. This posting does not purport to be a knock-down argument against discounting future utility—it merely states Eliezer’s reasons for remaining unconvinced that you should discount (and hence for remaining in disagreement with most economic thinkers).
ETA: One economic thinker who disagrees with Eliezer is Robin Hanson. His response to Eliezer’s posting is also well worth reading.
Examples of Eliezer conducting utilitarian reasoning about the future without discounting are legion.
I find that surprising, since uncertainty causes an effect roughly equivalent to discounting.
Tim Tyler makes the same assertion about the effects of uncertainty. He backs the assertion with metaphor, but I have yet to see a worked example of the math. Can you provide one?
Of course, one obvious related phenomenon—it is even mentioned with respect in Eliezer’s posting—is that the value of a promise must be discounted with time due to the increasing risk of non-performance: my promise to scratch your back tomorrow is more valuable to you than my promise to scratch next week—simply because there is a risk that you or I will die in the interim, rendering the promise worthless. But I don’t see how other forms of increased uncertainty about the future should have the same (exponential decay) response curve.
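As a minimal sketch of the one case that is conceded here (a constant per-period risk that the promise becomes worthless), the expected value of a promise can be compared with an exponentially discounted value. The function names and the 1% daily hazard are hypothetical choices for illustration, not figures from the discussion.

```python
def survival_probability(hazard: float, periods: int) -> float:
    """Chance that the promise can still be performed after `periods` periods,
    assuming an independent, constant per-period risk `hazard` of failure."""
    return (1.0 - hazard) ** periods


def expected_value(payoff: float, hazard: float, periods: int) -> float:
    """Undiscounted payoff weighted by the chance it is actually delivered."""
    return payoff * survival_probability(hazard, periods)


def discounted_value(payoff: float, discount_factor: float, periods: int) -> float:
    """Certain payoff under exponential discounting."""
    return payoff * discount_factor ** periods


hazard = 0.01  # hypothetical 1% daily risk of non-performance
for days in (1, 7, 30):
    risky = expected_value(1.0, hazard, days)
    discounted = discounted_value(1.0, 1.0 - hazard, days)
    print(days, round(risky, 6), round(discounted, 6))
```

Under these assumptions the two columns coincide, with discount factor equal to one minus the hazard. The sketch says nothing about other kinds of uncertainty, which is exactly the open question raised above.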
achieving public consensus about whether to launch an AI would take months or years,
I find that surprising, since uncertainty causes an effect roughly equivalent to discounting.
Tim Tyler makes the same assertion about the effects of uncertainty. He backs the assertion with metaphor, but I have yet to see a worked example of the math. Can you provide one?
Most tree-pruning heuristics naturally cause an effect like temporal discounting. Resource limits mean that you can’t calculate the whole future tree—so you have to prune. Pruning normally means applying some kind of evaluation function early—to decide which branches to prune. The more you evaluate early, the more you are effectively valuing the near-present.
That is not maths—but hopefully it has a bit more detail than previously.
It doesn’t really address the question. In the A* algorithm the heuristic estimates of the objective function are supposed to be upper bounds on utility, not lower bounds. Furthermore, they are supposed to actually estimate the result of the complete computation—not to represent a partial computation exactly.
Furthermore, they are supposed to actually estimate the result of the complete computation—not to represent a partial computation exactly.
Reality check: a tree of possible futures is pruned at points before the future is completely calculated. Of course it would be nice to apply an evaluation function which represents the results of considering all possible future branches from that point on. However, getting one of those that produces results in a reasonable time would be a major miracle.
If you look at things like chess algorithms, they do some things to get a more accurate utility valuation when pruning, such as checking for quiescence. However, they basically just employ a standard evaluation at that point, or sometimes a faster, cheaper approximation. If it is sufficiently bad, the tree gets pruned.
However, getting one of those would be a major miracle.
We are living in the same reality. But the heuristic evaluation function still needs to be an estimate of the complete computation, rather than being something else entirely. If you want to estimate your own accumulation of pleasure over a lifetime, you cannot get an estimate of that by simply calculating the accumulation of pleasure over a shorter period—otherwise no one would undertake the pain of schooling motivated by the anticipated pleasure of high future income.
The question which divides us is whether an extra 10 utils now is better or worse than an additional 11 utils 20 years from now. You claim that it is worse. Period. I claim that it may well be better, depending on the discount rate.
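A small sketch of how the answer depends on the discount rate, assuming annual compounding (the discussion does not specify a convention). The function names are hypothetical.

```python
def present_value(utils: float, annual_rate: float, years: float) -> float:
    """Value today of `utils` received `years` from now, with annual compounding."""
    return utils / (1.0 + annual_rate) ** years


def break_even_rate(now_utils: float, later_utils: float, years: float) -> float:
    """Annual rate at which `later_utils` in `years` is worth exactly `now_utils` today."""
    return (later_utils / now_utils) ** (1.0 / years) - 1.0


rate = break_even_rate(10.0, 11.0, 20.0)
print(f"break-even annual rate: {rate:.4%}")

for r in (0.0, rate, 0.05):
    print(f"rate {r:.4%}: 11 utils in 20 years is worth {present_value(11.0, r, 20.0):.3f} today")
```

With these numbers the break-even rate is a little under half a percent per year. An agent discounting more steeply than that prefers the 10 utils now; an agent with no discounting always prefers the 11.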
I’m not sure I understand the question. What does it mean for a util to be ‘timeless’?
ETA: The question of the interaction of utility and time is a confusing one. In “Against Discount Rates”, Eliezer writes:
The idea that it is literally, fundamentally 5% more important that a poverty-stricken family have clean water in 2008, than that a similar family have clean water in 2009, seems like pure discrimination to me—just as much as if you were to discriminate between blacks and whites.
I think that Eliezer has expressed the issue in almost, but not quite, the right way. The right question is whether a decision maker in 2007 should be 5% more interested in doing something about the 2008 issue than about the 2009 issue. I believe that she should be. If only because she expects that she will have an entire year in the future to worry about the 2009 family without the need to even consider 2008 again. 2008’s water will be already under the bridge.
I’m sure someone else can explain this better than me, but: As I understand it, a util understood timelessly (rather than like money, which there are valid reasons to discount because it can be invested, lost, revalued, etc. over time) builds into how it’s counted all preferences, including preferences that interact with time. If you get 10 utils, you get 10 utils, full stop. These aren’t delivered to your door in a plain brown wrapper such that you can put them in an interest-bearing account. They’re improvements in the four-dimensional state of the entire universe over all time, that you value at 10 utils. If you get 11 utils, you get 11 utils, and it doesn’t really matter when you get them. Sure, if you get them 20 years from now, then they don’t cover specific events over the next 20 years that could stand improvement. But it’s still worth eleven utils, not ten. If you value things that happen in the next 20 years more highly than things that happen later, then utils according to your utility function will reflect that, that’s all.
That (timeless utils) is a perfectly sensible convention about what utility ought to mean. But, having adopted that convention, we are left with (at least) two questions:
Do I (in 2011) derive a few percent more utility from an African family having clean water in 2012 than I do from an equivalent family having clean water in 2013?
If I do derive more utility from the first alternative, am I making a moral error in having a utility function that acts that way?
I would answer yes to the first question. As I understand it, Eliezer would answer yes to the second question and would answer no to the first, were he in my shoes. I would claim that Eliezer is making a moral error in both judgments.
Do I (in 2011) derive a few percent more utility from an African family having clean water in 2012 than I do from an equivalent family having clean water in 2013?
Do you (in the years 2011, 2012, 2013, 2014) derive different relative utilities for these conditions? If so, it seems you have a problem.
I’m sorry. I don’t know what is meant by utility derived in 2014 from an event in 2012. I understand that the whole point of my assigning utilities in 2014 is to guide myself in making decisions in 2014. But no decision I make in 2014 can have an effect on events in 2012. So, from a decision-theoretic viewpoint, it doesn’t matter how I evaluate the utilities of past events. They are additive constants (same in all decision branches) in any computation of utility, and hence are irrelevant.
Or did you mean to ask about different relative utilities in the years before 2012? Yes, I understand that if I don’t use exponential discounting, then I risk inconsistencies.
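The inconsistency risk mentioned here can be sketched with a toy comparison. Everything in it is an illustrative assumption: the two events (10 utils in year 10 versus 11 utils in year 11), the 5% exponential rate, and the hyperbolic form 1/(1 + k·delay) with k = 1.

```python
import math


def exponential(delay: float, rate: float = 0.05) -> float:
    """Exponential discount weight for an event `delay` years away."""
    return math.exp(-rate * delay)


def hyperbolic(delay: float, k: float = 1.0) -> float:
    """Hyperbolic discount weight, a common non-exponential alternative."""
    return 1.0 / (1.0 + k * delay)


def prefers_later(discount, now: int, t_early: int, u_early: float,
                  t_late: int, u_late: float) -> bool:
    """True if, judged from year `now`, the later event is valued more highly."""
    return u_late * discount(t_late - now) > u_early * discount(t_early - now)


for name, discount in (("exponential", exponential), ("hyperbolic", hyperbolic)):
    choices = [prefers_later(discount, now, 10, 10.0, 11, 11.0) for now in range(11)]
    print(name, choices)
```

Under exponential discounting the ranking of the two events is the same from every vantage year, because the ratio of their weights depends only on the gap between them. Under the hyperbolic weights the ranking flips as the earlier event approaches, which is the kind of inconsistency at issue.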
The right question is whether a decision maker in 2007 should be 5% more interested in doing something about the 2008 issue than about the 2009 issue.
And that is a fact about the 2007 decision maker, not the 2008 family’s value as compared to the 2009 family’s.
If, in 2007, you present me with a choice of clean water for a family for all of and only 2008 vs 2009, and you further assure me that these families will otherwise survive in hardship, and that their suffering in one year won’t materially affect their next year, and that I won’t have this opportunity again come this time next year, and that flow-on or snowball effects which benefit from an early start are not a factor here—then I would be indifferent to the choice.
If I would not be; if there is something intrinsic about earlier times that makes them more valuable, and not just a heuristic of preferring them for snowballing or flow-on reasons, then that is what Eliezer is saying seems wrong.
The right question is whether a decision maker in 2007 should be 5% more interested in doing something about the 2008 issue than about the 2009 issue. I believe that she should be. If only because she expects that she will have an entire year in the future to worry about the 2009 family without the need to even consider 2008 again. 2008’s water will be already under the bridge.
I would classify that as instrumental discounting. I don’t think anyone would argue with that—except maybe a superintelligence who has already exhausted the whole game tree—and for whom an extra year buys nothing.
FWIW, I genuinely don’t understand your perspective. The extent to which you discount the future depends on your chances of enjoying it—but also on factors like your ability to predict it—and your ability to influence it—the latter are functions of your abilities, of what you are trying to predict and of the current circumstances.
You really, really do not normally want to put those sorts of things into an agent’s utility function. You really, really do want to calculate them dynamically, depending on the agent’s current circumstances, prediction ability levels, actuator power levels, previous experience, etc.
Attempts to put that sort of thing into the utility function would normally tend to produce an inflexible agent, who has more difficulties in adapting and improving. Trying to incorporate all the dynamic learning needed to deal with the issue into the utility function might be possible in principle—but that represents a really bad idea.
Hopefully you can see my reasoning on this issue. I can’t see your reasoning, though. I can barely even imagine what it might possibly be.
Maybe you are thinking that all events have roughly the same level of unpredictability in the future, and there is roughly the same level of difficulty in influencing them, so the whole issue can be dealt with by one (or a small number of) temporal discounting “fudge factors”, and that evolution built us that way because it was too stupid to do any better.
You apparently denied that resource limitation results in temporal discounting. Maybe that is the problem (if so, see my other reply here). However, now you seem to have acknowledged that an extra year of time in which to worry helps with developing plans. What I can see doesn’t seem to make very much sense.
You really, really do not normally want to put those sorts of things into an agent’s utility function.
I really, really am not advocating that we put instrumental considerations into our utility functions. The reason you think I am advocating this is that you have this fixed idea that the only justification for discounting is instrumental. So every time I offer a heuristic analogy explaining the motivation for fundamental discounting, you interpret it as a flawed argument for using discounting as a heuristic for instrumental reasons.
Since it appears that this will go on forever, and I don’t discount the future enough to make the sum of this projected infinite stream of disutility seem small, I really ought to give up. But somehow, my residual uncertainty about the future makes me think that you may eventually take Cromwell’s advice.
You really, really do not normally want to put those sorts of things into an agent’s utility function.
I really, really am not advocating that we put instrumental considerations into our utility functions. The reason you think I am advocating this is that you have this fixed idea that the only justification for discounting is instrumental.
To clarify: I do not think the only justification for discounting is instrumental. My position is more like: agents can have whatever utility functions they like (including ones with temporal discounting) without having to justify them to anyone.
However, I do think there are some problems associated with temporal discounting. Temporal discounting sacrifices the future for the sake of the present. Sometimes the future can look after itself—but sacrificing the future is also something which can be taken too far.
Axelrod suggested that when the shadow of the future grows too short, more defections happen. If people don’t sufficiently value the future, reciprocal altruism breaks down. Things get especially bad when politicians fail to value the future. We should strive to arrange things so that the future doesn’t get discounted too much.
Instrumental temporal discounting doesn’t belong in ultimate utility functions. So, we should figure out what temporal discounting is instrumental and exclude it.
If we are building a potentially-immortal machine intelligence with a low chance of dying and which doesn’t age, those are more causes of temporal discounting which could be discarded as well.
What does that leave? Not very much, IMO. The machine will still have some finite chance of being hit by a large celestial body for a while. It might die—but its chances of dying vary over time; its degree of temporal discounting should vary in response—once again, you don’t wire this in, you let the agent figure it out dynamically.
But the heuristic evaluation function still needs to be an estimate of the complete computation, rather than being something else entirely. If you want to estimate your own accumulation of pleasure over a lifetime, you cannot get an estimate of that by simply calculating the accumulation of pleasure over a shorter period—otherwise no one would undertake the pain of schooling motivated by the anticipated pleasure of high future income.
The point is that resource limitation makes these estimates bad estimates—and you can’t do better by replacing them with better estimates because of … resource limitation!
To see how resource limitation leads to temporal discounting, consider computer chess. Powerful computers play reasonable games—but heavily resource limited ones fall for sacrifice plays, and fail to make successful sacrifice gambits. They often behave as though they are valuing short-term gain over long term results.
A peek under the hood quickly reveals why. They only bother looking at a tiny section of the game tree near to the current position! More powerful programs can afford to exhaustively search that space—and then move on to positions further out. Also the limited programs employ “cheap” evaluation functions that fail to fully compensate for their short-term foresight—since they must be able to be executed rapidly. The result is short-sighted chess programs.
That resource limitation leads to temporal discounting is a fairly simple and general principle which applies to all kinds of agents.
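A toy sketch of the short-sightedness described here, under heavy simplifying assumptions: a single-agent tree with no opponent, a hypothetical hand-built position, and a cheap evaluation that simply reads off material at the search horizon. It is not how any real chess program is structured.

```python
from typing import Dict, Tuple

# A node is (material score at this position, children keyed by move name).
Node = Tuple[float, Dict[str, "Node"]]

# Hypothetical position: "grab" wins a pawn at once and nothing more;
# "sacrifice" loses material now but pays off three plies later.
TREE: Node = (0.0, {
    "grab": (1.0, {"wait": (1.0, {"wait": (1.0, {})})}),
    "sacrifice": (-1.0, {"press": (-1.0, {"mate": (5.0, {})})}),
})


def search(node: Node, depth: int) -> float:
    """Best material reachable within `depth` plies, using the static
    material score as the evaluation at the search horizon."""
    value, children = node
    if depth == 0 or not children:
        return value
    return max(search(child, depth - 1) for child in children.values())


def best_move(root: Node, depth: int) -> str:
    """Move chosen by a searcher that can afford to look `depth` plies ahead (depth >= 1)."""
    _, children = root
    return max(children, key=lambda move: search(children[move], depth - 1))


for depth in (1, 2, 3):
    print(depth, best_move(TREE, depth))
```

With one or two plies of lookahead the searcher grabs the pawn; with three it finds the sacrifice. Whether that behaviour deserves the name "temporal discounting" is the point in dispute, and the sketch does not settle it.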
To see how resource limitation leads to temporal discounting, consider computer chess.
Why do you keep trying to argue against discounting using an example where discounting is inappropriate by definition? The objective in chess is to win. It doesn’t matter whether you win in 5 moves or 50 moves. There is no discounting. Looking at this example tells us nothing about whether we should discount future increments of utility in creating a utility function.
Instead, you need to look at questions like this: An agent plays go in a coffee shop. He has the choice of playing slowly, in which case the games each take an hour and he wins 70% of them. Or, he can play quickly, in which case the games each take 20 minutes, but he only wins 60% of them. As soon as one game finishes, another begins. The agent plans to keep playing go forever. He gains 1 util each time he wins and loses 1 util each time he loses.
The main decision he faces is whether he maximizes utility by playing slowly or quickly. Of course, he has infinite expected utility however he plays. You can redefine the objective to be maximizing utility flow per hour and still get a ‘rational’ solution. But this trick isn’t enough for the following extended problem:
The local professional offers go lessons. Lessons require a week of time away from the coffee-shop and a 50 util payment. But each week of lessons turns 1% of your losses into victories. Now the question is: Is it worth it to take lessons? How many weeks of lessons are optimal? The difficulty here is that we need to compare the values of a one-shot (50 utils plus a week not playing go) with the value of an eternal continuous flow (the extra fraction of games per hour which are victories rather than losses). But that is an infinite utility payoff from the lessons, and only a finite cost, right? Obviously, the right decision is to take a week of lessons. And then another week after that. And so on. Forever.
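A rough sketch of the arithmetic in this example. Several things are assumptions added for illustration: the agent plays around the clock (168 hours a week), "turns 1% of your losses into victories" is read as raising the win probability by 1% of the current loss probability, and discounting, where used, is continuous and exponential at a per-hour rate.

```python
import math

HOURS_PER_WEEK = 168.0  # assumption: play continues around the clock


def utility_flow(games_per_hour: float, win_prob: float) -> float:
    """Expected utils per hour: +1 per win, -1 per loss."""
    return games_per_hour * (win_prob - (1.0 - win_prob))


def perpetuity_value(flow_per_hour: float, rate_per_hour: float) -> float:
    """Present value of a constant flow forever; infinite without discounting."""
    if rate_per_hour <= 0.0:
        return math.inf
    return flow_per_hour / rate_per_hour


def value_with_one_week_of_lessons(games_per_hour: float, win_prob: float,
                                   rate_per_hour: float, fee: float = 50.0,
                                   conversion: float = 0.01) -> float:
    """Pay the fee now, skip a week of play, then play forever at the improved win rate."""
    improved = win_prob + conversion * (1.0 - win_prob)
    delay = math.exp(-rate_per_hour * HOURS_PER_WEEK)
    return -fee + delay * perpetuity_value(utility_flow(games_per_hour, improved), rate_per_hour)


print("slow:", utility_flow(1.0, 0.7), "fast:", utility_flow(3.0, 0.6))
for rate in (0.0, 0.05 / 8760.0, 0.001):
    print(rate, perpetuity_value(utility_flow(3.0, 0.6), rate),
          value_with_one_week_of_lessons(3.0, 0.6, rate))
```

On these assumptions fast play yields the larger flow per hour. With a zero rate both options come out infinite and cannot be ranked; with a positive rate both are finite, and whether a week of lessons pays depends on how steep the rate is.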
Discounting of future utility flows is the standard and obvious way of avoiding this kind of problem and paradox. But now let us see whether we can alter this example to capture your ‘instrumental discounting due to an uncertain future’:
First, the obvious one. Our hero expects to die someday, but doesn’t know when. He estimates a 5% chance of death every year. If he is lucky, he could live for another century. Or he could keel over tomorrow. And when he dies, the flow of utility from playing go ceases. It is very well known that this kind of uncertainty about the future is mathematically equivalent to discounted utility in a certain future. But you seemed to be suggesting something more like the following:
Our hero is no longer certain what his winning percentage will be in the future. He knows that he experiences microstrokes roughly every 6 months, and that each incident takes 5% of his wins and changes them to losses. On the other hand, he also knows that roughly every year he experiences a conceptual breakthrough. And that each such breakthrough takes 10% of his losses and turns them into victories.
Does this kind of uncertainty about the future justify discounting on ‘instrumental grounds’? My intuition says “No, not in this case, but there are similar cases in which discounting would work.” I haven’t actually done the math, though, so I remain open to instruction.
Why do you keep trying to argue against discounting using an example where discounting is inappropriate by definition? The objective in chess is to win. It doesn’t matter whether you win in 5 moves or 50 moves. There is no discounting. Looking at this example tells us nothing about whether we should discount future increments of utility in creating a utility function.
Temporal discounting is about valuing something happening today more than the same thing happening tomorrow.
Chess computers do, in fact, discount. That is why they prefer to mate you in twenty moves rather than a hundred.
The values of a chess computer do not just tell it to win. In fact, they are complex—e.g. Deep Blue had an evaluation function that was split into 8,000 parts.
Operation consists of maximising the utility function, after foresight and tree pruning. Events that take place in branches after tree pruning has truncated them typically don’t get valued at all, since they are not foreseen. Resource-limited chess computers can find themselves preferring to promote a pawn sooner rather than later. They do so since they fail to see the benefit of sequences leading to promotion later.
So: we apparently agree that resource limitation leads to indifference towards the future (due to not bothering to predict it), but I classify this as a kind of temporal discounting (since rewards in the future get ignored), whereas you apparently don’t.
Hmm. It seems as though this has turned out to be a rather esoteric technical question about exactly which set of phenomena the term “temporal discounting” can be used to refer to.
Earlier we were talking about whether agents focussed their attention on tomorrow—rather than next year. Putting aside the issue of whether that is classified as being “temporal discounting”—or not—I think the extent to which agents focus on the near-future is partly a consequence of resource limitation. Give the agents greater abilities and more resources and they become more future-oriented.
we apparently agree that resource limitation leads to indifference towards the future (due to not bothering to predict it)
No, I have not agreed to that. I disagree with almost every part of it.
In particular, I think that the question of whether (and how much) one cares about the future is completely prior to questions about deciding how to act so as to maximize the things one cares about. In fact, I thought you were emphatically making exactly this point on another branch.
But that is fundamental ‘indifference’ (which I thought we had agreed cannot flow from instrumental considerations). I suppose you must be talking about some kind of instrumental or ‘derived’ indifference. But I still disagree. One does not derive indifference from not bothering to predict—one instead derives not bothering to predict from being indifferent.
Furthermore, I don’t respond to expected computronium shortages by truncating my computations. Instead, I switch to an algorithm which produces less accurate computations at lower computronium costs.
but I classify this as a kind of temporal discounting (since rewards in the future get ignored), whereas you apparently don’t.
And finally, regarding classification, you seem to suggest that you view truncation of the future as just one form of discounting, whereas I choose not to. And that this makes our disagreement a quibble over semantics. To which I can only reply: Please go away Tim.
Furthermore, I don’t respond to expected computronium shortages by truncating my computations. Instead, I switch to an algorithm which produces less accurate computations at lower computronium costs.
I think you would reduce how far you look forward if you were interested in using your resources intelligently and efficiently.
If you only have a million cycles per second, you can’t realistically go 150 ply deep into your go game—no matter how much you care about the results after 150 moves. You compromise—limiting both depth and breadth. The reduction in depth inevitably means that you don’t look so far into the future.
A lot of our communication difficulty arises from using different models to guide our intuitions. You keep imagining game-tree evaluation in a game with perfect information (like chess or go). Yes, I understand your point that in this kind of problem, resource shortages are the only cause of uncertainty—that given infinite resources, there is no uncertainty.
I keep imagining problems in which probability is built in, like the coffee-shop-go-player which I sketched recently. In the basic problem, there is no difficulty in computing expected utilities deeper into the future—you solve analytically and then plug in whatever value for t that you want. Even in the more difficult case (with the microstrokes) you can probably come up with an analytic solution. My models just don’t have the property that uncertainty about the future arises from difficulty of computation.
Right. The real world surely contains problems of both sorts. If you have a problem which is dominated by chaos based on quantum events then more resources won’t help. Whereas with many other types of problems more resources do help.
I recognise the existence of problems where more resources don’t help—I figure you probably recognise that there are problems where more resources do help—e.g. the ones we want intelligent machines to help us with.
The real world surely contains problems of both sorts.
Perhaps the real world does. But decision theory doesn’t. The conventional assumption is that a rational agent is logically omniscient. And generalizing decision theory by relaxing that assumption looks like it will be a very difficult problem.
The most charitable interpretation I can make of your argument here is that human agents, being resource limited, imagine that they discount the future. That discounting is a heuristic introduced by evolution to compensate for those resource limitations. I also charitably assume that you are under the misapprehension that if I only understood the argument, I would agree with it. Because if you really realized that I have already heard you, you would stop repeating yourself.
That you will begin listening to my claim that not all discounting is instrumental is more than I can hope for, since you seem to think that my claim is refuted each time you provide an example of what you imagine to be a kind of discounting that can be interpreted as instrumental.
That you will begin listening to my claim that not all discounting is instrumental is more than I can hope for, since you seem to think that my claim is refuted each time you provide an example of what you imagine to be a kind of discounting that can be interpreted as instrumental.
I am pretty sure that I just told you that I do not think that all discounting is instrumental. Here’s what I said:
I really, really am not advocating that we put instrumental considerations into our utility functions. The reason you think I am advocating this is that you have this fixed idea that the only justification for discounting is instrumental.
To clarify: I do not think the only justification for discounting is instrumental. My position is more like: agents can have whatever utility functions they like (including ones with temporal discounting) without having to justify them to anyone.
Agents can have many kinds of utility function! That is partly a consequence of there being so many different ways for agents to go wrong.
Being rational isn’t about your values: you can rationally pursue practically any goal. Epistemic rationality is a bit different—but I mostly ignore that as being unbiological.
Being moral isn’t really much of a constraint at all. Morality—and right and wrong—are normally with respect to a moral system—and unless a moral system is clearly specified, you can often argue all day about what is moral and what isn’t. Maybe some types of morality are more common than others—due to being favoured by the universe, or something like that—but any such context would need to be made plain in the discussion.
So, it seems (relatively) easy to make a temporal discounting agent that really values the present over the future—just stick a term for that in its ultimate values.
Are there any animals with ultimate temporal discounting? That is tricky, but it isn’t difficult to imagine natural selection hacking together animals that way. So: probably, yes.
Do I use ultimate temporal discounting? Not noticeably—as far as I can tell. I care about the present more than the future, but my temporal discounting all looks instrumental to me. I don’t go in much for thinking about saving distant galaxies, though! I hope that further clarifies.
I should probably review around about now. Instead of that: IIRC, you want to wire temporal discounting into machines, so their preferences better match your own—whereas I tend to think that would be giving them your own nasty hangover.
The real world surely contains problems of both sorts.
Perhaps the real world does. But decision theory doesn’t. The conventional assumption is that a rational agent is logically omniscient. And generalizing decision theory by relaxing that assumption looks like it will be a very difficult problem.
Programs make good models. If you can program it, you have a model of it. We can actually program agents that make resource-limited decisions. Having an actual program that makes decisions is a pretty good way of modeling making resource-limited decisions.
Perhaps we have some kind of underlying disagreement about what it means for temporal discounting to be “instrumental”.
In your example of an agent suffering from a risk of death, my thinking is: this player might opt for a safer life—with reduced risk. Or they might choose to lead a more interesting but more risky life. Their degree of discounting may well adjust itself accordingly—and if so, I would take that as evidence that their discounting was not really part of their pure preferences, but rather was an instrumental and dynamic response to the observed risk of dying.
If—on the other hand—they adjusted the risk level of their lifestyle, and their level of temporal discounting remained unchanged, that would be confirming evidence in favour of the hypothesis that their temporal discounting was an innate part of their ultimate preferences—and not instrumental.
Of course. My point is that observing if the discount rate changes with the risk tells you if the agent is rational or irrational, not if the discount rate is all instrumental or partially terminal.
Stepping back for a moment, terminal values represent what the agent really wants, and instrumental values are things sought en-route.
The idea I was trying to express was: if what an agent really wants is not temporally discounted, then instrumental temporal discounting will produce a predictable temporal discounting curve—caused by aging, mortality risk, uncertainty, etc.
Deviations from that curve would indicate the presence of terminal temporal discounting.
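One way to picture the "predictable curve plus deviations" idea is the following minimal sketch. It assumes mortality is the only instrumental factor and that it acts as a constant per-period hazard; the function names and the numbers are hypothetical, chosen only for illustration.

```python
import math


def survival_discount(hazard: float, t: float) -> float:
    """Weight an agent with no terminal discounting puts on a payoff t periods
    ahead, if the only reason to discount is a constant per-period chance of dying."""
    return (1.0 - hazard) ** t


def residual_terminal_factor(observed_weight: float, hazard: float, t: float) -> float:
    """Per-period discount factor left over once the survival curve is removed.
    A value of 1.0 means the observed discounting is fully explained by risk."""
    return (observed_weight / survival_discount(hazard, t)) ** (1.0 / t)


# Agent A discounts exactly as the 2% hazard predicts.
weight_a = survival_discount(0.02, 10)
# Agent B additionally applies a 0.95 per-period factor of its own.
weight_b = survival_discount(0.02, 10) * 0.95 ** 10

print(residual_terminal_factor(weight_a, 0.02, 10))
print(residual_terminal_factor(weight_b, 0.02, 10))
```

In this toy model, a residual factor below 1.0 is the "deviation from the curve" that would point to terminal discounting.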
I have no disagreement at all with your analysis here. This is not fundamental discounting. And if you have decision alternatives which affect the chances of dying, then it doesn’t even work to model it as if it were fundamental.
You recently mentioned the possibility of dying in the interim. There’s also the possibility of aging in the interim. Such factors can affect utility calculations.
For example: I would much rather have my grandmother’s inheritance now than years down the line, when she finally falls over one last time—because I am younger and fitter now.
Significant temporal discounting makes sense sometimes—for example, if there is a substantial chance of extinction per unit time. I do think a lot of discounting is instrumental, though—rather than being a reflection of ultimate values—due to things like the future being expensive to predict and hard to influence.
My brain spends more time thinking about tomorrow than about this time next year—because I am more confident about what is going on tomorrow, and am better placed to influence it by developing cached actions, etc. Next year will be important too—but there will be a day before to allow me to prepare for it closer to the time, when I am better placed to do so. The difference is not because I will be older then—or because I might die in the mean time. It is due to instrumental factors.
Of course one reason this is of interest is because we want to know what values to program into a superintelligence. That superintelligence will probably not age—and will stand a relatively low chance of extinction per unit time. I figure its ultimate utility function should have very little temporal discounting.
The problem with wiring discount functions into the agent’s ultimate utility function is that that is what you want it to preserve as it self improves. Much discounting is actually due to resource limitation issues. It makes sense for such discounting to be dynamically reduced as more resources become cheaply available. It doesn’t make much sense to wire-in short-sightedness.
I don’t mind tree-pruning algorithms attempting to normalise partial evaluations at different times—so they are more directly comparable to each other. The process should not get too expensive, though—the point of tree pruning is that it is an economy measure.
Find a path that allows people to feel that they have given their informed consent to both the project and the timetable—anything else is morally wrong.
I suspect you want to replace “feel like they have given” with “give.”
Unless you are actually claiming that what is immoral is to make people fail to feel consulted, rather than to fail to consult them, which doesn’t sound like what you’re saying.
Find a path that allows people to feel that they have given their informed consent to both the project and the timetable—anything else is morally wrong.
I suspect you want to replace “feel like they have given” with “give.”
I think I will go with a simple tense change: “feel that they are giving”. Assent is far more important in the lead-up to the Singularity than during the aftermath.
Although I used the language “morally wrong”, my reason for that was mostly to make the rhetorical construction parallel. My preference for an open, inclusive process is a strong preference, but it is really more political/practical than moral/idealistic. One ought to allow the horses to approach the trough of political participation, if only to avoid being trampled, but one is not morally required to teach them how to drink.
Ah, I see. Sure, if you don’t mean morally wrong but rather politically impractical, then I withdraw my suggestion… I entirely misunderstood your point.
No, I did originally say (and mostly mean) “morally” rather than “politically”. And I should thank you for inducing me to climb down from that high horse.
But he retains a host of other (mis)conceptions about meta-ethics which make his intentions abhorrent to people with different (mis)conceptions.
I submit that I have many of the same misconceptions that Eliezer does; he changed his mind about one of the few places I disagree with him. That makes it far more of a change than it would be for you (one out of eight is a small portion, one out of a thousand is an invisible fraction).
Good point. And since ‘scary’ is very much a subjective judgment, that means that I can’t validly criticize you for being foolish unless I have some way of arguing that yours and Eliezer’s positions in the realm of meta-ethics are misconceptions—something I don’t claim to be able to do.
So, if I wish my criticisms to be objective, I need to modify them. Eliezer’s expressed positions on meta-ethics (particularly his apparent acceptance of act-utilitarianism and his unwillingness to discount future utilities) together with some of his beliefs regarding the future (particularly his belief in the likelihood of a positive singularity and expansion of human population into the universe) make his ethical judgments completely unpredictable to many other people—unpredictable because the judgment may turn on subtle differences in the expected consequences of present day actions on people in the distant future. And, if one considers the moral judgments of another person to be unpredictable, and that person is powerful, then one ought to consider that person scary. Eliezer is probably scary to many people.
True, but it has little bearing on whether Eliezer should be scary. That is, “Eliezer is scary to many people” is mostly a fact about many people, and mostly not a fact about Eliezer. The reverse of this (and what I base this distinction on) is that some politicians should be scary, and are not scary to many people.
I’m not sure the proposed modification helps: you seem to have expanded your criticisms so far, in order to have them lead to the judgment you want to reach, that they cover too much.
I mean, sure, unpredictability is scarier (for a given level of power) than predictability. Agreed. But so what?
For example, my judgments will always be more unpredictable to people much stupider than I am than to people about as smart or smarter than I am. So the smarter I am, the scarier I am (again, given fixed power)… or, rather, the more people I am scary to… as long as I’m not actively devoting effort to alleviating those fears by, for example, publicly conforming to current fashions of thought. Agreed.
But what follows from that? That I should be less smart? That I should conform more? That I actually represent a danger to more people? I can’t see why I should believe any of those things.
You started out talking about what makes one dangerous; you have ended up talking about what makes people scared of one whether one is dangerous or not. They aren’t equivalent.
you seem to have expanded your criticisms so far, in order to have them lead to the judgment you want to reach, that they cover too much.
Well, I hope I haven’t done that.
You started out talking about what makes one dangerous; you have ended up talking about what makes people scared of one whether one is dangerous or not.
Well, I certainly did that. I was trying to address the question more objectively, but it seems I failed. Let me try again from a more subjective, personal position.
If you and I share the same consequentialist values, but I know that you are more intelligent, I may well consider you unpredictable, but I won’t consider you dangerous. I will be confident that your judgments, in pursuit of our shared values, will be at least as good as my own. Your actions may surprise me, but I will usually be pleasantly surprised.
If you and I are of the same intelligence, but we have different consequentialist values (both being egoists, with disjoint egos, for example) then we can expect to disagree on many actions. Expecting the disagreement, we can defend ourselves, or even bargain our way to a Nash bargaining solution in which (to the extent that we can enforce our bargain) we can predict each other’s behavior to be that promoting compromise consequences.
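For readers unfamiliar with the term, the Nash bargaining solution picks the outcome that maximizes the product of each party's gain over what they would get with no agreement. The sketch below is a toy illustration only: it assumes the bargain is over splitting a single unit of some resource, uses a simple grid search, and the function `nash_split` and the utility functions are hypothetical.

```python
from typing import Callable


def nash_split(
    u1: Callable[[float], float],
    u2: Callable[[float], float],
    d1: float = 0.0,
    d2: float = 0.0,
    steps: int = 1000,
) -> float:
    """Share x in [0, 1] given to agent 1 (agent 2 gets 1 - x) that maximizes
    the product of both agents' gains over their disagreement payoffs d1, d2."""
    best_x, best_product = 0.0, float("-inf")
    for i in range(steps + 1):
        x = i / steps
        gain1 = u1(x) - d1
        gain2 = u2(1.0 - x) - d2
        if gain1 < 0 or gain2 < 0:
            continue  # neither agent accepts less than the no-deal payoff
        product = gain1 * gain2
        if product > best_product:
            best_x, best_product = x, product
    return best_x


# Two agents with identical linear utilities split evenly.
even = nash_split(lambda x: x, lambda y: y)

# If agent 2's utility has diminishing returns, agent 1 ends up with more.
skewed = nash_split(lambda x: x, lambda y: y ** 0.5)

print(even, skewed)
```

The point relevant to the discussion is that the calculation needs both utility functions and both sets of beliefs about outcomes as inputs; if either is opaque to the other party, there is nothing to plug in.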
If, in addition to different values, we also have different beliefs, then bargaining is still possible, though we cannot expect to reach a Pareto optimal bargain. But the more our beliefs diverge, regarding consequences that concern us, the less good our bargains can be. In the limit, when the things that matter to us are particularly difficult to predict, and when we each have no idea what the other agent is predicting, bargaining simply becomes ineffective.
Eliezer has expressed his acceptance of the moral significance of the utility functions of people in the far distant future. Since he believes that those people outnumber us folk in the present, that seems to suggest that he would be willing to sacrifice the current utility of us in favor of the future utility of them. (For example, the positive value of saving a starving child today does not outweigh the negative consequences on the multitudes of the future of delaying the Singularity by one day).
I, on the other hand, systematically discount the future. That, by itself, does not make Eliezer dangerous to me. We could strike a Nash bargain, after all. However, we inevitably also have different beliefs about consequences, and the divergence between our beliefs becomes greater the farther into the future we look. And consequences in the distant future are essentially all that matters to people like Eliezer—the present fades into insignificance by contrast. But, to people like me, the present and near future are essentially all that matter—the distant future discounts into insignificance.
So, Eliezer and I care about different things. Eliezer has some ability to predict my actions because he knows I care about short-term consequences and he knows something about how I predict short-term consequences. But I have little ability to predict Eliezer’s actions, because I know he cares primarily about long term consequences, and they are inherently much more unpredictable. I really have very little justification for modeling Eliezer (and any other act utilitarian who refuses to discount the future) as a rational agent.
I really have very little justification for modeling Eliezer (and any other act utilitarian who refuses to discount the future) as a rational agent.
I wish you would just pretend that they care about things a million times further into the future than you do.
The reason is that there are instrumental reasons to discount—the future disappears into a fog of uncertainty—and you can’t make decisions based on the value of things you can’t foresee.
The instrumental reasons fairly quickly dominate as you look further out—even when you don’t discount in your values. Reading your post, it seems as though you don’t “get” this, or don’t agree with it—or something.
Yes, the far-future is unpredictable—but in decision theory, that tends to make it a uniform grey—not an unpredictable black and white strobing pattern.
I wish you would just pretend that they care about things a million times further into the future than you do.
I don’t need to pretend. Modulo some mathematical details, it is the simple truth. And I don’t think there is anything irrational about having such preferences. It is just that, since I cannot tell whether or not what I do will make such people happy, I have no motive to pay any attention to their preferences.
Yes, the far-future is unpredictable—but in decision theory, that tends to make it a uniform grey—not an unpredictable black and white strobing pattern.
Yet, it seems that the people who care about the future do not agree with you on that. Bostrom, Yudkowsky, Nesov, et al. frequently invoke assessments of far-future consequences (sometimes in distant galaxies) in justifying their recommendations.
I wish you would just pretend that they care about things a million times further into the future than you do.
I don’t need to pretend. Modulo some mathematical details, it is the simple truth.
We have crossed wires here. What I meant is that I wish you would stop protesting about infinite utilities—and how non-discounters are not really even rational agents—and just model them as ordinary agents who discount a lot less than you do.
Objections about infinity strike me as irrelevant and uninteresting.
It is just that, since I cannot tell whether or not what I do will make such people happy, I have no motive to pay any attention to their preferences.
Is that your true objection? I expect you can figure out what would make these people happy fairly easily enough most of the time—e.g. by asking them.
Yes, the far-future is unpredictable—but in decision theory, that tends to make it a uniform grey—not an unpredictable black and white strobing pattern.
Yet, it seems that the people who care about the future do not agree with you on that. Bostrom, Yudkowsky, Nesov, et al. frequently invoke assessments of far-future consequences (sometimes in distant galaxies) in justifying their recommendations.
Indeed. That is partly poetry, though (big numbers make things seem important) - and partly because they think that the far future will be highly contingent on near future events.
The thing they are actually interested in influencing is mostly only a decade or so out. It does seem quite important—significant enough to reach back to us here anyway.
If what you are trying to understand is far enough away to be difficult to predict, and very important, then that might cause some oscillations. That is hardly a common situation, though.
Most of the time, organisms act as though they want to become ancestors. To do that, the best thing they can do is focus on having some grandkids. Expanding their circle of care out a few generations usually makes precious little difference to their actions. The far future is unforeseen, and usually can’t be directly influenced. It is usually not too relevant. Usually, you leave it to your kids to deal with.
It is just that, since I cannot tell whether or not what I do will make such people happy, I have no motive to pay any attention to their preferences.
Is that your true objection? I expect you can figure out what would make these people happy fairly easily enough most of the time—e.g. by asking them.
That is a valid point. So, I am justified in treating them as rational agents to the extent that I can engage in trade with them. I just can’t enter into a long-term Nash bargain with them in which we jointly pledge to maximize some linear combination of our two utility functions in an unsupervised fashion. They can’t trust me to do what they want, and I can’t trust them to judge their own utility as bounded.
I think this is back to the point about infinities. The one I wish you would stop bringing up—and instead treat these folk as though they are discounting only a teeny, tiny bit.
Frankly, I generally find it hard to take these utilitarian types seriously in the first place. A “signalling” theory (holier-than-thou) explains the unusually high prevalence of utilitarianism among moral philosophers—and an “exploitation” theory explains its prevalence among those running charitable causes (utilitarianism-says-give-us-your-money). Those explanations do a good job of modelling the facts about utilitarianism—and are normally a lot more credible than the supplied justifications—IMHO.
I think this is back to the point about infinities.
Which suggests that we are failing to communicate. I am not surprised.
The one I wish you would stop bringing up—and instead treat these folk as though they are discounting only a teeny, tiny bit.
I do that! And I still discover that their utility functions are dominated by huge positive and negative utilities in the distant future, while mine are dominated by modest positive and negative utilities in the near future. They are still wrong even if they fudge it so that their math works.
I think this is back to the point about infinities.
Which suggests that we are failing to communicate. I am not surprised.
I went from your “I can’t trust them to judge their own utility as bounded” to your earlier “infinity” point. Possibly I am not trying very hard here, though...
My main issue was you apparently thinking that you couldn’t predict their desires in order to find mutually beneficial trades. I’m not really sure if this business about not being able to agree to maximise some shared function is a big deal for you.
Mm. OK, so you are talking about scaring sufficiently intelligent rationalists, not scaring the general public. Fair enough.
What you say makes sense as far as it goes, assuming some mechanism for reliable judgments about people’s actual bases for their decisions. (For example, believing their self-reports.)
But it seems the question that should concern you is not whether Eliezer bases his decisions on predictable things, but rather whether Eliezer’s decisions are themselves predictable.
Put a different way: by your own account, the actual long-term consequences don’t correlate reliably with Eliezer’s expectations about them… that’s what it means for those consequences to be inherently unpredictable. And his decisions are based on his expectations, of course, not on the actual future consequences. So it seems to follow that once you know Eliezer’s beliefs about the future, whether those beliefs are right or wrong is irrelevant to you: that just affects what actually happens in the future, which you systematically discount anyway.
So if Eliezer is consistent in his beliefs about the future, and his decisions are consistently grounded in those beliefs, I’m not sure what makes him any less predictable to me than you are.
Of course, his expectations might not be consistent. Or they might be consistent but beyond your ability to predict. Or his decisions might be more arbitrary than you suggest here. For that matter, he might be lying outright. I’m not saying you should necessarily trust him, or anyone else.
But those same concerns apply to everybody, whatever their professed value structure. I would say the same things about myself.
So it seems to follow that once you know Eliezer’s beliefs about the future, whether those beliefs are right or wrong is irrelevant to you: that just affects what actually happens in the future, which you systematically discount anyway.
But Eliezer’s beliefs about the future continue to change—as he gains new information and completes new deductions. And there is no way that he can practically keep me informed of his beliefs—neither he nor I would be willing to invest the time required for that communication. But Eliezer’s beliefs about the future impact his actions in the present, and those actions have consequences both in the near and distant future. From my point of view, therefore, his actions have essentially random effects on the only thing that matters to me—the near future.
Absolutely. But who isn’t that true of? At least Eliezer has extensively documented his putative beliefs at various points in time, which gives you some data points to extrapolate from.
I have no complaints regarding the amount of information about Eliezer’s beliefs that I have access to. My complaint is that Eliezer, and his fellow non-discounting act utilitarians, are morally driven by the huge differences in utility which they see as arising from events in the distant future—events which I consider morally irrelevant because I discount the future. No realistic amount of information about beliefs can alleviate this problem. The only fix is for them to start discounting. (I would have added “or for me to stop discounting” except that I still don’t know how to handle the infinities.)
Given that they predominantly care about things I don’t care about, and that I predominantly care about things they don’t worry about, we can only consider each other to be moral monsters.
You and I seem to be talking past each other now. It may be time to shut this conversation down.
Given that they predominantly care about things I don’t care about, and that I predominantly care about things they don’t worry about, we can only consider each other to be moral monsters.
Ethical egoists are surely used to this situation, though. The world is full of people who care about extremely different things from one another.
Yes. And if they both mostly care about modest-sized predictable things, then they can do some rational bargaining. Trouble arises when one or more of them has exquisitely fragile values—when they believe that switching a donation from one charity to another destroys galaxies.
I expect your decision algorithm will find a way to deal with people who won’t negotiate on some topics—or who behave in a manner you have a hard time predicting. Some trouble for you, maybe—but probably not THE END OF THE WORLD.
From my point of view, therefore, his actions have essentially random effects on the only thing that matters to me—the near future.
Looking at the last 10 years, there seems to be some highly-predictable fund raising activity, and a lot of philosophising about the importance of machine morality.
I see some significant patterns there. It is not remotely like a stream of random events. So: what gives?
Sure, the question of whether a superintelligence will construct a superior morality to that which natural selection and cultural evolution have constructed on Earth is in some sense a narrow technical question. (The related question of whether the phrase “superior morality” even means anything is, also.)
But it’s a technical question that pertains pretty directly to the question of whose side one envisions oneself on.
That is, if one answers “yes,” it can make sense to ally with the Singularity rather than humanity (assuming that even means anything) as EY-1998 claims to, and still expect some unspecified good (or perhaps Good) result. Whereas if one answers “no,” or if one rejects the very idea that there’s such a thing as a superior morality, that justification for alliance goes away.
That said, I basically agree with you, though perhaps for different reasons than yours.
That is, even after embracing the idea that no other values, even those held by a superintelligence, can be superior to human values, one is still left with the same choice of alliances. Instead of “side with humanity vs. the Singularity,” the question involves a much narrower subset: “side with humanity vs. FAI-induced Singularity,” but from our perspective it’s a choice among infinities.
Of course, advocates of FAI-induced Singularity will find themselves saying that there is no conflict, really, because an FAI-induced Singularity will express by definition what’s actually important about humanity. (Though, of course, there’s no guarantee that individual humans won’t all be completely horrified by the prospect.)
Though, of course, there’s no guarantee that individual humans won’t all be completely horrified by the prospect.
Recall that after CEV extrapolates current humans’ volitions and construes a coherent superposition, the next step isn’t “do everything that superposition says”, but rather, “ask that superposition the one question ‘Given the world as it is right now, what program should we run next?’, run that program, and then shut down”. I suppose it’s possible that our CEV will produce an AI that immediately does something we find horrifying, but I think our future selves are nicer than that… or could be nicer than that, if extrapolated the right way, so I’d consider it a failure of Friendliness if we get a “do something we’d currently find horrifying for the greater good” AI if a different extrapolation strategy would have resulted in something like a “start with the most agreeable and urgent stuff, and other than that, protect us while we grow up and give us help where we need it” AI.
I really doubt that we’d need an AI to do anything immediately horrifying to the human species in order to allow it to grow up into an awesome fun posthuman civilization, so if CEV 1.0 Beta 1 appeared to be going in that direction, that would probably be considered a bug and fixed.
(shrug) Sure, if you’re right that the “most urgent and agreeable stuff” doesn’t happen to press a significant number of people’s emotional buttons, then it follows that not many people’s emotional buttons will be pressed.
But there’s a big difference between assuming that this will be the case, and considering it a bug if it isn’t.
Either I trust the process we build more than I trust my personal judgments, or I don’t.
If I don’t, then why go through this whole rigamarole in the first place? I should prefer to implement my personal judgments. (Of course, I may not have the power to do so, and prefer to join more powerful coalitions whose judgments are close-enough to mine. But in that case CEV becomes a mere political compromise among the powerful.)
If I do, then it’s not clear to me that “fixing the bug” is a good idea.
That is, OK, suppose we write a seed AI intended to work out humanity’s collective CEV, work out some next-step goals based on that CEV and an understanding of likely consequences, construct a program P to implement those goals, run P, and quit.
Suppose that I am personally horrified by the results of running P. Ought I choose to abort P? Or ought I say to myself “Oh, how interesting: my near-mode emotional reactions to the implications of what humanity really wants are extremely negative. Still, most everybody else seems OK with it. OK, fine: this is not going to be a pleasant transition period for me, but my best guess is still that it will ultimately be for the best.”
Is there some number of people such that if more than that many people are horrified by the results, we ought to choose to abort P?
Does the question even matter? The process as you’ve described it doesn’t include an abort mechanism; whichever choice we make P is executed.
Ought we include such an abort mechanism? It’s not at all clear to me that we should. I can get on a roller-coaster or choose not to get on it, but giving me a brake pedal on a roller coaster is kind of ridiculous.
Sure, the question of whether a superintelligence will construct a superior morality to that which natural selection and cultural evolution have constructed on Earth is in some sense a narrow technical question.
Apparently he changed his mind about a bunch of things.
On what appears to be their current plan, the SIAI don’t currently look very dangerous, IMHO.
Eray Ozkural recently complained: “I am also worried that backwards people and extremists will threaten us, and try to dissuade us from accomplishing our work, due to your scare tactics.”
I suppose that sort of thing is possible—but my guess is that they are mostly harmless.
(Parenthetical about how changing your mind, admitting you were wrong, oops, etc, is a good thing).
Yes, I agree. I don’t really believe that he only learnt how to disguise his true goals. But I’m curious if you would be satisfied with his word alone if he would be able to run a fooming AI next week only if you gave your OK?
He has; this is made abundantly clear in the Metaethics sequence and particularly the “coming of age” sequence. That passage appears to be a reflection of the big embarrassing mistake he talked about, when he thought that he knew nothing about true morality (see “Could Anything Be Right?”) and that a superintelligence with a sufficiently “unconstrained” goal system (or what he’d currently refer to as “a rock”) would necessarily discover the ultimate true morality, so that whatever this superintelligence ended up doing would necessarily be the right thing, whether that turned out to consist of giving everyone a volcano lair full of catgirls/boys or wiping out humanity and reshaping the galaxy for its own purposes.
Needless to say, that is not his view anymore; there isn’t even any “Us or Them” to speak of anymore. Friendly AIs aren’t (necessarily) people, and certainly won’t be a distinct race of people with their own goals and ambitions.
Yes, I’m not suggesting that he is just signaling all that he wrote in the sequences to persuade people to trust him. I’m just saying that when you consider what people are doing for much less than shaping the whole universe to their liking, one might consider some sort of public or third-party examination before anyone is allowed to launch a fooming AI.
It will probably never come to it anyway. Not because the SIAI is not going to succeed but if it told anyone that it is even close to implementing something like CEV then the whole might of the world would crush it (if the world didn’t turn rational until then). Because to say that you are going to run a fooming AI will be interpreted as trying to take over all power and rule the universe. I suppose this is also the most likely reason for the SIAI to fail. The idea is out and once people notice that fooming AI isn’t just science fiction they will do everything to stop anyone from either implementing one at all or to run their own before anyone else does. And who’ll be the first competitor to take out in the race to take over the universe? The SIAI of course, just search Google. I guess it would have been a better idea to make this a stealth project from day one. But that train has left.
Anyway, if the SIAI does succeed one can only hope that Yudkowsky is not Dr. Evil in disguise. But even that would still be better than a paperclip maximizer. I assign more utility to a universe adjusted to Yudkowsky’s volition (or the SIAI) than paperclips (I suppose even if that means I’ll not “like” what happens to me then).
I’m just saying that when you consider what people are doing for much less than shaping the whole universe to their liking, one might consider some sort of public or third-party examination before anyone is allowed to launch a fooming AI.
I don’t see who is going to enforce that. Probably nobody.
What we are fairly likely to see is open-source projects getting more limelight. It is hard to gather mindshare if your strategy is: trust the code to us. Relatively few programmers are likely to buy into such projects—unless you pay them to do so.
So you take him at his word that he’s working in your best interest. You don’t think it is necessary to supervise the SIAI while working towards friendly AI. But once they finished their work, ready to go, you are in favor of some sort of examination before they can implement it. Is that correct?
I don’t think human selfishness vs. public interest is much of a problem with FAI; everyone’s interests with respect to FAI are well correlated, and making an FAI which specifically favors its creator doesn’t give enough extra benefit over an FAI which treats everyone equally to justify the risks (that the extra term will be discovered, or that the extra term introduces a bug). Not even for a purely selfish creator; FAI scenarios just don’t leave enough room for improvement to motivate implementing something else.
On the matter of inspecting AIs before launch, however, I’m conflicted. On one hand, the risk of bugs is very serious, and the only way to mitigate it is to have lots of qualified people look at it closely. On the other hand, if the knowledge that a powerful AI was close to completion became public, it would be subject to meddling by various entities that don’t understand what they’re doing, and it would also become a major target for espionage by groups of questionable motives and sanity who might create UFAIs. These risks are difficult to balance, but I think secrecy is the safer choice, and should be the default.
If your first paragraph turns out to be true, does that change anything with respect to the problem of human and political irrationality? My worry is that even if there is only one rational solution that everyone should favor, how likely is it that people understand and accept this? That might be no problem given the current perception. If the possibility of fooming AI will still be ignored at the point it will be possible to implement friendliness (CEV etc.), then there will be no opposition. So some quick quantum leaps towards AGI will likely allow the SIAI to follow through on it. But my worry is that if the general public or governments notice this possibility and take it seriously, it will turn into a political mess never seen before. The world would have to be dramatically different for the big powers to agree on something like CEV. I still think this is the most likely failure mode in case the SIAI succeeds in defining friendliness before someone else runs a fooming AI. Politics.
These risks are difficult to balance, but I think secrecy is the safer choice, and should be the default.
I agree. But is that still possible? After all we’re writing about it in public. Although to my knowledge the SIAI never suggested that it would actually create a fooming AI, only come up with a way to guarantee its friendliness. But what you said in your second paragraph would suggest that the SIAI would also have to implement friendliness or otherwise people will take advantage of it or simply mess it up.
Although to my knowledge the SIAI never suggested that it would actually create a fooming AI, only come up with a way to guarantee its friendliness.
This?
“The Singularity Institute was founded on the theory that in order to get a Friendly artificial intelligence, someone has got to build one. So, we’re just going to have an organization whose mission is: build a Friendly AI. That’s us.”
You don’t think it is necessary to supervise the SIAI while working towards friendly AI. But once they finished their work, ready to go, you are in favor of some sort of examination before they can implement it.
Probably it would be easier to run the examination during the SIAI’s work, rather than after. Certainly it would save more lives. So, supervise them, so that your examination is faster and more thorough. I am not in favour of pausing the project, once complete, to examine it if it’s possible to examine it in operation.
I do not seek out examples to support my conclusion but to weaken your argument that one should trust Yudkowsky because of his previous output.
You shouldn’t seek to “weaken an argument”, you should seek what is the actual truth, and then maybe ways of communicating your understanding. (I believe that’s what you intended anyway, but think it’s better not to say it this way, as a protective measure against motivated cognition.)
You have failed to address my criticisms of your points, that you are seeking out only examples that support your desired conclusion, and that you are ignoring details that would allow you to construct a narrower, more relevant reference class for your outside view argument.
I was telling you the “ruling out the possibility” is the wrong, (in fact impossible), standard.
Only now I understand your criticism. I do not seek out examples to support my conclusion but to weaken your argument that one should trust Yudkowsky because of his previous output. I’m aware that Yudkowsky can very well be right about the idea but do in fact believe that the risk is worth taking. Have I done extensive research on how often people in similar situations have been wrong? Nope. No excuses here, but do you think there are comparable cases of predictions that proved to be reliable? And how much research have you done in this case and about the idea in general?
I don’t, I actually stated a few times that I do not think that the idea is wrong.
Seeking out just examples that weaken my argument, when I never predicted that no such examples would exist, is the problem I am talking about.
What made you think that supporting your conclusion and weakening my argument are different things?
My reason to weaken your argument is not that I want to be right but that I want feedback about my doubts. I said that 1.) people can be wrong, regardless of their previous reputation, 2.) that people can lie about their objectives and deceive by how they act in public (especially when the stakes are high), 3.) that Yudkowsky’s previous output and achievements are not remarkable enough to trust him about some extraordinary claim. You haven’t responded on why you tell people to believe Yudkowsky, in this case, regardless of my objections.
I’m sorry if I made it appear as if I hold some particular belief. My epistemic state simply doesn’t allow me to arrive at your conclusion. To highlight this I argued in favor of what it would mean to not accept your argument, namely to stand by previously well-established concepts like free speech and transparency. Yes, you could say that there is no difference here, except that I do not care about who is right but what is the right thing to do.
Still, it’s incorrect to argue from existence of examples. You have to argue from likelihood. You’d expect more correctness from a person with reputation for being right than from a person with reputation for being wrong.
People can also go crazy, regardless of their previous reputation, but it’s improbable, and not an adequate argument for their craziness.
And you need to know what fact you are trying to convince people about, not just search for soldier-arguments pointing in the preferred direction. If you believe that the fact is that a person is crazy, you too have to recognize that “people can be crazy” is an inadequate argument for this fact you wish to communicate, and that you shouldn’t name this argument in good faith.
(Craziness is introduced as a less-likely condition than wrongness to stress the structure of my argument, not to suggest that wrongness is as unlikely.)
I notice that Yudkowsky wasn’t always self-professed human-friendly. Consider this:
http://hanson.gmu.edu/vc.html#yudkowsky
Wow. That is scary. Do you have an estimated date on that bizarre declaration? Pre 2004 I assume?
He’s changed his mind since. That makes it far, far less scary.
(Parenthetical about how changing your mind, admitting you were wrong, oops, etc, is a good thing).
(Hence reference to Eliezer2004 sequence.)
He has changed his mind about one technical point in meta-ethics. He now realizes that super-human intelligence does not automatically lead to super-human morality. He is now (IMHO) less wrong. But he retains a host of other (mis)conceptions about meta-ethics which make his intentions abhorrent to people with different (mis)conceptions. And he retains the arrogance that would make him dangerous to those he disagrees with, if he were powerful.
“… far, far less scary”? You are engaging in wishful thinking no less foolish than that for which Eliezer has now repented.
I’m not at all sure that I agree with Eliezer about most meta-ethics, and definitely disagree on some fairly important issues. But, that doesn’t make his views necessarily abhorrent. If Eliezer triggers a positive Singularity (positive in the sense that it reflects what he wants out of a Singularity, complete with CEV), I suspect that that will be a universe which I won’t mind living in. People can disagree about very basic issues and still not hate each others’ intentions. They can even disagree about long-term goals and not hate it if the other person’s goals are implemented.
Have you ever had one of those arguments with your SO in which:
It is conceded that your intentions were good.
It is conceded that the results seem good.
The SO is still pissed because of the lack of consultation and/or presence of extrapolation?
I usually escape those confrontations by promising to consult and/or not extrapolate the next time. In your scenario, Eliezer won’t have that option.
When people point out that Eliezer’s math is broken because his undiscounted future utilities lead to unbounded utility, his response is something like “Find better math; discounted utility is morally wrong”.
When Eliezer suggests that there is no path to a positive singularity which allows for prior consultation with the bulk of mankind, my response is something like “Look harder. Find a path that allows people to feel that they have given their informed consent to both the project and the timetable—anything else is morally wrong.”
ETA: In fact, I would like to see it as a constraint on the meaning of the word “Friendly” that it must not only provide friendly consequences, but also, it must be brought into existence in a friendly way. I suspect that this is one of those problems in which the added constraint actually makes the solution easier to find.
Could you link to where Eliezer says that future utilities should not be discounted? I find that surprising, since uncertainty causes an effect roughly equivalent to discounting.
I would also like to point out that achieving public consensus about whether to launch an AI would take months or years, and that during that time, not only is there a high risk of unfriendly AIs, it is also guaranteed that millions of people will die. Making people feel like they were involved in the decision is emphatically not worth the cost.
He makes the case in this posting. It is a pretty good posting, by the way, in which he also points out some kinds of discounting which he believes are justified. This posting does not purport to be a knock-down argument against discounting future utility—it merely states Eliezer’s reasons for remaining unconvinced that you should discount (and hence for remaining in disagreement with most economic thinkers).
ETA: One economic thinker who disagrees with Eliezer is Robin Hanson. His response to Eliezer’s posting is also well worth reading.
Examples of Eliezer conducting utilitarian reasoning about the future without discounting are legion.
Tim Tyler makes the same assertion about the effects of uncertainty. He backs the assertion with metaphor, but I have yet to see a worked example of the math. Can you provide one?
Of course, one obvious related phenomenon—it is even mentioned with respect in Eliezer’s posting—is that the value of a promise must be discounted with time due to the increasing risk of non-performance: my promise to scratch your back tomorrow is more valuable to you than my promise to scratch next week—simply because there is a risk that you or I will die in the interim, rendering the promise worthless. But I don’t see how other forms of increased uncertainty about the future should have the same (exponential decay) response curve.
So, start now.
Most tree-pruning heuristics naturally cause an effect like temporal discounting. Resource limits mean that you can’t calculate the whole future tree—so you have to prune. Pruning normally means applying some kind of evaluation function early—to decide which branches to prune. The more you evaluate early, the more you are effectively valuing the near-present.
That is not maths—but hopefully it has a bit more detail than previously.
It doesn’t really address the question. In the A* algorithm the heuristic estimates of the objective function are supposed to be upper bounds on utility, not lower bounds. Furthermore, they are supposed to actually estimate the result of the complete computation—not to represent a partial computation exactly.
Reality check: a tree of possible futures is pruned at points before the future is completely calculated. Of course it would be nice to apply an evaluation function which represents the results of considering all possible future branches from that point on. However, getting one of those that produces results in a reasonable time would be a major miracle.
If you look at things like chess algorithms, they do some things to get a more accurate utility valuation when pruning, such as checking for quiescence. However, they basically just employ a standard evaluation at that point, or sometimes a faster, cheaper approximation. If it is sufficiently bad, the tree gets pruned.
We are living in the same reality. But the heuristic evaluation function still needs to be an estimate of the complete computation, rather than being something else entirely. If you want to estimate your own accumulation of pleasure over a lifetime, you cannot get an estimate of that by simply calculating the accumulation of pleasure over a shorter period—otherwise no one would undertake the pain of schooling motivated by the anticipated pleasure of high future income.
The question which divides us is whether an extra 10 utils now is better or worse than an additional 11 utils 20 years from now. You claim that it is worse. Period. I claim that it may well be better, depending on the discount rate.
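The dependence on the discount rate can be made concrete. The following is a minimal sketch, assuming annual compounding at a constant rate; the helper names `present_value` and `break_even_rate` are hypothetical and chosen only for illustration.

```python
# Sketch only: assumes a constant annual discount rate with yearly compounding.
def present_value(utils: float, years: float, annual_rate: float) -> float:
    """Value today of `utils` received `years` from now."""
    return utils / (1.0 + annual_rate) ** years


def break_even_rate(now: float, later: float, years: float) -> float:
    """Annual rate at which `later` utils in `years` years are worth `now` utils today."""
    return (later / now) ** (1.0 / years) - 1.0


# Rate at which 10 utils now and 11 utils in 20 years are equally good.
r_star = break_even_rate(10.0, 11.0, 20.0)

for rate in (0.0, r_star, 0.05):
    pv = present_value(11.0, 20.0, rate)
    print(f"annual rate {rate:.4%}: 11 utils in 20 years is worth {pv:.3f} today")
```

Under these assumptions the break-even rate is a little under half a percent per year. Below it, the 11 later utils win; above it, the 10 utils now win.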
Correct me if I’m missing an important nuance, but isn’t this just about whether one’s utils are timeless?
I’m not sure I understand the question. What does it mean for a util to be ‘timeless’?
ETA: The question of the interaction of utility and time is a confusing one. In “Against Discount Rates”, Eliezer writes:
I think that Eliezer has expressed the issue in almost, but not quite, the right way. The right question is whether a decision maker in 2007 should be 5% more interested in doing something about the 2008 issue than about the 2009 issue. I believe that she should be. If only because she expects that she will have an entire year in the future to worry about the 2009 family without the need to even consider 2008 again. 2008’s water will be already under the bridge.
I’m sure someone else can explain this better than me, but: As I understand it, a util understood timelessly (rather than like money, which there are valid reasons to discount because it can be invested, lost, revalued, etc. over time) builds into how it’s counted all preferences, including preferences that interact with time. If you get 10 utils, you get 10 utils, full stop. These aren’t delivered to your door in a plain brown wrapper such that you can put them in an interest-bearing account. They’re improvements in the four-dimensional state of the entire universe over all time, that you value at 10 utils. If you get 11 utils, you get 11 utils, and it doesn’t really matter when you get them. Sure, if you get them 20 years from now, then they don’t cover specific events over the next 20 years that could stand improvement. But it’s still worth eleven utils, not ten. If you value things that happen in the next 20 years more highly than things that happen later, then utils according to your utility function will reflect that, that’s all.
That (timeless utils) is a perfectly sensible convention about what utility ought to mean. But, having adopted that convention, we are left with (at least) two questions:
Do I (in 2011) derive a few percent more utility from an African family having clean water in 2012 than I do from an equivalent family having clean water in 2013?
If I do derive more utility from the first alternative, am I making a moral error in having a utility function that acts that way?
I would answer yes to the first question. As I understand it, Eliezer would answer yes to the second question and would answer no to the first, were he in my shoes. I would claim that Eliezer is making a moral error in both judgments.
Do you (in the years 2011, 2012, 2013, 2014) derive different relative utilities for these conditions? If so, it seems you have a problem.
I’m sorry. I don’t know what is meant by utility derived in 2014 from an event in 2012. I understand that the whole point of my assigning utilities in 2014 is to guide myself in making decisions in 2014. But no decision I make in 2014 can have an effect on events in 2012. So, from a decision-theoretic viewpoint, it doesn’t matter how I evaluate the utilities of past events. They are additive constants (same in all decision branches) in any computation of utility, and hence are irrelevant.
Or did you mean to ask about different relative utilities in the years before 2012? Yes, I understand that if I don’t use exponential discounting, then I risk inconsistencies.
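The inconsistency risk mentioned here can be illustrated with a small sketch. It compares exponential discounting with one common non-exponential form (hyperbolic); the specific parameters (a factor of 0.9 per year, k = 0.2) and the 10-versus-11-utils choice are assumptions made purely for illustration.

```python
# Sketch only: both discount functions and their parameters are illustrative assumptions.
def exponential(delay: float, factor: float = 0.9) -> float:
    """Weight on a payoff `delay` years away under exponential discounting."""
    return factor ** delay


def hyperbolic(delay: float, k: float = 0.2) -> float:
    """Weight on a payoff `delay` years away under hyperbolic discounting."""
    return 1.0 / (1.0 + k * delay)


def prefers_sooner(discount, delay: float) -> bool:
    """True if 10 utils at `delay` years beats 11 utils at `delay + 1` years."""
    return 10.0 * discount(delay) > 11.0 * discount(delay + 1.0)


# The same pair of events, judged from 0, 2 and 10 years away.
for name, discount in (("exponential", exponential), ("hyperbolic", hyperbolic)):
    print(name, [prefers_sooner(discount, d) for d in (0, 2, 10)])
```

With exponential discounting the ranking of the two events is the same from every vantage point, because the ratio of the two weights never changes. With the hyperbolic form the ranking flips as the events draw nearer, which is the kind of inconsistency at issue.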
And that is a fact about the 2007 decision maker, not about the 2008 family’s value as compared to the 2009 family’s.
If, in 2007, you present me with a choice of clean water for a family for all of and only 2008 vs 2009, and you further assure me that these families will otherwise survive in hardship, and that their suffering in one year won’t materially affect their next year, and that I won’t have this opportunity again come this time next year, and that flow-on or snowball effects which benefit from an early start are not a factor here—then I would be indifferent to the choice.
If I would not be; if there is something intrinsic about earlier times that makes them more valuable, and not just a heuristic of preferring them for snowballing or flow-on reasons, then that is what Eliezer is saying seems wrong.
I would classify that as instrumental discounting. I don’t think anyone would argue with that—except maybe a superintelligence who has already exhausted the whole game tree—and for whom an extra year buys nothing.
Given that you also believe that distributing your charitable giving over many charities is ‘risk management’, I suppose that should not surprise me.
FWIW, I genuinely don’t understand your perspective. The extent to which you discount the future depends on your chances of enjoying it—but also on factors like your ability to predict it—and your ability to influence it—the latter are functions of your abilities, of what you are trying to predict and of the current circumstances.
You really, really do not normally want to put those sorts of things into an agent’s utility function. You really, really do want to calculate them dynamically, depending on the agent’s current circumstances, prediction ability levels, actuator power levels, previous experience, etc.
Attempts to put that sort of thing into the utility function would normally tend to produce an inflexible agent, who has more difficulties in adapting and improving. Trying to incorporate all the dynamic learning needed to deal with the issue into the utility function might be possible in principle—but that represents a really bad idea.
Hopefully you can see my reasoning on this issue. I can’t see your reasoning, though. I can barely even imagine what it might possibly be.
Maybe you are thinking that all events have roughly the same level of unpredictability in the future, and there is roughly the same level of difficulty in influencing them, so the whole issue can be dealt with by one (or a small number of) temporal discounting “fudge factors”, and that evolution built us that way because it was too stupid to do any better.
You apparently denied that resource limitation results in temporal discounting. Maybe that is the problem (if so, see my other reply here). However, now you seem to have acknowledged that an extra year of time to worry in helps with developing plans. What I can see doesn’t seem to make very much sense.
I really, really am not advocating that we put instrumental considerations into our utility functions. The reason you think I am advocating this is that you have this fixed idea that the only justification for discounting is instrumental. So every time I offer a heuristic analogy explaining the motivation for fundamental discounting, you interpret it as a flawed argument for using discounting as a heuristic for instrumental reasons.
Since it appears that this will go on forever, and I don’t discount the future enough to make the sum of this projected infinite stream of disutility seem small, I really ought to give up. But somehow, my residual uncertainty about the future makes me think that you may eventually take Cromwell’s advice.
To clarify: I do not think the only justification for discounting is instrumental. My position is more like: agents can have whatever utility functions they like (including ones with temporal discounting) without having to justify them to anyone.
However, I do think there are some problems associated with temporal discounting. Temporal discounting sacrifices the future for the sake of the present. Sometimes the future can look after itself—but sacrificing the future is also something which can be taken too far.
Axelrod suggested that when the shadow of the future grows too short, more defections happen. If people don’t sufficiently value the future, reciprocal altruism breaks down. Things get especially bad when politicians fail to value the future. We should strive to arrange things so that the future doesn’t get discounted too much.
Instrumental temporal discounting doesn’t belong in ultimate utility functions. So, we should figure out what temporal discounting is instrumental and exclude it.
If we are building a potentially-immortal machine intelligence with a low chance of dying and which doesn’t age, those are more causes of temporal discounting which could be discarded as well.
What does that leave? Not very much, IMO. The machine will still have some finite chance of being hit by a large celestial body for a while. It might die—but its chances of dying vary over time; its degree of temporal discounting should vary in response—once again, you don’t wire this in, you let the agent figure it out dynamically.
The point is that resource limitation makes these estimates bad estimates—and you can’t do better by replacing them with better estimates because of … resource limitation!
To see how resource limitation leads to temporal discounting, consider computer chess. Powerful computers play reasonable games—but heavily resource limited ones fall for sacrifice plays, and fail to make successful sacrifice gambits. They often behave as though they are valuing short-term gain over long term results.
A peek under the hood quickly reveals why. They only bother looking at a tiny section of the game tree near to the current position! More powerful programs can afford to exhaustively search that space—and then move on to positions further out. Also the limited programs employ “cheap” evaluation functions that fail to fully compensate for their short-term foresight—since they must be able to be executed rapidly. The result is short-sighted chess programs.
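The short-sightedness described above can be reproduced on a toy example. This is a minimal sketch, assuming a single-agent decision tree with no opponent (a deliberate simplification of chess) and a cheap evaluation that assigns zero to anything beyond the search horizon; the tree, the move names and the reward values are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    reward: float = 0.0  # utility collected on entering this node
    children: dict = field(default_factory=dict)  # move name -> Node


def best_value(node: Node, depth: int) -> float:
    """Best total reward reachable from `node`, looking at most `depth` moves ahead.

    Anything beyond the horizon is valued at zero (the 'cheap' evaluation).
    """
    if depth == 0 or not node.children:
        return 0.0
    return max(c.reward + best_value(c, depth - 1) for c in node.children.values())


def choose(root: Node, depth: int) -> str:
    """Pick the first move that looks best with a lookahead of `depth` moves (depth >= 1)."""
    def score(move: str) -> float:
        child = root.children[move]
        return child.reward + best_value(child, depth - 1)
    return max(root.children, key=score)


# "grab" pays +1 at once; "sacrifice" costs 1 now but pays +5 two moves later.
root = Node(children={
    "grab": Node(reward=1.0, children={"wait": Node()}),
    "sacrifice": Node(reward=-1.0, children={
        "follow_up": Node(children={"payoff": Node(reward=5.0)}),
    }),
})

for depth in (1, 2, 3):
    print(depth, choose(root, depth))
```

With a lookahead of one or two moves the searcher takes the immediate gain; only at depth three does the sacrifice become visible. The utility of the payoff never changed, only the searcher's ability to see it.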
That resource limitation leads to temporal discounting is a fairly simple and general principle which applies to all kinds of agents.
Why do you keep trying to argue against discounting using an example where discounting is inappropriate by definition? The objective in chess is to win. It doesn’t matter whether you win in 5 moves or 50 moves. There is no discounting. Looking at this example tells us nothing about whether we should discount future increments of utility in creating a utility function.
Instead, you need to look at questions like this: An agent plays go in a coffee shop. He has the choice of playing slowly, in which case the games each take an hour and he wins 70% of them. Or, he can play quickly, in which case the games each take 20 minutes, but he only wins 60% of them. As soon as one game finishes, another begins. The agent plans to keep playing go forever. He gains 1 util each time he wins and loses 1 util each time he loses.
The main decision he faces is whether he maximizes utility by playing slowly or quickly. Of course, he has infinite expected utility however he plays. You can redefine the objective to be maximizing utility flow per hour and still get a ‘rational’ solution. But this trick isn’t enough for the following extended problem:
The local professional offers go lessons. Lessons require a week of time away from the coffee-shop and a 50 util payment. But each week of lessons turns 1% of your losses into victories. Now the question is: Is it worth it to take lessons? How many weeks of lessons are optimal? The difficulty here is that we need to compare the values of a one-shot (50 utils plus a week not playing go) with the value of an eternal continuous flow (the extra fraction of games per hour which are victories rather than losses). But that is an infinite utility payoff from the lessons, and only a finite cost, right? Obviously, the right decision is to take a week of lessons. And then another week after that. And so on. Forever.
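The per-hour comparison and the lesson paradox can be sketched numerically. This is only an illustration under stated assumptions: play is nonstop (168 hours per week), "turns 1% of your losses into victories" is read as multiplying the remaining loss rate by 0.99 per week of lessons, and nothing is discounted. The function names are hypothetical.

```python
# Sketch only: 168 hours/week of play and the multiplicative reading of
# "1% of losses become wins" are assumptions, not part of the original problem.
def utils_per_hour(games_per_hour: float, win_rate: float) -> float:
    """Expected utility flow with +1 per win and -1 per loss."""
    return games_per_hour * (win_rate - (1.0 - win_rate))


def after_lessons(win_rate: float, weeks: int) -> float:
    """Win rate after `weeks` of lessons, each converting 1% of remaining losses."""
    return 1.0 - (1.0 - win_rate) * 0.99 ** weeks


def break_even_hours(games_per_hour: float, win_rate: float,
                     hours_per_week: float = 168.0, fee: float = 50.0) -> float:
    """Hours of later play needed to recoup one week of lessons (fee plus missed play)."""
    before = utils_per_hour(games_per_hour, win_rate)
    after = utils_per_hour(games_per_hour, after_lessons(win_rate, 1))
    return (fee + before * hours_per_week) / (after - before)


slow = utils_per_hour(1.0, 0.70)  # one-hour games, 70% wins
fast = utils_per_hour(3.0, 0.60)  # twenty-minute games, 60% wins
print(f"slow: {slow:.2f} utils/hour, fast: {fast:.2f} utils/hour")
print(f"first lesson pays for itself after {break_even_hours(3.0, 0.60):.0f} hours of fast play")
```

On these numbers fast play yields more utility per hour than slow play, and the first week of lessons is recouped after roughly 6,280 hours. Because that horizon is finite and the player plays forever without discounting, every further week of lessons also pays for itself eventually, which is the paradox.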
Discounting of future utility flows is the standard and obvious way of avoiding this kind of problem and paradox. But now let us see whether we can alter this example to capture your ‘instrumental discounting due to an uncertain future’:
First, the obvious one. Our hero expects to die someday, but doesn’t know when. He estimates a 5% chance of death every year. If he is lucky, he could live for another century. Or he could keel over tomorrow. And when he dies, the flow of utility from playing go ceases. It is very well known that this kind of uncertainty about the future is mathematically equivalent to discounted utility in a certain future. But you seemed to be suggesting something more like the following:
Our hero is no longer certain what his winning percentage will be in the future. He knows that he experiences microstrokes roughly every 6 months, and that each incident takes 5% of his wins and changes them to losses. On the other hand, he also knows that roughly every year he experiences a conceptual breakthrough. And that each such breakthrough takes 10% of his losses and turns them into victories.
Does this kind of uncertainty about the future justify discounting on ‘instrumental grounds’? My intuition says “No, not in this case, but there are similar cases in which discounting would work.” I haven’t actually done the math, though, so I remain open to instruction.
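For the first variant above (a 5% chance of death each year), the claimed equivalence with discounting can be checked numerically. This is a minimal sketch, assuming a constant utility flow that is collected at the end of each year survived and a finite horizon; the function names and the flow of 100 utils per year are hypothetical. It does not address the microstroke variant.

```python
# Sketch only: constant yearly flow, paid at the end of each year survived.
def expected_utility_with_death(flow: float, survive: float, years: int) -> float:
    """Expectation over the number of years survived, with no discounting."""
    total = 0.0
    for lived in range(years + 1):
        # Survive `lived` years, then die in the next one (or reach the horizon alive).
        p = survive ** lived * ((1.0 - survive) if lived < years else 1.0)
        total += p * flow * lived
    return total


def discounted_utility(flow: float, factor: float, years: int) -> float:
    """Certain stream of `flow` per year, with year t weighted by factor**t."""
    return sum(flow * factor ** t for t in range(1, years + 1))


risky = expected_utility_with_death(100.0, 0.95, 200)
certain = discounted_utility(100.0, 0.95, 200)
print(f"{risky:.6f} vs {certain:.6f}")
```

The two totals agree up to rounding: a constant annual hazard of 5% behaves like a discount factor of 0.95 applied to a certain stream.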
Temporal discounting is about valuing something happening today more than the same thing happening tomorrow.
Chess computers do, in fact, discount. That is why they do prefer to mate you in twenty moves rather than a hundred.
The values of a chess computer do not just tell it to win. In fact, they are complex—e.g. Deep Blue had an evaluation function that was split into 8,000 parts.
Operation consists of maximising the utility function, after foresight and tree pruning. Events that take place in branches after tree pruning has truncated them typically don’t get valued at all, since they are not foreseen. Resource-limited chess computers can find themselves preferring to promote a pawn sooner rather than later. They do so since they fail to see the benefit of sequences leading to promotion later.
So: we apparently agree that resource limitation leads to indifference towards the future (due to not bothering to predict it) - but I classify this as a kind of temporal discounting (since rewards in the future get ignored), whereas you apparently don’t.
Hmm. It seems as though this has turned out to be a rather esoteric technical question about exactly which set of phenomena the term “temporal discounting” can be used to refer to.
Earlier we were talking about whether agents focussed their attention on tomorrow—rather than next year. Putting aside the issue of whether that is classified as being “temporal discounting”—or not—I think the extent to which agents focus on the near-future is partly a consequence of resource limitation. Give the agents greater abilities and more resources and they become more future-oriented.
No, I have not agreed to that. I disagree with almost every part of it.
In particular, I think that the question of whether (and how much) one cares about the future is completely prior to questions about deciding how to act so as to maximize the things one cares about. In fact, I thought you were emphatically making exactly this point on another branch.
But that is fundamental ‘indifference’ (which I thought we had agreed cannot flow from instrumental considerations). I suppose you must be talking about some kind of instrumental or ‘derived’ indifference. But I still disagree. One does not derive indifference from not bothering to predict—one instead derives not bothering to predict from being indifferent.
Furthermore, I don’t respond to expected computronium shortages by truncating my computations. Instead, I switch to an algorithm which produces less accurate computations at lower computronium costs.
And finally, regarding classification, you seem to suggest that you view truncation of the future as just one form of discounting, whereas I choose not to. And that this makes our disagreement a quibble over semantics. To which I can only reply: Please go away Tim.
I think you would reduce how far you look forward if you were interested in using your resources intelligently and efficiently.
If you only have a million cycles per second, you can’t realistically go 150 ply deep into your go game—no matter how much you care about the results after 150 moves. You compromise—limiting both depth and breadth. The reduction in depth inevitably means that you don’t look so far into the future.
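The depth-limit point can be illustrated with a toy search. This is only a sketch under stated assumptions: the "cash or wait" game, the payoff sizes, and the function names are all hypothetical, and it is not how any real chess or go engine evaluates positions. The utility function contains no discount term; the only thing that varies is how deep the search looks.

```python
def best_value(steps_waited, depth, payoff_after, big=100.0, small=1.0):
    """Depth-limited lookahead value of the current position.

    The agent may 'cash' (take `small` now, game over) or 'wait'. After
    `payoff_after` waits in total it receives `big`. Positions at the search
    horizon are scored 0: the search simply does not see beyond it.
    """
    if steps_waited >= payoff_after:
        return big
    if depth <= 0:
        return 0.0  # truncated branch: unforeseen rewards count for nothing
    cash = small
    wait = best_value(steps_waited + 1, depth - 1, payoff_after, big, small)
    return max(cash, wait)


def choose(depth, payoff_after, big=100.0, small=1.0):
    """Return the first move picked by a searcher with the given depth."""
    wait = best_value(1, depth - 1, payoff_after, big, small)
    return "wait" if wait > small else "cash"


if __name__ == "__main__":
    # The large payoff arrives after 5 waits; vary only the search depth.
    for depth in (2, 4, 5, 8):
        print(depth, choose(depth, payoff_after=5))
```

In this toy, a searcher whose depth falls short of the payoff takes the small reward now, and one that can see far enough waits. Whether that behaviour deserves the name "temporal discounting" is exactly the classification question under dispute here; the sketch does not settle it.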
A lot of our communication difficulty arises from using different models to guide our intuitions. You keep imagining game-tree evaluation in a game with perfect information (like chess or go). Yes, I understand your point that in this kind of problem, resource shortages are the only cause of uncertainty—that given infinite resources, there is no uncertainty.
I keep imagining problems in which probability is built in, like the coffee-shop-go-player which I sketched recently. In the basic problem, there is no difficulty in computing expected utilities deeper into the future—you solve analytically and then plug in whatever value for t that you want. Even in the more difficult case (with the microstrokes) you can probably come up with an analytic solution. My models just don’t have the property that uncertainty about the future arises from difficulty of computation.
Right. The real world surely contains problems of both sorts. If you have a problem which is dominated by chaos based on quantum events then more resources won’t help. Whereas with many other types of problems more resources do help.
I recognise the existence of problems where more resources don’t help—I figure you probably recognise that there are problems where more resources do help—e.g. the ones we want intelligent machines to help us with.
Perhaps the real world does. But decision theory doesn’t. The conventional assumption is that a rational agent is logically omniscient. And generalizing decision theory by relaxing that assumption looks like it will be a very difficult problem.
The most charitable interpretation I can make of your argument here is that human agents, being resource limited, imagine that they discount the future. That discounting is a heuristic introduced by evolution to compensate for those resource limitations. I also charitably assume that you are under the misapprehension that if I only understood the argument, I would agree with it. Because if you really realized that I have already heard you, you would stop repeating yourself.
That you will begin listening to my claim that not all discounting is instrumental is more than I can hope for, since you seem to think that my claim is refuted each time you provide an example of what you imagine to be a kind of discounting that can be interpreted as instrumental.
I repeat, Tim. Please go elsewhere.
I am pretty sure that I just told you that I do not think that all discounting is instrumental. Here’s what I said:
Agents can have many kinds of utility function! That is partly a consequence of there being so many different ways for agents to go wrong.
Thx for the correction. It appears I need to strengthen my claim.
Not all discounting by rational, moral agents is instrumental.
Are we back in disagreement now? :)
No, we aren’t. In my book:
Being rational isn’t about your values; you can rationally pursue practically any goal. Epistemic rationality is a bit different—but I mostly ignore that as being unbiological.
Being moral isn’t really much of a constraint at all. Morality—and right and wrong—are normally with respect to a moral system—and unless a moral system is clearly specified, you can often argue all day about what is moral and what isn’t. Maybe some types of morality are more common than others—due to being favoured by the universe, or something like that—but any such context would need to be made plain in the discussion.
So, it seems (relatively) easy to make a temporal discounting agent that really values the present over the future—just stick a term for that in its ultimate values.
Are there any animals with ultimate temporal discounting? That is tricky, but it isn’t difficult to imagine natural selection hacking together animals that way. So: probably, yes.
Do I use ultimate temporal discounting? Not noticeably—as far as I can tell. I care about the present more than the future, but my temporal discounting all looks instrumental to me. I don’t go in much for thinking about saving distant galaxies, though! I hope that further clarifies.
I should probably review around about now. Instead of that: IIRC, you want to wire temporal discounting into machines, so their preferences better match your own—whereas I tend to think that would be giving them your own nasty hangover.
If you are not valuing my responses, I recommend you stop replying to them—thereby ending the discussion.
Programs make good models. If you can program it, you have a model of it. We can actually program agents that make resource-limited decisions. Having an actual program that makes decisions is a pretty good way of modeling making resource-limited decisions.
Perhaps we have some kind of underlying disagreement about what it means for temporal discounting to be “instrumental”.
In your example of an agent suffering from a risk of death, my thinking is: this player might opt for a safer life—with reduced risk. Or they might choose to lead a more interesting but more risky life. Their degree of discounting may well adjust itself accordingly—and if so, I would take that as evidence that their discounting was not really part of their pure preferences, but rather was an instrumental and dynamic response to the observed risk of dying.
If—on the other hand—they adjusted the risk level of their lifestyle, and their level of temporal discounting remained unchanged, that would be confirming evidence in favour of the hypothesis that their temporal discounting was an innate part of their ultimate preferences—and not instrumental.
This bothers me since, with reasonable assumptions, all rational agents engage in the same amount of catastrophe discounting.
That is, observed discount rate = instrumental discount rate + chance of death + other factors
We should expect everyone’s discount rate to change, by the same amount, unless they’re irrational.
Agents do not all face the same risks, though.
Sure, they may discount the same amount if they do face the same risks, but often they don’t—e.g. compare the motorcycle racer with the nun.
So: the discounting rate is not fixed at so-much per year, but rather is a function of the agent’s observed state and capabilities.
Of course. My point is that observing if the discount rate changes with the risk tells you if the agent is rational or irrational, not if the discount rate is all instrumental or partially terminal.
Stepping back for a moment, terminal values represent what the agent really wants, and instrumental values are things sought en-route.
The idea I was trying to express was: if what an agent really wants is not temporally discounted, then instrumental temporal discounting will produce a predictable temporal discounting curve—caused by aging, mortality risk, uncertainty, etc.
Deviations from that curve would indicate the presence of terminal temporal discounting.
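One way to make "deviations from that curve" concrete is sketched below. It assumes things the comments do not state: a constant hazard rate of death per year, purely exponential discounting (so that rates simply add), and hypothetical numbers for the motorcycle racer and the nun. The function names are illustrative only.

```python
import math


def survival_discount(t_years, hazard_per_year):
    """Probability of still being alive at time t under a constant hazard rate."""
    return math.exp(-hazard_per_year * t_years)


def observed_discount(t_years, terminal_rate, hazard_per_year):
    """Discount factor an observer would see: terminal and hazard rates add."""
    return math.exp(-(terminal_rate + hazard_per_year) * t_years)


def implied_terminal_rate(observed_rate, hazard_per_year, other_instrumental=0.0):
    """Residual rate left after subtracting the instrumental components."""
    return observed_rate - hazard_per_year - other_instrumental


if __name__ == "__main__":
    # Hypothetical agents: (observed yearly discount rate, yearly hazard of death).
    agents = {"motorcycle racer": (0.08, 0.05), "nun": (0.04, 0.01)}
    for name, (observed, hazard) in agents.items():
        print(name, round(implied_terminal_rate(observed, hazard), 4))
```

With these made-up figures both agents show the same residual rate despite very different observed rates, which is the pattern the "deviations" test would read as a shared terminal component. Real agents need not discount exponentially, so this is an illustration of the bookkeeping rather than a measurement method.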
Agreed.
I have no disagreement at all with your analysis here. This is not fundamental discounting. And if you have decision alternatives which affect the chances of dying, then it doesn’t even work to model it as if it were fundamental.
You recently mentioned the possibility of dying in the interim. There’s also the possibility of aging in the interim. Such factors can affect utility calculations.
For example: I would much rather have my grandmother’s inheritance now than years down the line, when she finally falls over one last time—because I am younger and fitter now.
Significant temporal discounting makes sense sometimes—for example, if there is a substantial chance of extinction per unit time. I do think a lot of discounting is instrumental, though—rather than being a reflection of ultimate values—due to things like the future being expensive to predict and hard to influence.
My brain spends more time thinking about tomorrow than about this time next year—because I am more confident about what is going on tomorrow, and am better placed to influence it by developing cached actions, etc. Next year will be important too—but there will be a day before to allow me to prepare for it closer to the time, when I am better placed to do so. The difference is not because I will be older then—or because I might die in the mean time. It is due to instrumental factors.
Of course one reason this is of interest is because we want to know what values to program into a superintelligence. That superintelligence will probably not age—and will stand a relatively low chance of extinction per unit time. I figure its ultimate utility function should have very little temporal discounting.
The problem with wiring discount functions into the agent’s ultimate utility function is that that is what you want it to preserve as it self improves. Much discounting is actually due to resource limitation issues. It makes sense for such discounting to be dynamically reduced as more resources become cheaply available. It doesn’t make much sense to wire-in short-sightedness.
I don’t mind tree-pruning algorithms attempting to normalise partial evaluations at different times—so they are more directly comparable to each other. The process should not get too expensive, though—the point of tree pruning is that it is an economy measure.
I suspect you want to replace “feel like they have given” with “give.”
Unless you are actually claiming that what is immoral is to make people fail to feel consulted, rather than to fail to consult them, which doesn’t sound like what you’re saying.
I think I will go with a simple tense change: “feel that they are giving”. Assent is far more important in the lead-up to the Singularity than during the aftermath.
Although I used the language “morally wrong”, my reason for that was mostly to make the rhetorical construction parallel. My preference for an open, inclusive process is a strong preference, but it is really more political/practical than moral/idealistic. One ought to allow the horses to approach the trough of political participation, if only to avoid being trampled, but one is not morally required to teach them how to drink.
Ah, I see. Sure, if you don’t mean morally wrong but rather politically impractical, then I withdraw my suggestion… I entirely misunderstood your point.
No, I did originally say (and mostly mean) “morally” rather than “politically”. And I should thank you for inducing me to climb down from that high horse.
I submit that I have many of the same misconceptions that Eliezer does; he changed his mind about one of the few places I disagree with him. That makes it far more of a change than it would be for you (one out of eight is a small portion, one out of a thousand is an invisible fraction).
Good point. And since ‘scary’ is very much a subjective judgment, that means that I can’t validly criticize you for being foolish unless I have some way of arguing that yours and Eliezer’s positions in the realm of meta-ethics are misconceptions—something I don’t claim to be able to do.
So, if I wish my criticisms to be objective, I need to modify them. Eliezer’s expressed positions on meta-ethics (particularly his apparent acceptance of act-utilitarianism and his unwillingness to discount future utilities) together with some of his beliefs regarding the future (particularly his belief in the likelihood of a positive singularity and expansion of human population into the universe) make his ethical judgments completely unpredictable to many other people—unpredictable because the judgment may turn on subtle differences in the expected consequences of present day actions on people in the distant future. And, if one considers the moral judgments of another person to be unpredictable, and that person is powerful, then one ought to consider that person scary. Eliezer is probably scary to many people.
True, but it has little bearing on whether Eliezer should be scary. That is, “Eliezer is scary to many people” is mostly a fact about many people, and mostly not a fact about Eliezer. The reverse of this (and what I base this distinction on) is that some politicians should be scary, and are not scary to many people.
I’m not sure the proposed modification helps: you seem to have expanded your criticisms so far, in order to have them lead to the judgment you want to reach, that they cover too much.
I mean, sure, unpredictability is scarier (for a given level of power) than predictability. Agreed. But so what?
For example, my judgments will always be more unpredictable to people much stupider than I am than to people about as smart or smarter than I am. So the smarter I am, the scarier I am (again, given fixed power)… or, rather, the more people I am scary to… as long as I’m not actively devoting effort to alleviating those fears by, for example, publicly conforming to current fashions of thought. Agreed.
But what follows from that? That I should be less smart? That I should conform more? That I actually represent a danger to more people? I can’t see why I should believe any of those things.
You started out talking about what makes one dangerous; you have ended up talking about what makes people scared of one whether one is dangerous or not. They aren’t equivalent.
Well, I hope I haven’t done that.
Well, I certainly did that. I was trying to address the question more objectively, but it seems I failed. Let me try again from a more subjective, personal position.
If you and I share the same consequentialist values, but I know that you are more intelligent, I may well consider you unpredictable, but I won’t consider you dangerous. I will be confident that your judgments, in pursuit of our shared values, will be at least as good as my own. Your actions may surprise me, but I will usually be pleasantly surprised.
If you and I are of the same intelligence, but we have different consequentialist values (both being egoists, with disjoint egos, for example) then we can expect to disagree on many actions. Expecting the disagreement, we can defend ourselves, or even bargain our way to a Nash bargaining solution in which (to the extent that we can enforce our bargain) we can predict each other’s behavior to be that promoting compromise consequences.
If, in addition to different values, we also have different beliefs, then bargaining is still possible, though we cannot expect to reach a Pareto optimal bargain. But the more our beliefs diverge, regarding consequences that concern us, the less good our bargains can be. In the limit, when the things that matter to us are particularly difficult to predict, and when we each have no idea what the other agent is predicting, bargaining simply becomes ineffective.
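For readers unfamiliar with the term, the Nash bargaining solution picks the agreement that maximizes the product of the two agents' gains over what each would get if bargaining failed. The following is a minimal sketch over a small, explicitly listed set of outcomes; the function name, the ten-unit pie, and the linear utilities are assumptions made for illustration.

```python
def nash_bargain(outcomes, disagreement):
    """Pick the outcome maximizing the product of both agents' gains.

    outcomes: list of (utility_a, utility_b) pairs the agents could agree on.
    disagreement: (utility_a, utility_b) each gets if bargaining fails.
    Returns None when no outcome leaves both agents at least as well off.
    """
    d_a, d_b = disagreement
    feasible = [(a, b) for a, b in outcomes if a >= d_a and b >= d_b]
    if not feasible:
        return None
    return max(feasible, key=lambda o: (o[0] - d_a) * (o[1] - d_b))


if __name__ == "__main__":
    # Hypothetical example: split 10 units, utilities linear in the share.
    splits = [(x, 10 - x) for x in range(11)]
    print(nash_bargain(splits, (0, 0)))  # equal fallback positions
    print(nash_bargain(splits, (2, 0)))  # agent A has a better fallback
```

The sketch presumes both agents agree on what utility each outcome delivers. The paragraphs above turn on what happens when that presumption fails, because the agents hold different beliefs about consequences.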
Eliezer has expressed his acceptance of the moral significance of the utility functions of people in the far distant future. Since he believes that those people outnumber us folk in the present, that seems to suggest that he would be willing to sacrifice the current utility of us in favor of the future utility of them. (For example, the positive value of saving a starving child today does not outweigh the negative consequences on the multitudes of the future of delaying the Singularity by one day).
I, on the other hand, systematically discount the future. That, by itself, does not make Eliezer dangerous to me. We could strike a Nash bargain, after all. However, we inevitably also have different beliefs about consequences, and the divergence between our beliefs becomes greater the farther into the future we look. And consequences in the distant future are essentially all that matters to people like Eliezer—the present fades into insignificance by contrast. But, to people like me, the present and near future are essentially all that matter—the distant future discounts into insignificance.
So, Eliezer and I care about different things. Eliezer has some ability to predict my actions because he knows I care about short-term consequences and he knows something about how I predict short-term consequences. But I have little ability to predict Eliezer’s actions, because I know he cares primarily about long term consequences, and they are inherently much more unpredictable. I really have very little justification for modeling Eliezer (and any other act utilitarian who refuses to discount the future) as a rational agent.
I wish you would just pretend that they care about things a million times further into the future than you do.
The reason is that there are instrumental reasons to discount—the future disappears into a fog of uncertainty—and you can’t make decisions based on the value of things you can’t foresee.
The instrumental reasons fairly quickly dominate as you look further out—even when you don’t discount in your values. Reading your post, it seems as though you don’t “get” this, or don’t agree with it—or something.
Yes, the far-future is unpredictable—but in decision theory, that tends to make it a uniform grey—not an unpredictable black and white strobing pattern.
I don’t need to pretend. Modulo some mathematical details, it is the simple truth. And I don’t think there is anything irrational about having such preferences. It is just that, since I cannot tell whether or not what I do will make such people happy, I have no motive to pay any attention to their preferences.
Yet, it seems that the people who care about the future do not agree with you on that. Bostrom, Yudkowsky, Nesov, et al. frequently invoke assessments of far-future consequences (sometimes in distant galaxies) in justifying their recommendations.
We have crossed wires here. What I meant is that I wish you would stop protesting about infinite utilities—and how non-discounters are not really even rational agents—and just model them as ordinary agents who discount a lot less than you do.
Objections about infinity strike me as irrelevant and uninteresting.
Is that your true objection? I expect you can figure out what would make these people happy fairly easily enough most of the time—e.g. by asking them.
Indeed. That is partly poetry, though (big numbers make things seem important) - and partly because they think that the far future will be highly contingent on near future events.
The thing they are actually interested in influencing is mostly only a decade or so out. It does seem quite important—significant enough to reach back to us here anyway.
If what you are trying to understand is far enough away to be difficult to predict, and very important, then that might cause some oscillations. That is hardly a common situation, though.
Most of the time, organisms act as though they want to become ancestors. To do that, the best thing they can do is focus on having some grandkids. Expanding their circle of care out a few generations usually makes precious little difference to their actions. The far future is unforeseen, and usually can’t be directly influenced. It is usually not too relevant. Usually, you leave it to your kids to deal with.
That is a valid point. So, I am justified in treating them as rational agents to the extent that I can engage in trade with them. I just can’t enter into a long-term Nash bargain with them in which we jointly pledge to maximize some linear combination of our two utility functions in an unsupervised fashion. They can’t trust me to do what they want, and I can’t trust them to judge their own utility as bounded.
I think this is back to the point about infinities. The one I wish you would stop bringing up—and instead treat these folk as though they are discounting only a teeny, tiny bit.
Frankly, I generally find it hard to take these utilitarian types seriously in the first place. A “signalling” theory (holier-than-thou) explains the unusually high prevalence of utilitarianism among moral philosophers—and an “exploitation” theory explains its prevalence among those running charitable causes (utilitarianism-says-give-us-your-money). Those explanations do a good job of modelling the facts about utilitarianism—and are normally a lot more credible than the supplied justifications—IMHO.
Which suggests that we are failing to communicate. I am not surprised.
I do that! And I still discover that their utility functions are dominated by huge positive and negative utilities in the distant future, while mine are dominated by modest positive and negative utilities in the near future. They are still wrong even if they fudge it so that their math works.
I went from your “I can’t trust them to judge their own utility as bounded” to your earlier “infinity” point. Possibly I am not trying very hard here, though...
My main issue was you apparently thinking that you couldn’t predict their desires in order to find mutually beneficial trades. I’m not really sure if this business about not being able to agree to maximise some shared function is a big deal for you.
Mm. OK, so you are talking about scaring sufficiently intelligent rationalists, not scaring the general public. Fair enough.
What you say makes sense as far as it goes, assuming some mechanism for reliable judgments about people’s actual bases for their decisions. (For example, believing their self-reports.)
But it seems the question that should concern you is not whether Eliezer bases his decisions on predictable things, but rather whether Eliezer’s decisions are themselves predictable.
Put a different way: by your own account, the actual long-term consequences don’t correlate reliably with Eliezer’s expectations about them… that’s what it means for those consequences to be inherently unpredictable. And his decisions are based on his expectations, of course, not on the actual future consequences. So it seems to follow that once you know Eliezer’s beliefs about the future, whether those beliefs are right or wrong is irrelevant to you: that just affects what actually happens in the future, which you systematically discount anyway.
So if Eliezer is consistent in his beliefs about the future, and his decisions are consistently grounded in those beliefs, I’m not sure what makes him any less predictable to me than you are.
Of course, his expectations might not be consistent. Or they might be consistent but beyond your ability to predict. Or his decisions might be more arbitrary than you suggest here. For that matter, he might be lying outright. I’m not saying you should necessarily trust him, or anyone else.
But those same concerns apply to everybody, whatever their professed value structure. I would say the same things about myself.
But Eliezer’s beliefs about the future continue to change—as he gains new information and completes new deductions. And there is no way that he can practically keep me informed of his beliefs—neither he nor I would be willing to invest the time required for that communication. But Eliezer’s beliefs about the future impact his actions in the present, and those actions have consequences both in the near and distant future. From my point of view, therefore, his actions have essentially random effects on the only thing that matters to me—the near future.
Absolutely. But who isn’t that true of? At least Eliezer has extensively documented his putative beliefs at various points in time, which gives you some data points to extrapolate from.
I have no complaints regarding the amount of information about Eliezer’s beliefs that I have access to. My complaint is that Eliezer, and his fellow non-discounting act utilitarians, are morally driven by the huge differences in utility which they see as arising from events in the distant future—events which I consider morally irrelevant because I discount the future. No realistic amount of information about beliefs can alleviate this problem. The only fix is for them to start discounting. (I would have added “or for me to stop discounting” except that I still don’t know how to handle the infinities.)
Given that they predominantly care about things I don’t care about, and that I predominantly care about things they don’t worry about, we can only consider each other to be moral monsters.
You and I seem to be talking past each other now. It may be time to shut this conversation down.
Ethical egoists are surely used to this situation, though. The world is full of people who care about extremely different things from one another.
Yes. And if they both mostly care about modest-sized predictable things, then they can do some rational bargaining. Trouble arises when one or more of them has exquisitely fragile values—when they believe that switching a donation from one charity to another destroys galaxies.
I expect your decision algorithm will find a way to deal with people who won’t negotiate on some topics—or who behave in a manner you have a hard time predicting. Some trouble for you, maybe—but probably not THE END OF THE WORLD.
Looking at the last 10 years, there seems to be some highly-predictable fund raising activity, and a lot of philosophising about the importance of machine morality.
I see some significant patterns there. It is not remotely like a stream of random events. So: what gives?
Sure, the question of whether a superintelligence will construct a superior morality to that which natural selection and cultural evolution have constructed on Earth is in some sense a narrow technical question. (The related question of whether the phrase “superior morality” even means anything is, also.)
But it’s a technical question that pertains pretty directly to the question of whose side one envisions oneself on.
That is, if one answers “yes,” it can make sense to ally with the Singularity rather than humanity (assuming that even means anything) as EY-1998 claims to, and still expect some unspecified good (or perhaps Good) result. Whereas if one answers “no,” or if one rejects the very idea that there’s such a thing as a superior morality, that justification for alliance goes away.
That said, I basically agree with you, though perhaps for different reasons than yours.
That is, even after embracing the idea that no other values, even those held by a superintelligence, can be superior to human values, one is still left with the same choice of alliances. Instead of “side with humanity vs. the Singularity,” the question involves a much narrower subset: “side with humanity vs. FAI-induced Singularity,” but from our perspective it’s a choice among infinities.
Of course, advocates of FAI-induced Singularity will find themselves saying that there is no conflict, really, because an FAI-induced Singularity will express by definition what’s actually important about humanity. (Though, of course, there’s no guarantee that individual humans won’t all be completely horrified by the prospect.)
Recall that after CEV extrapolates current humans’ volitions and construes a coherent superposition, the next step isn’t “do everything that superposition says”, but rather, “ask that superposition the one question ‘Given the world as it is right now, what program should we run next?’, run that program, and then shut down”. I suppose it’s possible that our CEV will produce an AI that immediately does something we find horrifying, but I think our future selves are nicer than that… or could be nicer than that, if extrapolated the right way, so I’d consider it a failure of Friendliness if we get a “do something we’d currently find horrifying for the greater good” AI if a different extrapolation strategy would have resulted in something like a “start with the most agreeable and urgent stuff, and other than that, protect us while we grow up and give us help where we need it” AI.
I really doubt that we’d need an AI to do anything immediately horrifying to the human species in order to allow it to grow up into an awesome fun posthuman civilization, so if CEV 1.0 Beta 1 appeared to be going in that direction, that would probably be considered a bug and fixed.
(shrug) Sure, if you’re right that the “most urgent and agreeable stuff” doesn’t happen to press a significant number of people’s emotional buttons, then it follows that not many people’s emotional buttons will be pressed.
But there’s a big difference between assuming that this will be the case, and considering it a bug if it isn’t.
Either I trust the process we build more than I trust my personal judgments, or I don’t.
If I don’t, then why go through this whole rigamarole in the first place? I should prefer to implement my personal judgments. (Of course, I may not have the power to do so, and prefer to join more powerful coalitions whose judgments are close-enough to mine. But in that case CEV becomes a mere political compromise among the powerful.)
If I do, then it’s not clear to me that “fixing the bug” is a good idea.
That is, OK, suppose we write a seed AI intended to work out humanity’s collective CEV, work out some next-step goals based on that CEV and an understanding of likely consequences, construct a program P to implement those goals, run P, and quit.
Suppose that I am personally horrified by the results of running P. Ought I choose to abort P? Or ought I say to myself “Oh, how interesting: my near-mode emotional reactions to the implications of what humanity really wants are extremely negative. Still, most everybody else seems OK with it. OK, fine: this is not going to be a pleasant transition period for me, but my best guess is still that it will ultimately be for the best.”
Is there some number of people such that if more than that many people are horrified by the results, we ought to choose to abort P?
Does the question even matter? The process as you’ve described it doesn’t include an abort mechanism; whichever choice we make P is executed.
Ought we include such an abort mechanism? It’s not at all clear to me that we should. I can get on a roller-coaster or choose not to get on it, but giving me a brake pedal on a roller coaster is kind of ridiculous.
It’s partly a chance vs necessity question.
It is partly a question about whether technological determinism is widespread.
Apparently he changed his mind about a bunch of things.
On what appears to be their current plan, the SIAI don’t currently look very dangerous, IMHO.
Eray Ozkural recently complained: “I am also worried that backwards people and extremists will threaten us, and try to dissuade us from accomplishing our work, due to your scare tactics.”
I suppose that sort of thing is possible—but my guess is that they are mostly harmless.
Or so you hope.
Yes, I agree. I don’t really believe that he only learnt how to disguise his true goals. But I’m curious if you would be satisfied with his word alone if he would be able to run a fooming AI next week only if you gave your OK?
He has; this is made abundantly clear in the Metaethics sequence and particularly the “coming of age” sequence. That passage appears to be a reflection of the big embarrassing mistake he talked about, when he thought that he knew nothing about true morality (see “Could Anything Be Right?”) and that a superintelligence with a sufficiently “unconstrained” goal system (or what he’d currently refer to as “a rock”) would necessarily discover the ultimate true morality, so that whatever this superintelligence ended up doing would necessarily be the right thing, whether that turned out to consist of giving everyone a volcano lair full of catgirls/boys or wiping out humanity and reshaping the galaxy for its own purposes.
Needless to say, that is not his view anymore; there isn’t even any “Us or Them” to speak of anymore. Friendly AIs aren’t (necessarily) people, and certainly won’t be a distinct race of people with their own goals and ambitions.
Yes, I’m not suggesting that he is just signaling all that he wrote in the sequences to persuade people to trust him. I’m just saying that when you consider what people are doing for much less than shaping the whole universe to their liking, one might consider some sort of public or third-party examination before anyone is allowed to launch a fooming AI.
The hard part there is determining who’s qualified to perform that examination.
It will probably never come to it anyway. Not because the SIAI is not going to succeed, but because if it told anyone that it is even close to implementing something like CEV then the whole might of the world would crush it (if the world didn’t turn rational until then). Because to say that you are going to run a fooming AI will be interpreted as trying to take over all power and rule the universe. I suppose this is also the most likely reason for the SIAI to fail. The idea is out, and once people notice that fooming AI isn’t just science fiction they will do everything either to stop anyone from implementing one at all or to run their own before anyone else does. And who’ll be the first competitor to take out in the race to take over the universe? The SIAI of course, just search Google. I guess it would have been a better idea to make this a stealth project from day one. But that train has left.
Anyway, if the SIAI does succeed one can only hope that Yudkowsky is not Dr. Evil in disguise. But even that would still be better than a paperclip maximizer. I assign more utility to a universe adjusted to Yudkowsky’s volition (or the SIAI) than paperclips (I suppose even if that means I’ll not “like” what happens to me then).
I don’t see who is going to enforce that. Probably nobody.
What we are fairly likely to see is open-source projects getting more limelight. It is hard to gather mindshare if your strategy is: trust the code to us. Relatively few programmers are likely to buy into such projects—unless you pay them to do so.
Yes on the question of humans vs Singularity.
(His word alone would not be enough to convince me he’s gotten the fooming AI friendly, though, so I would not give the OK for prudential reasons.)
So you take him at his word that he’s working in your best interest. You don’t think it is necessary to supervise the SIAI while it is working towards friendly AI. But once they have finished their work and are ready to go, you are in favor of some sort of examination before they can implement it. Is that correct?
I don’t think human selfishness vs. public interest is much of a problem with FAI; everyone’s interests with respect to FAI are well correlated, and making an FAI which specifically favors its creator doesn’t give enough extra benefit over an FAI which treats everyone equally to justify the risks (that the extra term will be discovered, or that the extra term introduces a bug). Not even for a purely selfish creator; FAI scenarios just don’t leave enough room for improvement to motivate implementing something else.
On the matter of inspecting AIs before launch, however, I’m conflicted. On one hand, the risk of bugs is very serious, and the only way to mitigate it is to have lots of qualified people look at it closely. On the other hand, if the knowledge that a powerful AI was close to completion became public, it would be subject to meddling by various entities that don’t understand what they’re doing, and it would also become a major target for espionage by groups of questionable motives and sanity who might create UFAIs. These risks are difficult to balance, but I think secrecy is the safer choice, and should be the default.
If your first paragraph turns out to be true, does that change anything with respect to the problem of human and political irrationality? My worry is that even if there is only one rational solution that everyone should favor, how likely is it that people understand and accept this? That might be no problem given the current perception. If the possibility of fooming AI will still be ignored at the point it will be possible to implement friendliness (CEV etc.), then there will be no opposition. So some quick quantum leaps towards AGI will likely allow the SIAI to follow through on it. But my worry is that if the general public or governments notice this possibility and take it seriously, it will turn into a political mess never seen before. The world would have to be dramatically different for the big powers to agree on something like CEV. I still think this is the most likely failure mode in case the SIAI succeeds in defining friendliness before someone else runs a fooming AI. Politics.
I agree. But is that still possible? After all we’re writing about it in public. Although to my knowledge the SIAI never suggested that it would actually create a fooming AI, only come up with a way to guarantee its friendliness. But what you said in your second paragraph would suggest that the SIAI would also have to implement friendliness or otherwise people will take advantage of it or simply mess it up.
This?
http://www.acceleratingfuture.com/people-blog/?p=196
Probably it would be easier to run the examination during the SIAI’s work, rather than after. Certainly it would save more lives. So, supervise them, so that your examination is faster and more thorough. I am not in favour of pausing the project, once complete, to examine it if it’s possible to examine it in operation.
At the bottom—just after where he talks about his “transfer of allegiance”—it says:
©1998 by Eliezer S. Yudkowsky.
We can’t say he didn’t warn us ;-)
IMO, it is somewhat reminiscent of certain early Zuckerberg comments.
Eliezer1998 is almost as scary as Hanson2010 - and for similar reasons.
1998 you mean?
Yes. :)
What Zuckerberg comments are you referring to?
The IM ones where he says “trust me”.
Zuckerberg probably thought they were private, though. I added a link.
If you follow the link:
You shouldn’t seek to “weaken an argument”, you should seek what is the actual truth, and then maybe ways of communicating your understanding. (I believe that’s what you intended anyway, but think it’s better not to say it this way, as a protective measure against motivated cognition.)
I like your parenthetical, I often want to say something like this, and you’ve put it well.