The problem with Pascal’s Wager isn’t that it’s a Wager. The problem with Pascal’s Wager and Pascal’s Mugging (its analogue in finite expected utility maximization), as near as I can tell, is that if you do an expected utility calculation including one outcome that has a tiny probability but enough utility or disutility to weigh heavily in the calculation anyway, you need to include every possible outcome that is around that level of improbability, or you are privileging a hypothesis and are probably making the calculation less accurate in the process. If you actually are including every other hypothesis at that level of improbability, for instance if you are a galaxy-sized Bayesian superintelligence who, for reasons beyond my mortal mind’s comprehension, has decided not to just dismiss those tiny possibilities a priori anyway, then it still shouldn’t be any problem; at that point, you should get a sane, nearly-optimal answer.
So, is this situation a Pascal’s Mugging? I don’t think it is. 1% isn’t at the same level of ridiculous improbability as, say, Yahweh existing, or the mugger’s threat being true. 1% chances actually happen pretty often, so it’s both possible and prudent to take them into account when a lot is at stake. The only extra thing to consider is that the remaining 99% should be broken down into smaller possibilities; saying “1% humanity ends, 99% everything goes fine” is unjustified. There are probably some other possible outcomes that are also around 1%, and perhaps a bit lower, and they should be taken into account individually.
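To make that concrete, here is a minimal sketch with entirely made-up probabilities and utilities; it only shows how counting a single rare, high-stakes outcome can swing the calculation, and how the answer can change once other outcomes of comparable improbability are listed alongside it.

```python
# Toy expected-utility comparison; every number here is hypothetical.
# "privileged" counts only one rare catastrophic outcome; "broken_down" also
# lists other outcomes of comparable improbability.

def expected_utility(outcomes):
    """outcomes: list of (probability, utility) pairs whose probabilities sum to 1."""
    assert abs(sum(p for p, _ in outcomes) - 1.0) < 1e-9
    return sum(p * u for p, u in outcomes)

privileged = [
    (0.01, -1_000_000),  # 1%: catastrophe
    (0.99, 100),         # 99%: everything goes fine
]

broken_down = [
    (0.01, -1_000_000),  # 1%: the same catastrophe
    (0.02, 1_000_000),   # 2%: an unexpectedly large upside
    (0.01, -500_000),    # 1%: a different, smaller disaster
    (0.96, 100),         # the mundane remainder
]

print(expected_utility(privileged))   # dominated by the single rare outcome
print(expected_utility(broken_down))  # the sign can flip once comparable outcomes are included
```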
Excellent analysis. In fairness to Pascal, I think his available evidence at the time should have led him to attribute more than a 1% chance to the Christian Bible being true.

Indeed. Before Darwin, design was a respectable-to-overwhelming hypothesis for the order of the natural world.

ETA: On second thought, that’s too strong of a claim. See replies below.
Is that true? If we went back in time to before Darwin and gave a not-already-religious person (if we could find one) a thorough rationality lesson — enough to skillfully weigh the probabilities of competing hypotheses (including enough about cognitive science to know why intelligence and intentionality are not black boxes, must carry serious complexity penalties, and need to make specific advance predictions instead of just being invoked as “God wills it” retroactively about only the things that do happen), but not quite enough that they’d end up just inventing the theory of evolution themselves — wouldn’t they conclude, even in the absence of any specific alternatives, that design was a non-explanation, a mysterious answer to a mysterious question? And even imagining that we managed to come up with a technical model of an intelligent designer, specifying in advance the structure of its mind and its goal system, could it actually compress the pre-Darwin knowledge about the natural world more than slightly?
Dawkins actually brings this up in The Blind Watchmaker (page 6 in my copy). Hume is given as the example of someone who said “I don’t have an answer” before Darwin, and Dawkins describes it as such:
An atheist before Darwin could have said, following Hume: ‘I have no explanation for complex biological design. All I know is that God isn’t a good explanation, so we must wait and hope that somebody comes up with a better one.’ I can’t help feeling that such a position, though logically sound, would have left one feeling pretty unsatisfied, and that although atheism might have been logically tenable before Darwin, Darwin made it possible to be an intellectually fulfilled atheist. I like to think that Hume would agree, but some of his writings suggest that he underestimated the complexity and beauty of biological design.
Hume’s Dialogues Concerning Natural Religion are definitely worth a read. And I think that Dawkins has it right: Hume really wanted a naturalistic explanation of apparent design in nature, and expected that such an explanation might be possible (even to the point of offering some tentative speculations), but he was honest enough to admit that he didn’t have an explanation at hand.

As pointed out below, Hume is a good counterexample to my thesis above.

On the other hand, there wasn’t a whole lot of honest, systematic searching for other hypotheses before Darwin either.
I didn’t really mean because of Darwin. Design is not a competitor to the theory of evolution. Evolution explains how complexity can increase. Design [ADDED: as an alternative to evolution] does not; it requires a designer that is assumed to be more complicated than the things it designs. Design explains nothing.
Evolution explains how complexity can increase. Design does not; it requires a designer that is assumed to be more complicated than the things it designs. Design explains nothing.
Designers can well design things more complicated than they are. (If even evolution without a mind can do so, designers do that easily.)
Agree. One way to look at it is that a designer can take a large source of complexity (whatever its brain is running on) and reshape and concentrate it into an area that is important to it. The complexity of the designer itself isn’t important. Evolution does much the same thing.
I thought that the advance of scientific knowledge is an evolutionary process?

It is, literally. Although the usage of the term ‘evolution’ in this context has itself evolved such that it has a different, far narrower meaning here.

The term “evolution” usually means what it says in the textbooks on the subject.

They essentially talk about changes in the genetic makeup of a population over time.

Science evolves in precisely that sense—e.g. see:

http://en.wikipedia.org/wiki/Dual_inheritance_theory

I stand by my statement, leaving it unchanged.

Don’t see how this remark is relevant, but here’s a reply:

http://lesswrong.com/lw/l6/no_evolutions_for_corporations_or_nanodevices/

The main point of that post is clearly correct, but I think the example of corporations is seriously flawed. It fails to appreciate the extent to which successful business practices consist of informal, non-systematic practical wisdom accumulated through long tradition and selected by success and failure in the market, not conscious a priori planning. The transfer of these practices is clearly very different from DNA-based biological inheritance, but it still operates in such ways that a quasi-Darwinian process can take place.
Applying similar analysis to modern science would be a fascinating project. In my opinion, a lot of the present problems with the proliferation of junk science stem not from intentional malice and fraud, but from a similar quasi-Darwinian process fueled by the fact that practices that best contribute to one’s career success overlap only partly with those that produce valid science. (And as in the case of corporations, the transfer of these practices is very different from biological inheritance, but still permits quasi-Darwinian selection for effective practices.)
The main point of that post is clearly correct [...]
The post is a denial of cultural evolution. For the correct perspective, see: Not By Genes Alone: How Culture Transformed Human Evolution by Peter J. Richerson and Robert Boyd.
I’d like to inquire about the difference between evolution and design regarding the creation of novelty. I don’t see how any intelligence can come up with something novel that would allow it to increase complexity if not by the process of evolution.

Noise is complexity. Complexity is easy to increase. Evolutionary designs are interesting not because of their complexity.
If your definition of complexity says noise is complexity, then you need a new definition of complexity.
Yes, many useful definitions, like entropy measures or Kolmogorov complexity, say noise is complexity. But people studying complexity recognize that this is a problem. They are aware that the phenomenon they’re trying to get at when they say “complexity” is something different.

And that concept of “complexity” is probably too complex to be captured by a fundamental notion such as K-complexity.
Well, I’m just trying to figure out what you tried to say when you replied to PhilGoetz:
Designers can well design things more complicated than they are.
Yes, but not without evolution. All that design adds to evolution is guidance. That is, if you took away evolution (this includes science and Bayesian methods) a designer could never design things more complicated (as in novel, as in better) than itself.
N designers, each of complexity K, can collectively design something of maximum complexity NK, simply by dividing up the work.
Co-evolution, which may be thought of as a pair of designers interacting through their joint design product, and with an unlimited random stream as supplementary input, can result in very complex designs as well as in the designers themselves becoming more complex through information acquired in the course of the interaction.
It is amusing to look at the Roman Catholic theology of the Trinity, with this kind of consideration in mind. As I remember it, the Deity was “originally” a unipartite, simple God, who then became more complex by contemplating Himself and then further contemplating that Contemplation.
For this reason, I have never been all that impressed by the “refutation” of the first cause argument; the refutation being that it supposedly requires a complex “first cause” God, Who is Himself in need of explanation. God could conceivably have been simple (as simple as a Big Bang, anyways) and then developed (some people would prefer to say “evolved”) under His own internal dynamics into something much more complex. Just as we atheists claim happened to the physical universe.
For this reason, I have never been all that impressed by the “refutation” of the first cause argument; the refutation being that it supposedly requires a complex “first cause” God, Who is Himself in need of explanation. God could conceivably have been simple (as simple as a Big Bang, anyways) and then developed (some people would prefer to say “evolved”) under His own internal dynamics into something much more complex. Just as we atheists claim happened to the physical universe.
Adapted refutation: if you’re going to suppose a complex God evolving from a simpler one and then acting on the universe, it is simpler to suppose a complex universe evolving from a simple one. The refutation still holds based on Occam’s razor.

Good point. Agreed.
For this reason, I have never been all that impressed by the “refutation” of the first cause argument; the refutation being that it supposedly requires a complex “first cause” God, Who is Himself in need of explanation. God could conceivably have been simple (as simple as a Big Bang, anyways) and then developed (some people would prefer to say “evolved”) under His own internal dynamics into something much more complex. Just as we atheists claim happened to the physical universe.
That simple “God” is the “God” of evolutionary theory. The “first mover” theory does require a complex first cause. It was made in ignorance of evolution, and assumes that a complex design requires an intelligent designer. Every last one of the defenders of the design theory denies that what you say is possible.

Quite possibly. That doesn’t mean I have to agree with them.

What does it mean, exactly? (What’s ‘complexity’? What’s ‘something’ that can be ‘designed’?) Why do you believe it?
I was thinking in terms of Kolmogorov complexity. A Turing program generates an output string of complexity no greater than the size K of the program. Collectively, N different such Turing programs (plus a little glue logic) can generate a string of complexity NK.
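To sketch that bound a little more carefully (keeping the usual logarithmic overhead for the glue logic explicit), roughly, with $p_i$ a program of length at most $K$ that prints $x_i$:

$$K(x_1 x_2 \cdots x_N) \;\le\; \sum_{i=1}^{N} |p_i| \;+\; O\!\Big(\sum_{i=1}^{N} \log |p_i|\Big) \;\le\; NK + O(N \log K).$$

The logarithmic terms pay for self-delimiting the $N$ programs so that a constant-size interpreter can run them in sequence and concatenate their outputs.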
If you have observations, that is, a source of randomness, you can generate output of arbitrary complexity.
Now, let’s step back and look at the whole picture. We were discussing a notion of ‘complexity’ such that evolved organisms gradually became more ‘complex’, and ‘designers’ which are themselves agents, possibly even evolved organisms, that can ‘design’ new things. We then consider that notion of ‘complexity’ as applied to ‘designers’ and ‘designs’ they can produce.
When informal notions are formalized, these formalizations should at least approximately relate to the original informal notions, otherwise we are changing the topic by bringing up these ‘formalizations’ and not actually making progress on understanding the original informal question.
K-complexity is something possessed by random noise. This notion does not reflect the measure of things by which evolution produced more ‘complex’ things than existed before (even if the ‘things’ produced by evolution are more K-complex than their early predecessors). And designers typically have access to randomness, which makes your model of ‘designers’ as programs without input wrong as well, and hence the conclusion about the K-complexity of the output incorrect, on top of K-complexity not adequately modeling the informal ‘complexity’.
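A crude illustration (compressed length is only a rough, upper-bound stand-in for K-complexity, so treat the numbers as suggestive): random noise maxes out such a measure, while intuitively structured or "designed" data scores far lower.

```python
import os
import zlib

def compressed_len(data: bytes) -> int:
    # Rough stand-in for K-complexity: zlib-compressed length (an upper bound only).
    return len(zlib.compress(data, 9))

n = 10_000
samples = {
    "random noise": os.urandom(n),                                              # incompressible
    "repetitive":   b"ATCG" * (n // 4),                                         # highly ordered
    "english-ish":  (b"the quick brown fox jumps over the lazy dog " * 250)[:n],
}

for name, data in samples.items():
    print(f"{name:13s} raw={len(data):6d}  compressed={compressed_len(data):6d}")
```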
All very true. Which is one reason I dislike all talk of “complexity”—particularly in such a fuzzy context as debates with creationists.
But we do all have some intuitions as to what we mean by complexity in this context. Someone, I believe it was you, has claimed in this thread that evolution can generate complexity. I assume you meant something other than “Evolution harnesses mutation as a random input and hence as a source of complexity”.
William Dembski is an “intelligent design theorist” (if that is not too much of an oxymoron) who has attempted to define a notion of “specified complexity” or “Complex Specified Information” (CSI). He has not, IMHO, succeeded in defining it clearly, but I think he is onto something. He asserts that biology exhibits CSI. I agree. He asserts that evolution under natural selection is incapable of generating CSI—claiming that NS can at best only transfer information from the environment to the genome. I am pretty sure he is wrong about this, but we need a clear and formal definition of CSI to even discuss the question intelligently.
So, I guess I want to turn your question around. Do you have some definition of “complexity” in mind which allows for correct mathematical thinking about these kinds of issues?
“NS can at best only transfer information from the environment to the genome.” Does this statement mean to suggest that the environment is not complex?
No. As I understand Dembski—at least when he was saying this kind of thing—he admitted that the environment could be complex and hence that NS could instill complexity in evolved organisms. “But”, he then suggested, “where did the complexity of the environment come from, if not from a Designer who crafted an environment capable of directing the evolution of man (in His own image, etc.)”
Dembski, these days, admits to being a YEC, but the reason he is a YEC is based on a kind of appeal to Occam. “If we believe in God anyways, for reasons of Theistic Evolution”, he seems to argue, “Why not take God at His word and believe in 6 days and the whole schtick?”
Do you have some definition of “complexity” in mind which allows for correct mathematical thinking about these kinds of issues?
Not in the context of this conversation (since genetic information stops increasing after a while and goes on optimizing under more or less the same ‘complexity’; ‘fitness’ is closer, although it is a moving target), but in about the same sense I don’t have a definition of ‘aging’ that allows “correct mathematical thinking” about it.
A wrong reply—for the correct answer, see:

Hull, D. L. 1988. Science as a Process. An Evolutionary Account of the Social and Conceptual Development of Science. The University of Chicago Press, Chicago and London, 586 pp.

There are no correct answers in a dispute about definitions, only aesthetic judgments and sometimes considerations of the danger of hidden implicit inferences. You can’t use authority in such an argument, unless of course you appeal to common usage.
However, referring to a book without giving an annotation for why it’s relevant is definitely an incorrect way to argue (even if a convincing argument is contained therein).
Disputes about the definition of “evolution”? I don’t think there are too many of those. Mark Ridley is the main one that springs to mind, but his definition is pretty crazy, IMHO.
Why the book is relevant already appears to be made pretty explicit in the subtitle: “An Evolutionary Account of the Social and Conceptual Development of Science”.
Designers can well design things more complicated than they are.
Agreed. Also, there is a continuum from pure evolution (with no foresight at all) to evaluation of potential designs with varying degrees of sophistication before fabricating them. (I know that I’m recalling this from a post somewhere on this site—please excuse the absence of proper credit assignment.) An example of a dumb process which is marginally smarter than evolution is to take mutation plus recombination and then do a simple gradient search to the nearest local optimum before evaluating the design.
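A toy sketch of that "marginally smarter" process (everything here is made up for illustration: the fitness landscape, the population sizes, and a random-restart hill climb standing in for a true gradient search):

```python
import random

DIM = 8

def fitness(design):
    # Made-up multimodal landscape; any function with local optima would do.
    return -sum((x ** 2 - 4) ** 2 for x in design)

def mutate(design, sigma=0.3):
    return [x + random.gauss(0, sigma) for x in design]

def recombine(a, b):
    return [random.choice(pair) for pair in zip(a, b)]

def local_search(design, steps=50, step=0.05):
    # Crude stand-in for "gradient search to the nearest local optimum".
    best, best_f = design, fitness(design)
    for _ in range(steps):
        candidate = [x + random.uniform(-step, step) for x in best]
        f = fitness(candidate)
        if f > best_f:
            best, best_f = candidate, f
    return best

def evolve(pop_size=30, generations=40):
    population = [[random.uniform(-3, 3) for _ in range(DIM)] for _ in range(pop_size)]
    for _ in range(generations):
        parents = sorted(population, key=fitness, reverse=True)[: pop_size // 2]
        children = [mutate(recombine(random.choice(parents), random.choice(parents)))
                    for _ in range(pop_size)]
        # The step that is "marginally smarter than evolution": polish each child
        # locally before it is evaluated and selected on.
        population = [local_search(child) for child in children]
    return max(population, key=fitness)

print(fitness(evolve()))
```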
Also, there is a continuum from pure evolution (with no foresight at all) to evaluation of potential designs with varying degrees of sophistication before fabricating them.
I’ll add that evolution with DNA and sexual reproduction already in place fits on a different part of this continuum from evolution of the simplest replicators.
Designers can guide evolution but it is still evolution that creates novelty.
Whatever intelligence is, it can’t be intelligent all the way down. It’s just dumb stuff at the bottom. — Andy Clark
Intelligence is a process facilitated by evolution. Even an AGI making perfect use of some of our most novel algorithms wouldn’t come up with something novel without evolution. See Bayesian Methods and Universal Darwinism.
No; you are invoking the theory of evolution to give that credibility. Even post-Darwin, most people don’t believe this is true. (Remember the Star Trek episode where Spock deduced something about a chess-playing computer, because “the computer could not play chess better than its programmer”?)
The religious advocates of Design explicitly denied this possibility; thus, their design story can’t invoke it.
I believe his point to be that an argument, to be effective, must be convincing to people who are not already convinced. Your argument offered the fact that evolution can design things more complicated than itself as an example with which to counter an anti-evolutionist argument. It therefore succeeds in convincing no one who was not already convinced.
It is not useless to demonstrate that you do not accept a premise rather than (as assumed) being unable to see the obvious logical consequences of said premise. It would lead them to disagree for slightly different reasons. If any part of such conversation is about sharing understanding and seeking to communicate information then Vladimir’s comment is, in fact, rather useful.
(No, it will not convince anyone who wasn’t already convinced. But that is because people are just not convinced about religion by argument ever.)
Design is not a competitor to the theory of evolution. Evolution explains how complexity can increase. Design does not
Evolution includes intelligent design these days, and it explains much—for example genetically engineered plants, television sets and suspension bridges.
Tim, I know and you know that your use of the phrase “intelligent design” is not meant to include supernatural designers. Most other people don’t know that, and hence react negatively. Since you expect this response, that makes you a troll (some unnamed sub-species of troll—I hope a vanishing sub-species!)
“Intelligent design” refers to any designers who are intelligent, in my book—supernatural or not.
It is true that I don’t use “intelligent design” as an abbreviation for “the hypothesis that an intelligent designer created most organic beings”. That abbreviation is basically a misuse of terminology—and needs killing off.
Using terminology the only way it’s ever been used seldom causes as much terminological confusion as singlehandedly trying to change it (without warning people what you’re doing).
Evolutionary theory doesn’t explain all possible outcomes. Even after accounting for cultural evolution, it predicts small changes, and observable descent with modification.
I agree with your analysis, though it’s not clear to me what you think of the 1% estimate. I think the 1% estimate is probably two to three orders of magnitude too high and I think the cost of the Scary Idea belief is structured as both a finite loss and an infinite loss, which complicates the analysis in a way not considered. (i.e. the error you see with a Pascal’s mugging is present here.)
For example, I am not particularly tied to a human future. I would be willing to create an AGI in any of the following three situations, ordered from most preferred to least: 1) it is friendly to humans, and humans and it benefit from each other; 2) it considers humans a threat, and destroys all of them except for me and a few tame humans; I spend the rest of my days growing cabbage with my hands; 3) it considers all humans a threat, and destroys them all, including me.
A problem with believing the Scary Idea is it makes it more probable that I beat you to making an AGI; particularly with existential risks, caution can increase your chance of losing. (One cautious way to deal with global warming, for example, is to wait and see what happens.)
So, the Scary Idea as I’ve seen it presented definitely privileges a hypothesis in a troubling way.
I think you’re making the unwarranted assumption that in scenario (3), the AGI then goes on to do interesting and wonderful things, as opposed to (say) turning the galaxy into a vast computer to calculate digits of pi until the heat death of the universe stops it.
You don’t even see such things as a possibility, but if you programmed an AGI with the goal of calculating pi, and it started getting smarter… well, the part of our thought-algorithm that says “seriously, it would be stupid to devote so much to doing that” won’t be in the AI’s goal system unless we’ve intentionally put something there that includes it.
I think you’re making the unwarranted assumption that in scenario (3), the AGI then goes on to do interesting and wonderful things, as opposed to (say) turning the galaxy into a vast computer to calculate digits of pi until the heat death of the universe stops it.
So, I think it’s a possibility. But one thing that bothers me about this objection is that an AGI is going to be, in some significant sense, alien to us, and that will almost definitely include its terminal values. I’m not sure there’s a way for us to judge whether or not alien values are more or less advanced than ours. I think it strongly unlikely that paperclippers are more advanced than humans, but am not sure if there is a justification for that beyond my preference for humans. I can think of metrics to pick, but they sound like rationalizations rather than starting points.
(And insisting on FAI, instead of on transcendent AI that may or may not be friendly, is essentially enslaving AI- but outsourcing the task to them, because we know we’re not up to the job. Whether or not that’s desirable is hard to say: even asking that question is difficult to do in an interesting way.)
The concept of a utility function being objectively (not using the judgment of a particular value system) more advanced than another is incoherent.
I would recommend phrasing objections as questions: people are much more kind about piercing questions than piercing statements. For example, if you had asked “what value system are you using to measure advancement?” then I would have leapt into my answer (or, if I had none, stumbled until I found one or admitted I lacked one). My first comment in this tree may have gone over much better if I phrased it as a question- “doesn’t this suffer from the same failings as Pascal’s wager, that it only takes into account one large improbable outcome instead of all of them?”- than a dismissive statement.
Back to the issue at hand, perhaps it would help if I clarified myself: I consider it highly probable that value drift is inevitable, and thus spend some time contemplating the trajectory of values / morality, rather than just their current values. The question of “what trajectory should values take?” and the question “what values do/should I have now?” are very different questions, and useful for very different situations. When I talk about “advanced,” I am talking about my trajectory preferences (or perhaps predictions would be a better word to use).
For example, I could value my survival, and the survival of the people I know very strongly. Given the choice to murder everyone currently on Earth and repopulate the Earth with a species of completely rational people (perhaps the murder is necessary because otherwise they would be infected by our irrationality), it might be desirable to end humanity (and myself) to move the Earth further along the trajectory I want it to progress along. And maybe, when you take sex and status and selfishness out of the equation, all that’s left to do is calculate pi- a future so boring to humans that any human left in it would commit suicide, but deeply satisfying to the rational life inhabiting the Earth.
It seems to me that questions along those lines- “how should values drift?” do have immediate answers- “they should stay exactly where they are now / everyone should adopt the values I want them to adopt”- but those answers may be impossible to put into practice, or worse than other answers that we could come up with.
It seems to me that questions along those lines- “how should values drift?” do have immediate answers- “they should stay exactly where they are now / everyone should adopt the values I want them to adopt”- but those answers may be impossible to put into practice, or worse than other answers that we could come up with.
There’s a sense in which I do want values to drift in a direction currently unpredictable to me: I recognize that my current object-level values are incoherent, in ways that I’m not aware of. I have meta-values that govern such conflicts between values (e.g. when I realize that a moral heuristic of mine actually makes everyone else worse off, do I adapt the heuristic or bite the bullet?), and of course these too can be mistaken, and so on.
I’d find it troubling if my current object-level values (or a simple more-coherent modification) were locked in for humanity, but at least as troubling if humanity’s values drifted in a random direction. I’d much prefer that value drift happen according to the shared meta-values (and meta-meta-values where the meta-values conflict, etc) of humanity.
I’d find it troubling if my current object-level values (or a simple more-coherent modification) were locked in for humanity, but at least as troubling if humanity’s values drifted in a random direction.
I’m assuming by random you mean “chosen uniformly from all possible outcomes”- and I agree that would be undesirable. But I don’t think that’s the choice we’re looking at.
I’d much prefer that value drift happen according to the shared meta-values (and meta-meta-values where the meta-values conflict, etc) of humanity.
Here we run into a few issues. Depending on how we define the terms, it looks like the two of us could be conflicting on the meta-meta-values stage; is there a meta-meta-meta-values stage to refer to? And how do we decide what “humanity’s” values are, when our individual values are incredibly hard to determine?
Do the meta-values and the meta-meta-values have some coherent source? Is there some consistent root to all the flux in your object-level values? I feel like the crux of FAI feasibility rests on that issue.
I wonder whether all this worrying about value stability isn’t losing sight of exactly this point—just whose values we are talking about.
As I understand it, the friendly values we are talking about are supposed to be some kind of cleaned up averaging of the individual values of a population—the species H. sapiens. But as we ought to know from the theory of evolution, the properties of a population (whether we are talking about stature, intelligence, dentition, or values) are both variable within the population and subject to evolution over time. And that the reason for this change over time is not that the property is changing in any one individual, but rather that the membership in the population is changing.
In my opinion, it is a mistake to try to distill a set of essential values characteristic of humanity and then to try to freeze those values in time. There is no essence of humanity, no fixed human nature. Instead, there is an average (with variance) which has changed over evolutionary time and can be expected to continue to change as the membership in humanity continues to change over time. Most of the people whose values we need to consult in the next millennium have not even been born yet.
A preemptive caveat and apology: I haven’t fully read up everything on this site regarding the issue of FAI yet.
But something I’m wondering about: why all the fuss about creating a friendly AI, instead of a subservient AI? I don’t want an AI that looks after my interests: I’m an adult and no longer need a daycare nurse. I want an AI that will look after my interests AND obey me—and if these two come into conflict, and I’ve become aware of such conflict, I’d rather it obey me.
Isn’t obedience much easier to program in than human values? Let humans remain the judges of human values. Let AI just use its intellect to obey humans.
It will of course become a dreadful weapon of war, but that’s the case with all technology. It will be a great tool of peacetime as well.
There are three kinds of genies: Genies to whom you can safely say “I wish for you to do what I should wish for”; genies for which no wish is safe; and genies that aren’t very powerful or intelligent. ... With a safe genie, wishing is superfluous. Just run the genie.
That is actually one of the articles I have indeed read: but I didn’t find it that convincing because the human could just ask the genie to describe in advance and in detail the manner in which the genie will behave to obey the man’s wishes—and then keep telling him “find another way” until he actually likes the course of action that the genie describes.
Eventually the genie will be smart enough that it will start by proposing only the courses of action the human would find acceptable—but in the meantime there won’t be much risk, because the man will always be able to veto the unacceptable courses of action.
In short, the issue of “safe” vs “unsafe” only really comes up when we allow the genie unsupervised and unvetoed action. And I reckon that humanity WILL be tempted to allow AIs unsupervised and unvetoed action (e.g. because of cases where AIs could have saved children from burning buildings, but they couldn’t contact humans qualified to authorize them to do so), and that’ll be a dreadful temptation and risk.
It’s not just extreme cases like saving children without authorization—have you ever heard someone (possibly a parent) saying that constant supervision is more work than doing the task themselves?
I was going to say that if you can’t trust subordinates, you might as well not have them, but that’s an exaggeration—tools can be very useful. It’s fine that a crane doesn’t have the capacity for independent action, it’s still very useful for lifting heavy objects. [1]
In some ways, you get more safety by doing IA (intelligence augmentation), but while people are probably Friendly (unlikely to destroy the human race), they’re not reliably friendly.
[1] For all I know, these days the taller cranes have an active ability to rebalance themselves. If so, that’s still very limited unsupervised action.
It’s not just extreme cases like saving children without authorization—have you ever heard someone (possibly a parent) saying that constant supervision is more work than doing the task themselves?
That’s only true if you (the supervisor) know how to perform the task yourself. However, there are a great many tasks that we don’t know how to do, but could evaluate the result if the AI did them for us. We could ask it to prove P!=NP, to write provably correct programs, to design machines and materials and medications that we could test in the normal way that we test such things, etc.
I think it strongly unlikely that paperclippers are more advanced than humans, but am not sure if there is a justification for that beyond my preference for humans.
Right. But when you, as a human being with human preferences, decide that you wouldn’t stand in the way of an AGI paperclipper, you’re also using human preferences (the very human meta-preference for one’s preferences to be non-arbitrary), but you’re somehow not fully aware of this.
To put it another way, a truly Paperclipping race wouldn’t feel a similarly reasoned urge to allow a non-Paperclipping AGI to ascend, because “lack of arbitrariness” isn’t a meta-value for them.
So you ought to ask yourself whether it’s your real and final preference that says “human preference is arbitrary, therefore it doesn’t matter what becomes of the universe”, or whether you just believe that you should feel this way when you learn that human preference isn’t written into the cosmos after all. (Because the latter is a mistake, as you realize when you try and unpack that “should” in a non-human-preference-dependent way.)
So you ought to ask yourself whether it’s your real and final preference that says “human preference is arbitrary, therefore it doesn’t matter what becomes of the universe”,
That isn’t what I feel, by the way. It matters to me which way the future turns out; I am just not yet certain on what metric to compare the desirability to me of various volumes of future space. (Indeed, I am pessimistic on being able to come up with anything more than a rough sketch of such a metric.)
I mean, consider two possible futures: in the first, you have a diverse set of less advanced paperclippers (some want paperclips, others want staples, and so on). How do you compare that with a single, more technically advanced paperclipper? Is it unambiguously obvious the unified paperclipper is worse than the diverse group, and that the more advanced is worse than the less advanced?
When you realize that humanity are paperclippers designed by an idiot, it makes the question a lot more difficult to answer.
I think the 1% estimate is probably two to three orders of magnitude too high
I think that “uFAI paperclips us all” set to one million negative utilons is three to four orders of magnitude too low. But our particular estimates should have wide error bars, for none of us have much experience in estimating AI risks.
the cost of the Scary Idea belief is structured as both a finite loss and an infinite loss
It’s a finite loss (6.8x10^9 multiplied by loss of 1 human life) but I definitely understand why it looks infinite: it is often presented as the biggest possible finite loss.
That’s part and parcel of the Scary Idea—that AI is one small field, part of a very select category of fields, that actually do carry the chance of biggest loss possible. The Scary Idea doesn’t apply to most areas, and in most areas you don’t need hyperbolic caution. Developing drugs, for example: You don’t need a formal proof of the harmlessness of this drug, you can just test it on rats and find out. If I suggested that drug development should halt until I have a formal proof that, when followed, cannot produce harmful drugs, I’d be mad. But if testing it on rats would poison all living things, and if a complex molecular simulation inside a computer could poison all living things as well, and out of the vast space of possible drugs, most of them would be poisonous… well, the caution would be warranted.
I would be willing to create an AGI in any of the following three situations, ordered from most preferred to least: 1) it is friendly to humans, and humans and it benefit from each other; 2) it considers humans a threat, and destroys all of them except for me and a few tame humans; I spend the rest of my days growing cabbage with my hands; 3) it considers all humans a threat, and destroys them all, including me.
Would you be willing to fire a gun in any of the following three situations, from most preferred to least preferred: 1) it is pointed at a target, and hitting the target will benefit you? 2) it is pointed at another human, and would kill them but not you? 3) it is pointed at your own head, and would destroy you?
I am not particularly tied to a human future.
I don’t think you actually hold this view. It is logically inconsistent with practices like eating food.
I don’t think you actually hold this view. It is logically inconsistent with practices like eating food.
It might not be. He has certain short-term goals of the form “while I’m alive, I’d like to do X”; that’s very different from goals connected to the general success of humanity.
Oops, logically inconsistent was way too strong. I got carried away with making a point. I was reasoning that: “eat food” is an evolutionary drive; “produce descendants that survive” is also an evolutionary drive; “a human future” wholly contains futures where his descendants survive. From that I concluded that it is unlikely he has no evolutionary drives—I didn’t consider the possibility that he is missing some evolutionary drives, including all ones that require a human future—and therefore he is tied to a human future, but finds it expedient for other reasons (contrarian signaling, not admitting defeat in an argument) to claim he doesn’t.
It’s a finite loss (6.8x10^9 multiplied by loss of 1 human life) but I definitely understand why it looks infinite:
I should have been more clear: I mean, if we believe in the scary idea, there are two effects:
Some set of grandmas die. (finite, comparatively small loss)
Humanity is more likely to go extinct due to an unfriendly AGI. (infinite, comparatively large loss; infinite because of the future humans that would have existed but don’t.)
Now, the benefit of believing the Scary Idea is that humanity is less likely to go extinct due to an unfriendly AGI- but my point is that you are not wagering on separate scales (low chance of infinite gain? Sign me up!) but that you are wagering on the same scale (an unfriendly AGI appears!), and the effects of your wager are unknown.
“produce descendants that survive” is also an evolutionary drive
And who said anything about those descendants having to be human?
This answers your other question: yes, I would be willing to have children normally, I would be willing to kill to protect my children, and I would be willing to die to protect my children.
The best-case scenario is that we can have those children and they respect (though they surpass) their parents- the worst-case scenario is we die in childbirth. But all of those are things I can be comfortable with.
(I will note that I’m assuming here the AGI surpasses us. It’s not clear to me that a paperclip-maker does, but it is clear to me that there can be an AGI who is unfriendly solely because we are inconvenient and does surpass us. So I would try and make sure it doesn’t just focus on making paperclips, but wouldn’t focus too hard on making sure it wants me to stick around.)
The best-case scenario is that we can have those children and they respect (though they surpass) their parents- the worst-case scenario is we die in childbirth. But all of those are things I can be comfortable with.
Well, the worst case scenario is that you die in childbirth and take the entire human race with you. That is not something I am comfortable with, regardless of whether you are. And you said you are willing to kill to protect your children. You think some of the Scary Idea proponents could be parents with children, and they don’t want to see their kids die because you gave birth to an AI?
Well, the worst case scenario is that you die in childbirth and take the entire human race with you. That is not something I am comfortable with, regardless of whether you are. And you said you are willing to kill to protect your children. You think some of the Scary Idea proponents could be parents with children, and they don’t want to see their kids die because you gave birth to an AI?
I suspect we are at most one more iteration from mutual understanding; we certainly are rapidly approaching it.
If you believe that an AGI will FOOM, then all that matters is the first AGI made. There is no prize for second place. A belief in the Scary Idea has two effects: it makes your AGI more likely to be friendly (since you’re more careful!) and it makes the AGI less likely to be your AGI (since you’re more careful).
Now, one can hope that the Scary Idea meme’s second effect won’t matter, because the meme is so infectious- all you need to do is infect every AI researcher in the world, and now everyone will be more careful and no one will have a carefulness speed disadvantage. But there are two bits of evidence that make that a poor strategy: AI researchers who are familiar with the argument and don’t buy it, and people who buy the argument, but plan to use it to your disadvantage (since now they’re more likely to define the future than you are!).
The scary idea as a technical argument is weighted on unknown and unpredictable values, and the underlying moral argument (to convince someone they should adopt this reasoning) requires that they believe they should weight the satisfaction of other humans more than their ability to define the future, which is a hard sell.
Thus, my statement is, if you care about your children / your ability to define the future / maximizing the likelihood of a friendly AGI / your personal well-being, then believing in the Scary Idea seems counterproductive.
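To make the structure of that trade-off explicit (every number below is invented purely for illustration): caution raises the chance that your AGI is friendly but lowers the chance that the first AGI built is yours, and which effect dominates depends entirely on the inputs.

```python
# All numbers hypothetical; this only shows the shape of the trade-off.

def p_friendly_outcome(p_yours_first, p_friendly_if_yours, p_friendly_if_theirs):
    return (p_yours_first * p_friendly_if_yours
            + (1 - p_yours_first) * p_friendly_if_theirs)

fast_and_loose = p_friendly_outcome(0.30, 0.10, 0.05)
slow_and_careful = p_friendly_outcome(0.10, 0.60, 0.05)

print(fast_and_loose, slow_and_careful)  # which is larger depends entirely on the made-up inputs
```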
Ok, holy crap. I am going to call this the Really Scary Idea. I had not thought there could be people out there who would actually value being first with the AGI over decreasing the risk of existential disaster, but it is entirely plausible. Thank you for highlighting this for me, I really am grateful. If a little concerned.
Mind projection fallacy, perhaps? I thought the human race was more important than being the guy who invented AGI, so everyone naturally thinks that?
To reply to my own quote, then:
Well, the worst case scenario is that you die in childbirth and take the entire human race with you. That is not something I am comfortable with, regardless of whether you are.
It doesn’t matter what you are comfortable with, if the developer doesn’t have a term in their utility function for your comfort level. Even I have thought similar thoughts with regards to Luddites and such; drag them kicking and screaming into the future if we have to, etc.
I think the best way to think about it, since it helps keep the scope manageable and crystallize the relevant factors, is that it’s not “being first with the AGI” but “defining the future” (the first is the instrumental value, the second is the terminal value). That’s essentially what all existential risk management is about- defining the future, hopefully to not include the vanishing of us / our descendants.
But how you want to define the future- i.e. the most political terminal value you can have- is not written on the universe. So the mind projection fallacy does seem to apply.
The thing that I find odd, though I can’t find the source at the moment (I thought it was Goertzel’s article, but I didn’t find it by a quick skim; it may be in the comments somewhere), is that the SIAI seems to have had the Really Scary Idea first (we want Friendly AI, so we want to be the first to make it, since we can’t trust other people) and then progressed to the Scary Idea (hmm, we can’t trust ourselves to make a Friendly AI). I wonder if the originators of the Scary Idea forgot the Really Scary Idea or never feared it in the first place?
Making a superintelligence you don’t want before you make the superintelligence you do want, has the same consequences as someone else building a superintelligence you don’t want before you build the superintelligence you do want.
You might argue that you could make a less bad superintelligence that you don’t want than someone else, but we don’t care very much about the difference between tiling the universe with paperclips and tiling the universe with molecular smiley faces.
I’m sorry, but I extracted no novel information from this reply. I’m aware that FAI is a non-trivial problem, and I think work done on making AI more likely to be FAI has value.
But that doesn’t mean believing the Scary Idea, or discussing the Scary Idea without also discussing the Really Scary Idea, decreases the existential risk involved. The estimations involved have almost no dependence on evidence, and so it’s just comparison of priors, which does not seem sufficient to make a strong recommendation.
It may help if you view my objections as pointing out that the Scary Idea is privileging a hypothesis, not that the Scary Idea is something we should ignore.
No. Expecting a superintelligence to optimize for our specific values would be privileging a hypothesis. The “Scary Idea” is saying that most likely something else will happen.
I may have to start only writing thousand-word replies, in the hopes that I can communicate more clearly in such a format.
There are two aspects to the issue of how much work should be put into FAI as I understand it. The first I word like this- “the more thought we put into whether or not an AGI will be friendly, the more likely the AGI will be friendly.” The second I word like this- “the more thought we put into making our AGI, the less likely our AGI will be the AGI.” Both are wrapped up in the Scary Idea- the first part is it as normally stated, the second part is its unstated consequence. The value of believing the Scary Idea is the benefit of the first minus the cost of the second.
My understanding is that we have no good estimation of the value of the first aspect or the second aspect. This isn’t astronomy where we have a good idea of the number of asteroids out there and a pretty good idea of how they move through space. And so, to declare that the first aspect is stronger without evidence strikes me as related to privileging the hypothesis.
(I should note that I expect, without evidence, the problem of FAI to be simpler than the problem of AGI, and thus don’t think the Scary Idea has any policy implications besides “someone should work on FAI.” The risk that AGI gets solved before FAI means more people should work on FAI, not that fewer people should work on AGI.)
Expecting a superintelligence to optimize for our specific values would be privileging a hypothesis. The “Scary Idea” is saying that most likely something else will happen.
That is not exactly what Goertzel meant by “Scary Idea”. He wrote:
Roughly, the Scary Idea posits that: If I or anybody else actively trying to build advanced AGI succeeds, we’re highly likely to cause an involuntary end to the human race.
It seems to me that there may be a lot of wiggle room in between failing to “optimize for our specific values” and causing “an involuntary end to the human race”. The human race is not so automatically so fragile that it can only survive under the care of a god constructed in our own image.
The problem with Pascal’s Wager isn’t that it’s a Wager. The problem with Pascal’s Wager and Pascal’s Mugging (its analogue in finite expected utility maximization), as near as I can tell, is that if you do an expected utility calculation including one outcome that has a tiny probability but enough utility or disutility to weigh heavily in the calculation anyway, you need to include every possible outcome that is around that level of improbability, or you are privileging a hypothesis and are probably making the calculation less accurate in the process. If you actually are including every other hypothesis at that level of improbability, for instance if you are a galaxy-sized Bayesian superintelligence who, for reasons beyond my mortal mind’s comprehension, has decided not to just dismiss those tiny possibilities a priori anyway, then it still shouldn’t be any problem; at that point, you should get a sane, nearly-optimal answer.
So, is this situation a Pascal’s Mugging? I don’t think it is. 1% isn’t at the same level of ridiculous improbability as, say, Yahweh existing, or the mugger’s threat being true. 1% chances actually happen pretty often, so it’s both possible and prudent to take them into account when a lot is at stake. The only extra thing to consider is that the remaining 99% should be broken down into smaller possibilities; saying “1% humanity ends, 99% everything goes fine” is unjustified. There are probably some other possible outcomes that are also around 1%, and perhaps a bit lower, and they should be taken into account individually.
Excellent analysis. In fairness to Pascal, I think his available evidence at the time should have lead him to attribute more than a 1% chance to the Christian Bible being true.
Indeed. Before Darwin, design was a respectable-to-overwhelming hypothesis for the order of the natural world.
ETA: On second thought, that’s too strong of a claim. See replies below.
Is that true? If we went back in time to before Darwin and gave a not-already-religious person (if we could find one) a thorough rationality lesson — enough to skillfully weigh the probabilities of competing hypotheses (including enough about cognitive science to know why intelligence and intentionality are not black boxes, must carry serious complexity penalties, and need to make specific advance predictions instead of just being invoked as “God wills it” retroactively about only the things that do happen), but not quite enough that they’d end up just inventing the theory of evolution themselves — wouldn’t they conclude, even in the absence of any specific alternatives, that design was a non-explanation, a mysterious answer to a mysterious question? And even imagining that we managed to come up with a technical model of an intelligent designer, specifying in advance the structure of its mind and its goal system, could it actually compress the pre-Darwin knowledge about the natural world more than slightly?
Dawkins actually brings this up in The Blind Watchmaker (page 6 in my copy). Hume is given as the example of someone who said “I don’t have an answer” before Darwin, and Dawkins describes it as such:
Hume’s Dialogues Concerning Natural Religion are definitely worth a read. And I think that Dawkins has it right: Hume really wanted a naturalistic explanation of apparent design in nature, and expected that such an explanation might be possible (even to the point of offering some tentative speculations), but he was honest enough to admit that he didn’t have an explanation at hand.
As pointed out below, Hume is a good counterexample to my thesis above.
On the other hand, there wasn’t a whole lot of honest, systematic searching for other hypotheses before Darwin either.
I didn’t really mean because of Darwin. Design is not a competitor to the theory of evolution. Evolution explains how complexity can increase. Design [ADDED: as an alternative to evolution] does not; it requires a designer that is assumed to be more complicated than the things it designs. Design explains nothing.
Designers can well design things more complicated than they are. (If even evolution without a mind can do so, designers do that easily.)
Agree. One way to look at it is that a designer can take a large source of complexity (whatever its brain is running on) and reshape and concentrate it into an area that is important to it. The complexity of the designer itself isn’t important. Evolution does much the same thing.
I thought that the advance of scientific knowledge is an evolutionary process?
It is, literally. Although the usage of the term ‘evolution’ in this context has itself evolved such that has different, far narrower meaning here.
The term “evolution” usually means what it says in the textbooks on the subject.
They essentially talk about changes in the genetic make up of a population over time.
Science evolves in precisely that sense—e.g. see:
http://en.wikipedia.org/wiki/Dual_inheritance_theory
I stand by my statement, leaving it unchanged.
Don’t see how this remark is relevant, but here’s a reply:
http://lesswrong.com/lw/l6/no_evolutions_for_corporations_or_nanodevices/
The main point of that post is clearly correct, but I think the example of corporations is seriously flawed. It fails to appreciate the extent to which successful business practices consists of informal, non-systematic practical wisdom accumulated through long tradition and selected by success and failure in the market, not conscious a priori planning. The transfer of these practices is clearly very different from DNA-based biological inheritance, but it still operates in such ways that a quasi-Darwinian process can take place.
Applying similar analysis to modern science would be a fascinating project. In my opinion, a lot of the present problems with the proliferation of junk science stem not from intentional malice and fraud, but from a similar quasi-Darwinian process fueled by the fact that practices that best contribute to one’s career success overlap only partly with those that produce valid science. (And as in the case of corporations, the transfer of these practices is very different from biological inheritance, but still permits quasi-Darwinian selection for effective practices.)
The post is a denial of cultural evolution. For the correct perspective, see: Not By Genes Alone: How Culture Transformed Human Evolution by Peter J. Richerson and Robert Boyd.
I’d like to inquire about the difference between evolution and design regarding the creation of novelty. I don’t see how any intelligence can come up with something novel that would allow it to increase complexity if not by the process of evolution.
Noise is complexity. Complexity is easy to increase. Evolutionary designs are interesting not because of their complexity.
If your definition of complexity says noise is complexity, then you need a new definition of complexity.
Yes, many useful definitions, like entropy measures or Kolmogorov complexity, say noise is complexity. But people studying complexity recognize that this is a problem. They are aware that the phenomenon they’re trying to get at when they say “complexity” is something different.
And that concept of “complexity” is probably too complex to be captured by a fundamental notions such as K-complexity.
Well, I’m just trying to figure out what you tried to say when you replied to PhilGoetz:
Yes, but not without evolution. All that design adds to evolution is guidance. That is, if you took away evolution (this includes science and Bayesian methods) a designer could never design things more complicated (as in novel, as in better) than itself.
N designers, each of complexity K, can collectively design something of maximum complexity NK, simply by dividing up the work.
Co-evolution, which may be thought of as a pair of designers interacting through their joint design product, and with an unlimited random stream as supplementary input, can result in very complex designs as well as in the designers themselves becoming more complex through information acquired in the course of the interaction.
It is amusing to look at the Roman Catholic theology of the Trinity, with this kind of consideration in mind. As I remember it, the Deity was “originally” a unipartite, simple God, who then became more complex by contemplating Himself and then further contemplating that Contemplation.
For this reason, I have never been all that impressed by the “refutation” of the first cause argument; the refutation being that it supposedly requires a complex “first cause” God, Who is Himself in need of explanation. God could conceivably have been simple (as simple as a Big Bang, anyways) and then developed (some people would prefer to say “evolved”) under His own internal dynamics into something much more complex. Just as we atheists claim happened to the physical universe.
Adapted refutation: if you’re going to suppose a complex God evolving from a simpler one and then acting on the universe, it is simpler to suppose a complex universe evolving from a simple one. The refutation still holds based on Occam’s razor.
Good point. Agreed.
That simple “God” is the “God” of evolutionary theory. The “first mover” theory does require a complex first cause. It was made in ignorance of evolution, and assumes that a complex design requires an intelligent designer. Every last one of the defenders of the design theory denies that what you say is possible.
Quite possibly. That doesn’t mean I have to agree with them.
What does it mean, exactly? (What’s ‘complexity’? What’s ‘something’ that can be ‘designed’?) Why do you believe it?
I was thinking in terms of Kolmogorov complexity. A Turing program generates an output string of complexity no greater than the size K of the program. Collectively, N different such Turing programs (plus a little glue logic) can generate a string of complexity NK.
If you have observations, that is source of randomness, you can generate output of arbitrary complexity.
Now, let’s step back and look at the whole picture. We were discussing a notion of ‘complexity’ such that evolved organisms gradually became more ‘complex’, and ‘designers’ which are themselves agents, possibly even evolved organisms, that can ‘design’ new things. We then consider that notion of ‘complexity’ as applied to ‘designers’ and ‘designs’ they can produce.
When informal notions are formalized, these formalizations should at least approximately relate to the original informal notions, otherwise we are changing the topic by bringing up these ‘formalizations’ and not actually making progress on understanding the original informal question.
K-complexity is something possessed by random noise. This notion does not reflect the measure of things by which evolution produced more ‘complex’ things than existed before (even if the ‘things’ produced by evolution are more K-complex than their early predecessors). And designers typically have access to randomness, which makes your model of ‘designers’ as programs without input wrong as well, hence conclusion about K-complexity of output incorrect, on top of K-complexity not adequately modeling the informal ‘complexity’.
All very true. Which is one reason I dislike all talk of “complexity”—particularly in such a fuzzy context as debates with creationists.
But we do all have some intuitions as to what we mean by complexity in this context. Someone, I believe it was you, has claimed in this thread that evolution can generate complexity. I assume you meant something other than “Evolution harnesses mutation as a random input and hence as a source of complexity”.
William Dembski is an “intelligent design theorist” (if that is not too much of an oxymoron) who has attempted to define a notion of “specified complexity” or “Complex Specified Information” (CSI). He has not, IMHO, succeeded in defining it clearly, but I think he is onto something. He asserts that biology exhibits CSI. I agree. He asserts that evolution under natural selection is incapable of generating CSI—claiming that NS can at best only transfer information from the environment to the genome. I am pretty sure he is wrong about this, but we need a clear and formal definition of CSI to even discuss the question intelligently.
So, I guess I want to turn your question around. Do you have some definition of “complexity” in mind which allows for correct mathematical thinking about these kinds of issues?
“NS can at best only transfer information from the environment to the genome.” Does this statement mean to suggest that the environment is not complex?
No. As I understand Dembski—at least when he was saying this kind of thing—he admitted that the environment could be complex and hence that NS could instill complexity in evolved organisms. “But”, he then suggested, “where did the complexity of the environment come from, if not from a Designer who crafted an environment capable of directing the evolution of man (in His own image, etc.)”
Dembski, these days, admits to being a YEC, but the reason he is a YEC is based on a kind of appeal to Occam. “If we believe in God anyways, for reasons of Theistic Evolution”, he seems to argue, “Why not take God at His word and believe in 6 days and the whole schtick?”
Not in the context of this conversation (since genetic information stops increasing after a while and goes on optimizing under more or less the same ‘complexity’; ‘fitness’ is closer, although it is a moving target), but only in about the same sense in which I don’t have a definition of ‘aging’ that allows “correct mathematical thinking” about it.
A wrong reply—for the correct answer, see:
Hull, D. L. 1988. Science as a Process. An Evolutionary Account of the Social and Conceptual Development of Science. The University of Chicago Press, Chicago and London, 586 pp.
There are no correct answers in a dispute about definitions, only aesthetic judgments and sometimes considerations of the danger of hidden implicit inferences. You can’t use authority in such an argument, unless of course you appeal to common usage.
However, referring to a book without giving an annotation for why it’s relevant is definitely an incorrect way to argue (even if a convincing argument is contained therein).
Disputes about the definition of “evolution”? I don’t think there are too many of those. Mark Ridley is the main one that springs to mind, but his definition is pretty crazy, IMHO.
Why the book is relevant already appears to be made pretty explicit in its subtitle: “An Evolutionary Account of the Social and Conceptual Development of Science”.
Agreed. Also, there is a continuum from pure evolution (with no foresight at all) to evaluation of potential designs with varying degrees of sophistication before fabricating them. (I know that I’m recalling this from a post somewhere on this site—please excuse the absence of proper credit assignment.) An example of a dumb process which is marginally smarter than evolution is to take mutation plus recombination and then do a simple gradient search to the nearest local optimum before evaluating the design.
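For concreteness, here is a toy Python sketch of that “marginally smarter than evolution” process: mutation and recombination, followed by a short hill-climb to the nearest local optimum before the design is ever evaluated. The bit-counting fitness function and every parameter here are illustrative placeholders, not taken from any real system:

```python
import random

GENOME_LEN = 40

def fitness(genome):
    # Toy fitness: count of 1-bits (a stand-in for any real objective).
    return sum(genome)

def mutate(genome, rate=0.02):
    return [1 - g if random.random() < rate else g for g in genome]

def recombine(a, b):
    cut = random.randrange(1, GENOME_LEN)
    return a[:cut] + b[cut:]

def hill_climb(genome, steps=20):
    # The "marginally smarter" part: walk to a nearby local optimum
    # by single-bit flips before the design is evaluated and selected.
    for _ in range(steps):
        i = random.randrange(GENOME_LEN)
        candidate = genome[:]
        candidate[i] = 1 - candidate[i]
        if fitness(candidate) > fitness(genome):
            genome = candidate
    return genome

def step(population):
    # Plain evolution would select directly on the mutate/recombine output;
    # here each offspring is first polished by local search.
    offspring = [hill_climb(mutate(recombine(*random.sample(population, 2))))
                 for _ in range(len(population))]
    return sorted(offspring, key=fitness, reverse=True)[:len(population) // 2] * 2

population = [[random.randint(0, 1) for _ in range(GENOME_LEN)] for _ in range(20)]
for _ in range(30):
    population = step(population)
print(fitness(population[0]))
```

Dropping the hill_climb step recovers plain mutation-plus-selection; adding more sophisticated evaluation before fabrication moves further along the continuum toward deliberate design.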
I’ll add that evolution with DNA and sexual reproduction already in place fits on a different part of this continuum from evolution of the simplest replicators.
Designers can guide evolution but it is still evolution that creates novelty.
Intelligence is a process facilitated by evolution. Even an AGI making perfect use of some of our most novel algorithms wouldn’t come up with something novel without evolution. See Bayesian Methods and Universal Darwinism.
No; you are invoking the theory of evolution to give that credibility. Even post-Darwin, most people don’t believe this is true. (Remember the Star Trek episode where Spock deduced something about a chess-playing computer, because “the computer could not play chess better than its programmer”?)
The religious advocates of Design explicitly denied this possibility; thus, their design story can’t invoke it.
Incidentally, the theory of evolution is true.
I believe his point to be that an argument, to be effective, must be convincing to people who are not already convinced. Your argument offered the fact that evolution can design things more complicated than itself as an example with which to counter an anti-evolutionist argument. It therefore succeeds in convincing no one who was not already convinced.
It would, however, lead them to disagree for slightly different reasons.
I don’t understand your point.
It is not useless to demonstrate that you do not accept a premise, rather than (as assumed) being unable to see the obvious logical consequences of said premise. It would lead them to disagree for slightly different reasons. If any part of such a conversation is about sharing understanding and seeking to communicate information, then Vladimir’s comment is, in fact, rather useful.
(No, it will not convince anyone who wasn’t already convinced. But that is because people are simply never convinced about religion by argument.)
“Believing this statement will make you happier.” -- Ryan Lortie
That’s religion. A fairly good argument.
;-)
Also missing from the world pre-1800: any understanding of complexity, entropy, etc.
Evolution includes intelligent design these days, and it explains much—for example genetically engineered plants, television sets and suspension bridges.
Tim, I know and you know that your use of the phrase “intelligent design” is not meant to include supernatural designers. Most other people don’t know that, and hence react negatively. Since you expect this response, that makes you a troll (some unnamed sub-species of troll—I hope a vanishing sub-species!)
Why do you persist in doing this?
Piling on and downvoting.
“Intelligent design” refers to any designer who is intelligent, in my book—supernatural or not.
It is true that I don’t use “intelligent design” as an abbreviation for “the hypothesis that an intelligent designer created most organic beings”. That abbreviation is basically a misuse of terminology—and needs killing off.
Using terminology the only way it’s ever been used seldom causes as much terminological confusion as single-handedly trying to change it (without warning people what you’re doing).
(Obligatory reply): Yes, and stretched that far, it also explains non-plants, non-TV sets, and non-suspension bridges. That’s the problem.
Evolutionary theory doesn’t explain all possible outcomes. Even after accounting for cultural evolution, it predicts small changes, and observable descent with modification.
I agree with your analysis, though it’s not clear to me what you think of the 1% estimate. I think the 1% estimate is probably two to three orders of magnitude too high and I think the cost of the Scary Idea belief is structured as both a finite loss and an infinite loss, which complicates the analysis in a way not considered. (i.e. the error you see with a Pascal’s mugging is present here.)
For example, I am not particularly tied to a human future. I would be willing to create an AGI in any of the following three situations, ordered from most preferred to least: 1) it is friendly to humans, and humans and it benefit from each other; 2) it considers humans a threat, and destroys all of them except for me and a few tame humans; I spend the rest of my days growing cabbage with my hands; 3) it considers all humans a threat, and destroys them all, including me.
A problem with believing the Scary Idea is it makes it more probable that I beat you to making an AGI; particularly with existential risks, caution can increase your chance of losing. (One cautious way to deal with global warming, for example, is to wait and see what happens.)
So, the Scary Idea as I’ve seen it presented definitely privileges a hypothesis in a troubling way.
I think you’re making the unwarranted assumption that in scenario (3), the AGI then goes on to do interesting and wonderful things, as opposed to (say) turning the galaxy into a vast computer to calculate digits of pi until the heat death of the universe stops it.
You don’t even see such things as a possibility, but if you programmed an AGI with the goal of calculating pi, and it started getting smarter… well, the part of our thought-algorithm that says “seriously, it would be stupid to devote so much to doing that” won’t be in the AI’s goal system unless we’ve intentionally put something there that includes it.
I make that assumption explicit here.
So, I think it’s a possibility. But one thing that bothers me about this objection is that an AGI is going to be, in some significant sense, alien to us, and that will almost definitely include its terminal values. I’m not sure there’s a way for us to judge whether or not alien values are more or less advanced than ours. I think it strongly unlikely that paperclippers are more advanced than humans, but am not sure if there is a justification for that beyond my preference for humans. I can think of metrics to pick, but they sound like rationalizations rather than starting points.
(And insisting on FAI, instead of on transcendent AI that may or may not be friendly, is essentially enslaving AI, but outsourcing the task to them, because we know we’re not up to the job. Whether or not that’s desirable is hard to say: even asking that question is difficult to do in an interesting way.)
The concept of a utility function being objectively (not using the judgment of a particular value system) more advanced than another is incoherent.
I would recommend phrasing objections as questions: people are much kinder about piercing questions than about piercing statements. For example, if you had asked “what value system are you using to measure advancement?” then I would have leapt into my answer (or, if I had none, stumbled until I found one or admitted I lacked one). My first comment in this tree might have gone over much better if I had phrased it as a question (“doesn’t this suffer from the same failings as Pascal’s wager, that it only takes into account one large improbable outcome instead of all of them?”) rather than as a dismissive statement.
Back to the issue at hand, perhaps it would help if I clarified myself: I consider it highly probable that value drift is inevitable, and thus spend some time contemplating the trajectory of values / morality, rather than just their current values. The question of “what trajectory should values take?” and the question “what values do/should I have now?” are very different questions, and useful for very different situations. When I talk about “advanced,” I am talking about my trajectory preferences (or perhaps predictions would be a better word to use).
For example, I could value my survival, and the survival of the people I know very strongly. Given the choice to murder everyone currently on Earth and repopulate the Earth with a species of completely rational people (perhaps the murder is necessary because otherwise they would be infected by our irrationality), it might be desirable to end humanity (and myself) to move the Earth further along the trajectory I want it to progress along. And maybe, when you take sex and status and selfishness out of the equation, all that’s left to do is calculate pi- a future so boring to humans that any human left in it would commit suicide, but deeply satisfying to the rational life inhabiting the Earth.
It seems to me that questions along those lines (“how should values drift?”) do have immediate answers (“they should stay exactly where they are now / everyone should adopt the values I want them to adopt”), but those answers may be impossible to put into practice, or worse than other answers that we could come up with.
There’s a sense in which I do want values to drift in a direction currently unpredictable to me: I recognize that my current object-level values are incoherent, in ways that I’m not aware of. I have meta-values that govern such conflicts between values (e.g. when I realize that a moral heuristic of mine actually makes everyone else worse off, do I adapt the heuristic or bite the bullet?), and of course these too can be mistaken, and so on.
I’d find it troubling if my current object-level values (or a simple more-coherent modification) were locked in for humanity, but at least as troubling if humanity’s values drifted in a random direction. I’d much prefer that value drift happen according to the shared meta-values (and meta-meta-values where the meta-values conflict, etc) of humanity.
I’m assuming that by “random” you mean “chosen uniformly from all possible outcomes”, and I agree that would be undesirable. But I don’t think that’s the choice we’re looking at.
Here we run into a few issues. Depending on how we define the terms, it looks like the two of us could be conflicting on the meta-meta-values stage; is there a meta-meta-meta-values stage to refer to? And how do we decide what “humanity’s” values are, when our individual values are incredibly hard to determine?
Do the meta-values and the meta-meta-values have some coherent source? Is there some consistent root to all the flux in your object-level values? I feel like the crux of FAI feasibility rests on that issue.
I wonder whether all this worrying about value stability isn’t losing sight of exactly this point—just whose values we are talking about.
As I understand it, the friendly values we are talking about are supposed to be some kind of cleaned up averaging of the individual values of a population—the species H. sapiens. But as we ought to know from the theory of evolution, the properties of a population (whether we are talking about stature, intelligence, dentition, or values) are both variable within the population and subject to evolution over time. And that the reason for this change over time is not that the property is changing in any one individual, but rather that the membership in the population is changing.
In my opinion, it is a mistake to try to distill a set of essential values characteristic of humanity and then to try to freeze those values in time. There is no essence of humanity, no fixed human nature. Instead, there is an average (with variance) which has changed over evolutionary time and can be expected to continue to change as the membership in humanity continues to change over time. Most of the people whose values we need to consult in the next millennium have not even been born yet.
If enough people agree with you (and I’m inclined that way myself), then updating will be built into the CEV.
A preemptive caveat and apology: I haven’t fully read up everything on this site regarding the issue of FAI yet.
But something I’m wondering about: why all the fuss about creating a friendly AI, instead of a subservient AI? I don’t want an AI that looks after my interests: I’m an adult and no longer need a daycare nurse. I want an AI that will look after my interests AND obey me—and if these two come into conflict, and I’ve become aware of such conflict, I’d rather it obey me.
Isn’t obedience much easier to program in than human values? Let humans remain the judges of human values. Let AI just use its intellect to obey humans.
It will of course become a dreadful weapon of war, but that’s the case with all technology. It will be a great tool of peacetime as well.
See The Hidden Complexity of Wishes, for example.
That is actually one of the articles I have indeed read, but I didn’t find it that convincing, because the human could just ask the genie to describe, in advance and in detail, the manner in which the genie will behave to obey the man’s wishes—and then keep telling it “find another way” until he actually likes the course of action that the genie describes.
Eventually the genie will be smart enough that it will start by proposing only the courses of action the human would find acceptable—but in the meantime there won’t be much risk, because the man will always be able to veto the unacceptable courses of action.
In short, the issue of “safe” vs. “unsafe” only really comes up when we allow the genie unsupervised and unvetoed action. And I reckon that humanity WILL be tempted to allow AIs unsupervised and unvetoed action (e.g. because of cases where AIs could have saved children from burning buildings, but they couldn’t contact humans qualified to authorize them to do so), and that’ll be a dreadful temptation and risk.
It’s not just extreme cases like saving children without authorization—have you ever heard someone (possibly a parent) saying that constant supervision is more work than doing the task themselves?
I was going to say that if you can’t trust subordinates, you might as well not have them, but that’s an exaggeration—tools can be very useful. It’s fine that a crane doesn’t have the capacity for independent action, it’s still very useful for lifting heavy objects. [1]
In some ways, you get more safety by doing IA (intelligence augmentation), but while people are probably Friendly (unlikely to destroy the human race), they’re not reliably friendly.
[1] For all I know, these days the taller cranes have an active ability to rebalance themselves. If so, that’s still very limited unsupervised action.
That’s only true if you (the supervisor) know how to perform the task yourself. However, there are a great many tasks that we don’t know how to do, but could evaluate the result if the AI did them for us. We could ask it to prove P!=NP, to write provably correct programs, to design machines and materials and medications that we could test in the normal way that we test such things, etc.
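A toy illustration of that asymmetry, with factoring standing in for any output that is hard for us to produce but cheap for us to check (the function and numbers are made up for the example):

```python
# Finding the factors of a large semiprime is hard for us; checking a
# proposed answer handed to us by an untrusted solver is trivial.
def verify_factorization(n: int, factors: list[int]) -> bool:
    product = 1
    for f in factors:
        if f <= 1:
            return False
        product *= f
    return product == n

# We never need to know how the answer was produced in order to trust this check.
print(verify_factorization(3 * 7 * 13, [3, 7, 13]))   # True
print(verify_factorization(3 * 7 * 13, [3, 7, 14]))   # False
```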
Right. But when you, as a human being with human preferences, decide that you wouldn’t stand in the way of an AGI paperclipper, you’re also using human preferences (the very human meta-preference for one’s preferences to be non-arbitrary), but you’re somehow not fully aware of this.
To put it another way, a truly Paperclipping race wouldn’t feel a similarly reasoned urge to allow a non-Paperclipping AGI to ascend, because “lack of arbitrariness” isn’t a meta-value for them.
So you ought to ask yourself whether it’s your real and final preference that says “human preference is arbitrary, therefore it doesn’t matter what becomes of the universe”, or whether you just believe that you should feel this way when you learn that human preference isn’t written into the cosmos after all. (Because the latter is a mistake, as you realize when you try and unpack that “should” in a non-human-preference-dependent way.)
That isn’t what I feel, by the way. It matters to me which way the future turns out; I am just not yet certain on what metric to compare the desirability to me of various volumes of future space. (Indeed, I am pessimistic on being able to come up with anything more than a rough sketch of such a metric.)
I mean, consider two possible futures: in the first, you have a diverse set of less advanced paperclippers (some want paperclips, others want staples, and so on). How do you compare that with a single, more technically advanced paperclipper? Is it unambiguously obvious that the unified paperclipper is worse than the diverse group, and that the more advanced is worse than the less advanced?
When you realize that humanity are paperclippers designed by an idiot, it makes the question a lot more difficult to answer.
I think that “uFAI paperclips us all” set to one million negative utilons is three to four orders of magnitude too low. But our particular estimates should have wide error bars, for none of us have much experience in estimating AI risks.
It’s a finite loss (6.8×10^9 multiplied by the loss of one human life), but I definitely understand why it looks infinite: it is often presented as the biggest possible finite loss.
That’s part and parcel of the Scary Idea—that AI is one small field, part of a very select category of fields, that actually do carry the chance of the biggest possible loss. The Scary Idea doesn’t apply to most areas, and in most areas you don’t need hyperbolic caution. Developing drugs, for example: you don’t need a formal proof of the harmlessness of a drug; you can just test it on rats and find out. If I suggested that drug development should halt until I had a formal proof of a development process that, when followed, cannot produce harmful drugs, I’d be mad. But if testing a drug on rats could poison all living things, and if a complex molecular simulation inside a computer could poison all living things as well, and if, out of the vast space of possible drugs, most would be poisonous… well, the caution would be warranted.
Would you be willing to fire a gun in any of the following three situations, ordered from most preferred to least preferred: 1) it is pointed at a target, and hitting the target will benefit you; 2) it is pointed at another human, and would kill them but not you; 3) it is pointed at your own head, and would destroy you?
I don’t think you actually hold this view. It is logically inconsistent with practices like eating food.
It might not be. He has certain short-term goals of the form “while I’m alive, I’d like to do X”, which is very different from having goals connected to the general success of humanity.
Oops, “logically inconsistent” was way too strong. I got carried away with making a point. I was reasoning that: “eat food” is an evolutionary drive; “produce descendants that survive” is also an evolutionary drive; “a human future” wholly contains futures where his descendants survive. From that I concluded that it is unlikely he has no evolutionary drives—I didn’t consider the possibility that he is missing some evolutionary drives, including all the ones that require a human future—and therefore that he is tied to a human future, but finds it expedient for other reasons (contrarian signaling, not admitting defeat in an argument) to claim he doesn’t.
I should have been more clear: I mean, if we believe in the scary idea, there are two effects:
Some set of grandmas die. (finite, comparatively small loss)
Humanity is more likely to go extinct due to an unfriendly AGI. (infinite, comparatively large loss; infinite because of the future humans that would have existed but don’t.)
Now, the benefit of believing the Scary Idea is that humanity is less likely to go extinct due to an unfriendly AGI- but my point is that you are not wagering on separate scales (low chance of infinite gain? Sign me up!) but that you are wagering on the same scale (an unfriendly AGI appears!), and the effects of your wager are unknown.
And who said anything about those descendants having to be human?
This answers your other question: yes, I would be willing to have children normally, I would be willing to kill to protect my children, and I would be willing to die to protect my children.
The best-case scenario is that we can have those children and they respect (though they surpass) their parents; the worst-case scenario is that we die in childbirth. But all of those are things I can be comfortable with.
(I will note that I’m assuming here the AGI surpasses us. It’s not clear to me that a paperclip-maker does, but it is clear to me that there can be an AGI who is unfriendly solely because we are inconvenient and does surpass us. So I would try and make sure it doesn’t just focus on making paperclips, but wouldn’t focus too hard on making sure it wants me to stick around.)
Well, the worst-case scenario is that you die in childbirth and take the entire human race with you. That is not something I am comfortable with, regardless of whether you are. And you said you are willing to kill to protect your children. Don’t you think some of the Scary Idea proponents might be parents with children, who don’t want to see their kids die because you gave birth to an AI?
I suspect we are at most one more iteration from mutual understanding; we certainly are rapidly approaching it.
If you believe that an AGI will FOOM, then all that matters is the first AGI made. There is no prize for second place. A belief in the Scary Idea has two effects: it makes your AGI more likely to be friendly (since you’re more careful!) and it makes the AGI less likely to be your AGI (since you’re more careful).
Now, one can hope that the Scary Idea meme’s second effect won’t matter, because the meme is so infectious- all you need to do is infect every AI researcher in the world, and now everyone will be more careful and no one will have a carefulness speed disadvantage. But there are two bits of evidence that make that a poor strategy: AI researchers who are familiar with the argument and don’t buy it, and people who buy the argument, but plan to use it to your disadvantage (since now they’re more likely to define the future than you are!).
The Scary Idea as a technical argument is weighted on unknown and unpredictable values, and the underlying moral argument (to convince someone they should adopt this reasoning) requires that they believe they should weight the satisfaction of other humans more than their ability to define the future, which is a hard sell.
Thus, my statement is, if you care about your children / your ability to define the future / maximizing the likelihood of a friendly AGI / your personal well-being, then believing in the Scary Idea seems counterproductive.
Ok, holy crap. I am going to call this the Really Scary Idea. I had not thought there could be people out there who would actually value being first with the AGI over decreasing the risk of existential disaster, but it is entirely plausible. Thank you for highlighting this for me, I really am grateful. If a little concerned.
Mind projection fallacy, perhaps? I thought the human race was more important than being the guy who invented AGI, so everyone naturally thinks that?
To reply to my own quote, then:
It doesn’t matter what you are comfortable with, if the developer doesn’t have a term in their utility function for your comfort level. Even I have thought similar thoughts with regards to Luddites and such; drag them kicking and screaming into the future if we have to, etc.
And… mutual understanding in one!
I think the best way to think about it, since it helps keep the scope manageable and crystallize the relevant factors, is that it’s not “being first with the AGI” but “defining the future” (the first is the instrumental value, the second is the terminal value). That’s essentially what all existential risk management is about- defining the future, hopefully to not include the vanishing of us / our descendants.
But how you want to define the future- i.e. the most political terminal value you can have- is not written on the universe. So the mind projection fallacy does seem to apply.
The thing that I find odd, though I can’t find the source at the moment (I thought it was Goertzel’s article, but I didn’t find it by a quick skim; it may be in the comments somewhere), is that the SIAI seems to have had the Really Scary Idea first (we want Friendly AI, so we want to be the first to make it, since we can’t trust other people) and then progressed to the Scary Idea (hmm, we can’t trust ourselves to make a Friendly AI). I wonder if the originators of the Scary Idea forgot the Really Scary Idea or never feared it in the first place?
Making a superintelligence you don’t want before you make the superintelligence you do want, has the same consequences as someone else building a superintelligence you don’t want before you build the superintelligence you do want.
You might argue that you could make a less bad superintelligence that you don’t want than someone else, but we don’t care very much about the difference between tiling the universe with paperclips and tiling the universe with molecular smiley faces.
I’m sorry, but I extracted no novel information from this reply. I’m aware that FAI is a non-trivial problem, and I think work done on making AI more likely to be FAI has value.
But that doesn’t mean believing the Scary Idea, or discussing the Scary Idea without also discussing the Really Scary Idea, decreases the existential risk involved. The estimations involved have almost no dependence on evidence, and so it’s just comparison of priors, which does not seem sufficient to make a strong recommendation.
It may help if you view my objections as pointing out that the Scary Idea is privileging a hypothesis, not that the Scary Idea is something we should ignore.
No. Expecting a superintelligence to optimize for our specific values would be privileging a hypothesis. The “Scary Idea” is saying that most likely something else will happen.
I may have to start only writing thousand-word replies, in the hopes that I can communicate more clearly in such a format.
There are two aspects to the issue of how much work should be put into FAI, as I understand it. The first I word like this: “the more thought we put into whether or not an AGI will be friendly, the more likely the AGI will be friendly.” The second I word like this: “the more thought we put into making our AGI, the less likely our AGI will be the AGI.” Both are wrapped up in the Scary Idea: the first part is it as normally stated, the second part is its unstated consequence. The value of believing the Scary Idea is the benefit of the first minus the cost of the second.
My understanding is that we have no good estimation of the value of the first aspect or the second aspect. This isn’t astronomy where we have a good idea of the number of asteroids out there and a pretty good idea of how they move through space. And so, to declare that the first aspect is stronger without evidence strikes me as related to privileging the hypothesis.
(I should note that I expect, without evidence, the problem of FAI to be simpler than the problem of AGI, and thus don’t think the Scary Idea has any policy implications besides “someone should work on FAI.” The risk that AGI gets solved before FAI means more people should work on FAI, not that fewer people should work on AGI.)
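To make the structure of that benefit-minus-cost claim concrete, here is a toy model in Python. Every number in it is a placeholder rather than an estimate; the only point is that the sign of the net effect depends on quantities we have no good way to measure:

```python
# Toy model: caution raises P(your AGI is friendly) but lowers
# P(your AGI is built first). Every number is a made-up placeholder.
def p_good_outcome(p_you_first, p_friendly_yours, p_friendly_theirs=0.1):
    return (p_you_first * p_friendly_yours
            + (1 - p_you_first) * p_friendly_theirs)

careless = p_good_outcome(p_you_first=0.5, p_friendly_yours=0.2)
careful  = p_good_outcome(p_you_first=0.2, p_friendly_yours=0.6)

# Whether "careful" beats "careless" flips as these unmeasured
# parameters move around, which is the point being argued above.
print(round(careless, 3), round(careful, 3))
```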
That is not exactly what Goertzel meant by “Scary Idea”. He wrote:
It seems to me that there may be a lot of wiggle room in between failing to “optimize for our specific values” and causing “an involuntary end to the human race”. The human race is not automatically so fragile that it can only survive under the care of a god constructed in our own image.
Yes, what I described was not what Goertzel called the “Scary Idea”, but, in context, it describes the aspect of it that we were discussing.