How to get that Friendly Singularity: a minority view
Note: I know this is a rationality site, not a Singularity Studies site. But the Singularity issue is ever in the background here, and the local focus on decision theory fits right into the larger scheme—see below.
There is a worldview which I have put together over the years, which is basically my approximation to Eliezer’s master plan. It’s not an attempt to reconstruct every last detail of Eliezer’s actual strategy for achieving a Friendly Singularity, though I think it must have considerable resemblance to the real thing. It might be best regarded as Eliezer-inspired, or as “what my Inner Eliezer thinks”. What I propose to do is to outline this quasi-mythical orthodoxy, this tenuous implicit consensus (tenuous consensus because there is in fact a great diversity of views in the world of thought about the Singularity, but implicit consensus because no-one else has a plan), and then state how I think it should be amended. The amended plan is the “minority view” promised in my title.
There will be strongly superhuman intelligence in the historically immediate future, unless a civilization-ending technological disaster occurs first.
Implicit assumption: problem-solving entities (natural and artificial intelligences, and coalitions thereof) do possess an attribute, their “general intelligence”, which is both objective and rankable. Theoretical computer science suggests that this is so, but that it takes a lot of conceptual work to arrive at a fully objective definition of general intelligence.
The “historically immediate future” may be taken to mean, as an absolute upper bound, the rest of this century. Personally, I find it hard to see how twenty more years can pass without people being able to make planet-killing nanotechnology, so I give it twenty years maximum before we’re in the endgame.
I specify technological disaster in the escape clause, because a natural disaster sufficient to end civilization is extremely unlikely on this timescale, and it will require a cultural disruption of that order to halt the progression towards superhuman intelligence.
In a conflict of values among intelligences, the higher intelligence will win, so the best chance for human values (or your values) to survive after superintelligence is for the seed from which the superintelligence grew to have already been “human-friendly”.
Elementary but very important observation: for at least some classes of intelligence, such as the “expected-utility maximizer” (EUM), values or goals are utterly contingent. The component specifying the utility function is independent of the component which solves the problem of maximizing expected utility, and so literally any goal that can be parsed by the problem solver, no matter how absurd, can become its supreme value, just as a calculator will dutifully attempt to evaluate any expression that you throw at it. The contingency of AI core values means that neither utopia nor dystopia (from a human perspective) is guaranteed—though the latter is far more likely, if the values are specified carelessly.
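To make that contingency concrete, here is a minimal toy sketch in Python (every name in it is invented for illustration, not drawn from any real system): the problem-solving component is written once, and literally any callable can be plugged in as the agent’s “values”.

```python
class ExpectedUtilityMaximizer:
    """Toy EUM: the problem-solving component is fixed; the goal component is a plug-in."""

    def __init__(self, utility_fn):
        self.utility_fn = utility_fn  # the "values" are just data handed to the planner

    def choose(self, actions, outcome_model):
        """Return the action whose outcome distribution has the highest expected utility."""
        def expected_utility(action):
            return sum(p * self.utility_fn(outcome)
                       for outcome, p in outcome_model(action))
        return max(actions, key=expected_utility)


# The identical planner dutifully serves utterly different supreme values:
humane      = ExpectedUtilityMaximizer(lambda outcome: outcome.get("human_welfare", 0))
paperclippy = ExpectedUtilityMaximizer(lambda outcome: outcome.get("paperclips", 0))
```

Swapping the lambda changes what the agent treats as its supreme goal; nothing in the optimization machinery has to change, which is exactly why careless goal specification is dangerous.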
The seed might be an artificial intelligence, modifying itself, or a natural intelligence modifying itself, or some combination of these. But AI is generally considered to have the advantage over natural intelligence when it comes to self-modification.
The way to produce a human-friendly seed intelligence is to identify the analogue, in the cognitive architecture behind human decision-making, of the utility function of an EUM, and then to “renormalize” or “reflectively idealize” this, i.e. to produce an ideal moral agent as defined with respect to our species’ particular “utility function”.
Human beings are not EUMs, but we do belong to some abstract class of decision-making system, and there is going to be some component of that system which specifies the goals rather than figuring out how to achieve them. That component is the analogue of the utility function.
This ideal moral agent has to have, not just the right values, but the attribute of superhuman intelligence, if its creation is to constitute a Singularity; and those values have to be stable during the period of self-modification which produces increasing intelligence. The solution of these problems—self-enhancement, and ethical stability under self-enhancement—is also essential for the attainment of a Friendly Singularity. But that is basically a technical issue of computer science and I won’t talk further about it.
The truly fast way to produce a human-relative ideal moral agent is to create an AI with the interim goal of inferring the “human utility function” (but with a few safeguards built in, so it doesn’t, e.g., kill off humanity while it solves that sub-problem), and which is programmed to then transform itself into the desired ideal moral agent once the exact human utility function has been identified.
Figuring out the human utility function is a problem of empirical cognitive neuroscience, and if our AI really is a potential superintelligence, it ought to be better at such a task than any human scientist.
I am especially going out on a limb in asserting that this final proposition is part of the master plan, though I think traces of the idea can be found in recent writings. But anyway, it’s a plausible way to round out the philosophy and the research program; it makes sense if you agree with everything else that came before. It’s what my Inner Eliezer thinks.
Commentary
This is, somewhat remarkably, a well-defined research program for the creation of a Friendly Singularity. You could print it out right now and use it as the mission statement of your personal institute for benevolent superintelligence. There are very hard theoretical and empirical problems in there, but I do not see anything that is clearly nonsensical or impossible.
So what’s my problem? Why don’t I just devote the rest of my life to the achievement of this vision? There are two, maybe three amendments I would wish to make. What I call the ontological problem has not been addressed; the problem of consciousness, which is the main subproblem of the ontological problem, is also passed over; and finally, it makes sense to advocate that human neuroscientists should be trying to identify the human utility function, rather than simply planning to delegate that task to an AI scientist.
The problem of ontology and the problem of consciousness can be stated briefly enough: our physics is incomplete, and even worse, our general scientific ontology is incomplete, because inherently and by construction it excludes the reality of consciousness.
The observation that quantum mechanics, when expressed in a form which makes “measurement” an undefined basic concept, does not provide an objective and self-sufficient account of reality, has led on this site to the advocacy of the many-worlds interpretation as the answer. I recently argued that many worlds is not the clear favorite, to a somewhat mixed response, and I imagine that I will be greeted with almost immovable skepticism if I also assert that the very template of natural-scientific reduction—mathematical physics in all its forms—is inherently inadequate for the description of consciousness. Nonetheless, I do so assert. Maybe I will make the case at greater length in a future article. But the situation is more or less as follows. We have invented a number of abstract disciplines, such as logic, mathematics, and computer science, by means of which we find ourselves able to think in a rigorously exact fashion about a variety of abstract possible objects. These objects constitute the theoretical ontology in terms of which we seek to understand and identify the nature of the actual world. I suppose there is also a minimal “worldly” ontology still present in all our understandings of the actual world, whereby concepts such as “thing” and “cause” still play a role, in conjunction with the truly abstract ideas. But this is how it is if you attempt to literally identify the world with any form of physics that we have, whether it’s classical atoms in a void, complex amplitudes stretching across a multiverse configuration space, or even a speculative computational physics, based perhaps on cellular automata or equivalence classes of Turing machines.
Having adopted such a framework, how does one then understand one’s own conscious experience? Basically, through a combination of outright denial with a stealth dualism that masquerades as identity. Thus a person could say, for example, that the passage of time is an illusion (that’s denial) and that perceived qualities are just neuronal categorizations (stealth dualism). I call the latter identification a stealth dualism because it blithely asserts that one thing is another thing when in fact they are nothing like each other. Stealth dualisms are unexamined habitual associations of a bit of physico-computational ontology with a bit of subjective phenomenology which allow materialists to feel that the mind does not pose a philosophical problem for them.
My stance, therefore, is that intellectually we are in a much much worse position, when it comes to understanding consciousness, than most scientists, and especially most computer scientists, think. Not only is it an unsolved problem, but we are trying to solve it in the wrong way: presupposing the desiccated ontology of our mathematical physics, and trying to fit the diversities of phenomenological ontology into that framework. This is, I submit, entirely the wrong way round. One should instead proceed as follows: I exist, and among my properties are that I experience what I am experiencing, and that there is a sequence of such experiences. If I can free my mind from the assumption that the known classes of abstract object are all that can possibly exist, what sort of entity do I appear to be? Phenomenology—self-observation—thereby turns into an ontology of the self, and if you’ve done it correctly (I’m not saying this is easy), you have the beginning of a new ontology which by design accommodates the manifest realities of consciousness. The task then becomes to reconstitute or reinterpret the world according to mathematical physics in a way which does not erase anything you think you established in the phenomenological phase of your theory-building.
I’m sure this program can be pursued in a variety of ways. My way is to emphasize the phenomenological unity of consciousness as indicating the ontological unity of the self, and to identify the self with what, in current physical language, we would call a large irreducible tensor factor in the quantum state of the brain. Again, the objective is not to reduce consciousness to quantum mechanics, but rather to reinterpret the formal ontology of quantum mechanics in a way which is not outright inconsistent with the bare appearances of experience. However, I’m not today insisting upon the correctness of my particular approach (or even trying very hard to explain it); only emphasizing my conviction that there remains an incredibly profound gap in our understanding of the world, and it has radical implications for any technically detailed attempt to bring about a human-friendly outcome to the race towards superintelligence. In particular, all the disciplines (e.g. theoretical computer science, empirical cognitive neuroscience) which play a part in cashing out the principles of a Friendliness strategy would need to be conceptually reconstructed in a way founded upon the true ontology.
Having said all that, it’s a lot simpler to spell out the meaning of my other amendment to the “orthodox” blueprint for a Friendly Singularity. It is advisable to not just think about how to delegate the empirical task of determining the human utility function to an AI scientist, but also to encourage existing human scientists to tackle this problem. The basic objective is to understand what sort of decision-making system we are. We’re not expected utility maximizers; well, what are we then? This is a conceptual problem, though it requires empirical input, and research by merely human cognitive neuroscientists and decision theorists should be capable of producing conceptual progress, which will in turn help us to find the correct concepts which I have merely approximated here in talking about “utility functions” and “ideal moral agents”.
Thanks to anyone who read this far. :-)
Have you ever burned yourself on, say, a hot dish? Typically, people automatically recall such experiences in the following time sequence: simultaneous touch and burn sensation, followed by sharp involuntary withdrawal of limb. This time sequence is a trick our brains play on us: the withdrawal reflex arc is entirely spinal; withdrawal of the limb happens before the damage signal even has time to reach the brain. (After becoming aware of this fact, I had a burn experience in which I seemed to feel my brain trying to switch the time ordering of my perceptions from motion-prior-to-pain to pain-prior-to-motion and consciously intervened in the process to preserve my original perceptions.)
I bring this up because I believe that a brain capable of that trick cannot possibly give an accurate ontology of self through mere self-observation. You might be able to rescue your program by expanding the definition of self-observation to include the findings of modern neuroscience.
I see the ontological role of phenomenology as about establishing the qualitative features of consciousness. Certainly you can combine this first-person data with third-person data. That’s how you get to know that the time-sequence illusion is an illusion. It’s what we’re doing when we try to locate neural correlates of consciousness.
But the potential for error correction ought to go both ways. If you can make a mistake on the basis of ideas derived from first-person observation, you can also do it on the basis of ideas derived from third-person observation. My thesis here is that people feel compelled to embrace certain mistaken conclusions about consciousness because of their beliefs about physical ontology, which derive from third-person observation.
Mind rephrasing what it is you’re actually claiming about consciousness? Near as I can tell, the entire actual core content of that part of what you’re saying amounts to “We need to observe ourselves more and think about consciousness a bit more. Oh, incidentally, there’s stuff we still don’t know about yet.”
I think I may be misreading, though.
I’m not the first to note, but...
[...]
I fail to see where you think the denialism lies, but I’d guess you have misunderstood what it means that passage of time is an illusion. It doesn’t mean we can’t experience passage of time. It doesn’t mean we’re not the kinda entities that experience passage of time. It just means that the stuff we intuitively deduce from our experience there is rubbish. The same way people tend to think that sight is something that goes from the eyes to the object you’re looking at. Erroneous deduction from very real experience.
Maybe elaborating on your views might help? It seems the inferential distance is rather big, at least too big for me to handle.
A few points:
This is wrong from the beginning: a measurement simply refers to the creation of mutual information, which is indeed well-defined. In quantum-level descriptions, it is represented by entanglement between components of the wavefunction. (The concept of mutual information then meshes nicely with the known observations of thermodynamics in allowing “entropy” to be precisely defined in information-theoretic terms.)
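For reference, the textbook quantity being invoked here (standard information theory, not anything specific to this argument):

$$I(X;Y) \;=\; H(X) + H(Y) - H(X,Y) \;=\; \sum_{x,y} p(x,y)\,\log\frac{p(x,y)}{p(x)\,p(y)} \;\ge\; 0$$

On this reading, a “measurement” of X by Y is any interaction that drives I(X;Y) above zero, i.e. that makes Y’s state carry information about X’s.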
If I understand you correctly, you’re saying, start from self-observation, but permit the self to be ontologically basic, then re-interpret mathematical physics so that it doesn’t deny your conscious experience.
Apart from permitting the self to be ontologically basic, I don’t see how this differs from Yudkowsky’s approach. You seem to be under the false impression that he wants to somehow deny the reality of consciousness. But he doesn’t—instead, he says to ask questions of the form, “why do I believe I’m conscious, or have the feeling of consciousness?” and then search through the ways that mathematical physics would allow it to be generated, at which point it starts to match your approach.
So what do you gain from positing the ontologically-basic self? It’s not progress, because you’re still left with the problem of why (you can reasonably infer) there is consciousness in all of these other beings you observe. Why is it so correlated with a kind of biological form, one that goes from not-conscious to conscious? How does this self thing work in the forms that we see it working in? But once you know the answer to those questions, what purpose does the additional ontological supposition serve?
Incidentally, have you actually exhausted all the ways to account for your consciousness? Read the free will series? Tried to represent consciousness using the entire information theoretic toolbox (mutual information, conditional independence, entropy maximization, etc.)?
I did say: “quantum mechanics, when expressed in a form which makes “measurement” an undefined basic concept”, which is how it is often expressed. And maybe you’ll even agree with me that in that form, it is definitely incomplete.
I escape the conscious sorites paradox without vaguing out.
As for how such an entity relates to everything else, that’s the point of exploring a monadological interpretation of the physics (and hence the biology) that we already have. But I don’t mind if you want evidence of functionally relevant quantum coherence in the brain before you take it seriously.
Reverting to the case of color, the information-theoretic analysis brings us no closer to getting actual shades of color existing anywhere in a universe of electrons and quarks. It just talks about correlations and dependencies among the states and behaviors of colorless aggregates of colorless particles. Since actual (phenomenal) color is causally relevant—we see it, we talk about it—you can perform an info-theoretic analysis of its causes, correlates, and effects too. But just having an ontology with a similar causal structure is not going to give you the thing itself. I endorse 100% the causal analysis of qualia, as a pathway to knowledge, but not their causal reduction.
Well, why were you directing that remark at an audience that doesn’t leave measurement as an undefined basic concept, and implying that the audience falls prey to this error?
You crucially depend on the remaining vagueness of the exact moment when “a shade of blue” arises, and you fail to produce any more specificity than in your latest post.
Looks to me like you didn’t escape either.
The problem is more fundamental than that—I would need to know why I get to the physics-based explanation faster by making your ontological assumptions, not just the fact that you could fit it into your ontology.
Just like an information theoretic analysis of a program brings us no closer to getting actual labels for the program’s referents.
What I actually said (check the original sentence!) was that this audience recognizes the error and advocates many-worlds as the answer.
This apparently runs several things together. Unfortunately I see no way to respond without going into tedious detail.
You asked: why say the self is a single object? I answered: so I don’t have to regard its existence as a vague (un-objective) matter. Bear in mind that the motivating issue here is not the existence of color, but the objectivity of the existence of the self. If selves are identified with physical aggregates whose spatial boundaries are somewhat arbitrary, then the very existence of a self becomes a matter of definition rather than a matter of fact.
In your remark quoted above, you seem to be thinking of two things at once. First, I asked Psychohistorian when it is that color comes into being, if it is indeed implicitly there in the physics we have. Second, I have not offered an exact account of when, where and how subjective color exists in the conscious monad and how this relates to the quantum formalism. Finally comes the concluding criticism that I am therefore tolerating vagueness in my own framework, in a way that I don’t tolerate in others.
There are maybe three things going on here. In the original discussions surrounding the Sorites paradox (and Robin Hanson’s mangled worlds), it was proposed that there is no need to have a fully objective and non-arbitrary concept of self (or of world). This makes vagueness into a principle: it’s not just that the concept is underdetermined, it’s asserted that there is no need to make it fully exact.
The discussion with Psychohistorian proceeds in a different direction. Psychohistorian hasn’t taken a stand in favor of vagueness. I was able to ask my question because no-one has an exact answer, Psychohistorian included, but Psychohistorian at least didn’t say “we don’t need an exact answer”—and so didn’t “vague out”.
For the same reason, I’m not vaguing out just because I don’t yet have an exact theory of my own about color. I say it’s there, it’s going to be somewhere in the monad, and that it is nowhere in physics as conventionally understood, not even in a stimulus-classifying brain.
I hate it when discussions bog down in this sort of forensic re-analysis of what everyone was saying, so I hope you can pick out the parts which matter.
The ontological assumptions are made primarily so I don’t have to disbelieve in the existence of time, color, or myself. They’re not made so as to expedite biophysical progress, though they might do so if they’re on the right track.
Colors are phenomena, not labels. It’s the names of colors which are labels, for contingent collections of individual shades of color. There is no such thing as objective “redness” per se, but there are individual shades of color which may or may not classify as red. It’s all the instances of color which are the ontological problem; the way we group them is not the problem.
Fair point about missing the context on my part, and I should have done better, since I rip on others when they do the same—just ask Z M Davis!
Still, if this is what’s going on here—if you think rejection of your ontology forces you into one of two unpalatable positions, one represented by Robin_Hanson, and the other by Psychohistorian—then this rock-and-a-hard-place problem of identity should have been in your main post to show what the problem is, and I can’t infer that issue from reading it.
Again, nothing in the standard LW handling requires you to disbelieve in any of those things, at the subjective level; it’s just that they are claimed to arise from more fundamental phenomena.
Then I’m lost: normally, the reason to propose e.g. a completely new ontology is to eliminate a confusion from the beginning, thereby enhancing your ability to achieve useful insights. But your position is: buy into my ontology, even though it’s completely independent of your ability to find out how consciousness works. That’s even worse than a fake explanation!
I think you’re misunderstanding the Drescher analogy I described. The gensyms don’t map to our terms for color, or classifications for color; they map to our phenomenal experience of color. That is, the distinctiveness of experiencing red, as differentiated from other aspects of your consciousness, is like the distinctiveness of several generated symbols within a program.
The program is able to distinguish between gensyms, but the comparison of their labels across different program instances is not meaningful. If that’s not a problem in need of a solution, neither should qualia be, since qualia can be viewed as the phenomenon of being able to distinguish between different data structures, as seen from the inside.
(To put it another way, your experience of color has to be different enough so that you don’t treat color data as sound data.)
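A minimal sketch of the gensym point in Python (using anonymous `object()` sentinels in place of Lisp gensyms; this is my illustration of the analogy, not Drescher’s own example):

```python
# Two "gensyms": internally distinct tokens whose labels carry no external meaning.
RED_QUALE = object()
GREEN_QUALE = object()

def classify(wavelength_nm):
    """The program reliably tells its two internal tokens apart."""
    return RED_QUALE if wavelength_nm > 600 else GREEN_QUALE

assert classify(650) is RED_QUALE
assert classify(520) is GREEN_QUALE

# But inspecting a token from outside exposes only an arbitrary label (an address),
# and comparing such labels across two different runs of the program is meaningless.
print(classify(650))  # e.g. <object object at 0x7f...>
```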
I emphasize that Drescher has not “closed the book” on the issue; there’s still work to be done. But you can see how qualia can be approached within the reductionist ontology espoused here.
In retrospect, I think this would have been a better order of exposition for the monadology article:
Start with a general discussion of the wavefunction of the brain, implicitly within a many-worlds framework, i.e. nothing about collapse. Most of us here will agree that this ought to be a conceptually valid enterprise, though irrelevant to biology because of decoherence.
Next, bring up the possibility of quantum effects being relevant to the brain’s information processing after all. In the absence of specific evidence, readers might be skeptical, but it’s still a logically valid concept, a what-if that doesn’t break any laws of physics.
Next, try to convey the modification to the quantum formalism that I described, whereby fundamental degrees of freedom (such as string-theoretic D0-branes) become entangled in island sets that are disentangled from everything else. (The usual situation is that everything is entangled with everything else, albeit to a vanishingly small degree.) There might be a few more frowns at this point, perplexed wondering as to where this is all leading, but it’s still just mathematics.
And finally say, these island sets are “monads”, and your subjective experience is the inner state of one big monad, and the actual qualities and relations which make up reality (at least in this case) are the ones which are subjectively manifest, rather than the abstractions we use for formal representation and calculation. This is the part that sounds like “woo”, where all these strange ologies like ontology, phenomenology, and monadology, show up, and it’s the part which is getting the strongest negative reaction.
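In standard tensor-product notation, the “island set” picture in the third step is roughly the following (a paraphrase to fix ideas, not the exact formalism being proposed):

$$|\Psi\rangle \;\approx\; |\psi_{\text{island}_1}\rangle \otimes |\psi_{\text{island}_2}\rangle \otimes \cdots \otimes |\psi_{\text{rest}}\rangle$$

where the degrees of freedom inside each island factor may be strongly entangled with one another but not with anything outside it, and the “one big monad” of the final step is a single large factor of this kind.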
In privately developing these ideas I started with the phenomenology, and worked backwards from there towards the science we have, but an exposition which started with the science we have and incrementally modified it towards the phenomenology might at least have made three-quarters of the framework sound comprehensible (the three steps where it’s all still just mathematics).
Then again, by starting in the middle and emphasizing the ontological issues, at least everyone got an early taste of the poison pill beneath the mathematical chocolate. Which might be regarded as a better outcome, if you want to keep woo out of your system at all costs.
I don’t see why you need to bring in quantum mechanics or D0-branes here. What you’re describing is just a standard case of a low-entropy, far-from-equilibrium island, also known as a dissipative system, which includes Benard cells, control systems, hurricanes, and indeed life itself.
They work by using a fuel source (negentropy) to create internal order, far from equilibrium with their environment, and have to continually export enough entropy (disorder) to make up for that which they create inside. While the stabilizing/controlling aspect of them will necessarily be correlated with the external “disturbances”, some part of it will be screened off from the environment, and therefore disentangled.
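The entropy bookkeeping behind that claim, in the usual non-equilibrium form (standard thermodynamics, stated here only to make the balance explicit):

$$\frac{dS_{\text{system}}}{dt} \;=\; \sigma \;+\; \Phi_S, \qquad \sigma \ge 0,$$

where σ is the entropy produced inside the system and Φ_S is the net entropy flowing in across its boundary. Staying ordered and far from equilibrium (dS_system/dt ≤ 0) is only possible if the system exports entropy at least as fast as it produces it, i.e. Φ_S ≤ −σ.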
In fact, the angle I’ve been working is to see if I can come up with a model whereby consciousness arises wherever natural, causal forces align to allow local screening off from the environment (which we know life does anyway) plus a number of other conditions I’m still figuring out.
We could have followed it better that way, but there is still no reasoning in those steps. The right way to structure this discussion is to start with the problems you want to solve (and why they’re problems) and then explain how to solve them. This outline still has nothing motivating it. What people need to see is what you think needs explaining and how your theory explains it best. It’s true that the math might be acceptable enough, but no one really wants to spend their time doing math to solve problems they don’t think are problems, or to explain the behavior of things they don’t think exist.
It is as if someone showed up with all this great math that they said described God. Math is fun and all but we’d want to know why you think there is a God that these equations describe!
Yes, as a strategy to convince your readers, the new ordering would likely be more effective. However, re-ordering or re-phrasing your reasoning in order to be more rhetorically effective is not good truth-seeking behavior. Yudkowsky’s “fourth virtue of evenness” seems relevant here.
The counterargument that Mitchell Porter’s critics bring is recognizable as “Argument From Bias”. Douglas Walton describes it in this way:
Major Premise: If x is biased, then x is less likely to have taken the evidence on both sides into account in arriving at conclusion A
Minor Premise: Arguer a is biased.
Conclusion: Arguer a is less likely to have taken the evidence on both sides into account in arriving at conclusion A.
In this case, the bias in question is Mitchell Porter’s commitment to ontologically basic mental entities (monads or similar). We must discount his conclusions regarding the deep reading of physics that he has apparently done, because a biased individual might cherry-pick results from physics that lead to the preferred conclusion.
This discounting is only partial, of course—there is a chance of cherry-picking, not a certainty.
Imagine that reptiles developed the first AI. Imagine that they successfully designed it to maximize reptilian utility functions, faithfully, forever and ever, throughout the universe.
Now imagine that humans develop the first AI. Imagine that they successfully design it to maximize human utility functions, faithfully, forever and ever, throughout the universe.
In the long view, these two scenarios are nearly indistinguishable. The difference between them is smaller than the difference between bacteria and protozoa seems to us.
The human-centric sentiment in this post, which I’ve heard from many others thinking about the Singularity, reminds me why I sometimes think we would be safer rushing pell-mell into the Singularity, than stopping to think about it. The production of untamed AI could lead to many horrible scenarios; but if you want to be sure to screw things up, have a human think hard about it. (To be doubly sure, take a popular vote on it.)
Your initial reaction is probably that this is ridiculous; but that’s because you’re probably thinking of humans designing simple things, like cars or digital watches. Empirically, however, with large complex systems such as ecosystems and economies, humans have usually made things worse when they thought about the problem really hard and tried to make it better—especially when they made decisions through a political process. Religion, communism, monoculture crops, rent control, agricultural subsidies, foreign aid, the Biodome—I could go on. We humans only attain the ability to perform as-good-as-random when intervening in complex systems after centuries of painful experience. And we get only one shot at the Singularity.
(The complex system I speak of is not the AI, but the evolution of phenomena such as qualia, emotions, consciousness, and values.)
I don’t believe there is a species-wide utility function at all. Different humans have completely unrelated, often opposed, utility functions. Humans can be raised to desire pretty much anything at all if you completely control their upbringing and brains. The idea of “ideal morals” is completely unfounded.
The only thing I can aspire to, in building a world-conquering superintelligence, is enforcing my own values on everyone else.
I agree, including the emphasis on only. It so happens that my own values include considerations of the values held by other humans but this ‘meta function’ is just mine and the values of some other humans are directly opposed to this.
This is reasonable—but what is odd to me is the world-conquering part. The justifications that I’ve seen for creating a singleton soon (e.g. either we have a singleton or we have unfriendly superintelligence) seem insufficient.
How certain are you that there is no third alternative? Suppose that you created an entity which is superhuman in some respects (a task that has already been done many times over) and asked it to find third alternatives. Wouldn’t this be a safer, saner, more moral and more feasible task than conquering the world and installing a singleton?
Note that “entity” isn’t necessarily a pure software agent—it could be a computer/human team, or even an organization consisting only of humans interacting in particular ways—both of these kinds of entity already exist, and are more capable than humans in some respects.
The purpose of installing a singleton is to prevent anyone, anywhere, ever, doing something I disapprove of. (I can give the usual examples of massive simulated torture, but a truly superhuman intelligence could be much more inventively and unexpectedly unpleasant than anything we’re likely to imagine.) Even if an unfriendly superintelligence isn’t a certainty (and there are arguments that it is), why take the huge risk?
Now, can there be anything other than a singleton which would give me comparable certainty? I would need to predict what all the rest of the universe will do, and to be able to stop anything I didn’t like, and to predict things sufficiently in advance to stop them in time. (As a minimum requirement, if a place is X light seconds away, I need to predict based on X-second-old information, and it takes another X seconds to intervene.)
This includes stopping anyone from gaining any form of power that might possibly defy me in the future. And it must be effective for the whole universe, even if Earth-descended technology of some sort spreads out at nearly the speed of light, because I don’t know what might come back at me from out there.
Suppose there was another way to accomplish all this that wasn’t an outright singleton, i.e. didn’t rewrite the effective laws of physics or replace the whole universe with a controlled simulation. What possible advantage could it have over a singleton?
This sounds to me like an irresistible force/immovable object problem—two people who are focused on different (large or intense) aspects of a problem disagree—but the real solution is to reframe the problem as a balance of considerations.
As I understand it, on the one hand, there are the arguments (e.g. Eliezer Yudkowsky’s document “Creating Friendly AI”) that technological progress is mostly not stoppable and enthusiasts and tinkerers are accidentally going to build recursively self-improving entities that probably do not share your values. On the other hand, striving to conquer the world and impose one’s values by force is (I hope we agree) a reprehensible thing to do.
If, for example, there was a natural law that all superintelligences must necessarily have essentially the same ethical system, then that would tip the balance against striving to conquer the world. In this hypothetical world, enthusiasts and tinkerers may succeed, but they wouldn’t do any harm. John C. Wright posits this in his Golden Transcendence books and EY thought this was the case earlier in his life.
If there was a natural law that there’s some sort of upper bound on the rate of recursive self-improvement, and the world as a whole (and the world economy in particular) is already at the maximum rate, then that would also tip the balance against striving to conquer the world. In this hypothetical world, the world as a whole will continue to be more powerful than the tinkerers, the enthusiasts, and you. Robin Hanson might believe some variant of this scenario.
Not at all. It’s the only truly valuable thing to do. If I thought I had even a tiny chance of succeeding, or if I had any concrete plan, I would definitely try to build a singleton that would conquer the world.
I hope that the values I would impose in such a case are sufficiently similar to yours, and to (almost) every other human’s, that the disadvantage of being ruled by someone else would be balanced for you by the safety from ever being ruled by someone you really wouldn’t like.
A significant part of the past discussion here and in other singularity-related forums has been about verifying that our values are in fact compatible in this way. This is a necessary condition for community efforts.
But there isn’t, so why bring it up? Unless you have a reason to think some other condition holds that changes the balance in some way. Saying some condition might hold isn’t enough. And if some such condition does hold, we’ll encounter it anyway while trying to conquer the world, so no harm done :-)
I’m quite certain that we’re nowhere near such a hypothetical limit. Even if we are, this limit would have to be more or less exponential, and exponential curves with the right coefficients have a way of fooming that tends to surprise people. Where does Robin talk about this?
Not so much. Multiple FAIs of different values (cooperating in one world) are equivalent to one FAI of amalgamated values, so a community effort can be predicated on everyone getting their share (and, of course, that includes altruistic aspects of each person’s preference). See also Bayesians vs. Barbarians for an idea of when it would make sense to do something CEV-ish without an explicitly enforced contract.
You describe one form of compatibility.
How so? I don’t place restrictions on values, more than what’s obvious in normal human interaction.
No it’s not. We are talking about “my values”, and so if I believe it’s improper to impose them using procedure X, then part of “my values” is that procedure X shouldn’t have been performed, and so using procedure X to impose my values is unacceptable (not a valid subgoal of “imposing my values”). Whatever means are used to “impose my values” must be good according to my values. Thus, not implementing the dark aspects of “conquering the world”, such as “by force”, is part of “conquering the world” as instrumental action for achieving one’s goals. You create a singleton that chooses to be nice to the conquered.
There is also a perhaps much more important aspect of protecting from mistakes: even if I was the only person in the world, and not in immediate danger from anything, it still would make sense to create a “singleton” that governs my own actions. Thus the intuition for CEV, where you specify an everyone’s singleton, not particularly preferring given people.
Possibly you’re using technical jargon here. When non-LessWrong-reading humans talk about one person imposing their values on everyone else, they would generally consider it immoral. Are we in agreement here?
Now, I could understand your statement (“No it’s not”) in either of two ways: either you believe they’re mistaken about whether the action is immoral, or you are using a different (technical jargon) sense of the words involved. Which is it?
My guess is that you’re using a technical sense of “values”, which includes something like the various clauses enumerated in EY’s description of CEV: “volition is what we wish if we knew more, thought faster, were more the people we wished we were, had grown up farther together, …”.
If by “values” you include those things that you don’t think you value now but you would value if you had more knowledge of them, or would be persuaded to value by a peer if you hadn’t conquered the world and therefore eliminated all of your peers, then perhaps I can see what you’re trying to say.
By talking about “imposing your own values” without all of the additional extrapolated volition clauses, you’re committing an error of moral overconfidence—something which has caused vast amounts of unpleasantness throughout human history.
http://www.overcomingbias.com/2009/01/moral-uncertainty-towards-a-solution.html
Not at all. The morality of imposing my values on you depends entirely on what you were doing, or were going to do, before I forced you to behave nicely.
You may have misread that, and answered a different question, something like “Is it moral?”. The quote actually is asking “Do non-LessWrong-reading humans generally consider it moral?”.
I answered the right quote.
Random examples: was the U.S. acting morally when it entered WW2 against the Nazis and imposed its values across Western Europe and in Japan? Is the average government acting morally when it forcefully collects taxes, enforcing its wealth-redistribution values? Or when it enforces most kinds of laws?
I think most people by far (to answer your question about non-LW-readers) support some value-imposing policies. Very few people are really pure personal-liberty non-interventionists. The morality of the act depends on the behavior being imposed, and on the default behavior that exists without such imposition.
It remains to stipulate that the government has a single person at its head who imposes his or her values on everyone else. Some governments do run this way, some others approximate it.
Edit: What you may have meant to say, is that the average non-LW-reading person, when hearing the phrase “one human imposing their values on everyone else”, will imagine some very evil and undesirable values, and conclude that the action is immoral. I agree with that—it’s all a matter of framing.
Of course, I’m talking about values as they should be, with moral mistakes filtered out, not as humans realistically enact them, especially when the situation creates systematic distortions, as is the case with granting absolute power.
Posts referring to necessary background for this discussion:
Ends Don’t Justify Means (Among Humans)
Not Taking Over the World
As far as software is concerned, this flavor of superhumanity does not remotely resemble anything that has already been done. You’re talking about assembling an “entity” capable of answering complex questions at the intersection of physics, philosophy, and human psychology. This is a far cry from the automation of relatively simple, isolated tasks like playing chess or decoding speech—I seriously doubt that any sub-AGI would be up to the task.
The non-software alternatives you mention are even less predictable/controllable than AI, so I don’t see how pursuing those strategies could be any safer than a strictly FAI-based approach. Granted sufficient superhumanity (we can’t precisely anticipate how much we’re granting them), the human components of your “team” would face an enormous temptation to use their power to acquire more power. This risk would need to be weighed against its benefits, but the original aim was just the prevention of a sub-optimal singleton! So all we’ve done is step closer to the endgame without knowably improving our board position.
Human teams, or human/software amalgams (like the LessWrong moderation system that we’re part of right now) are routinely superhuman in many ways. LessWrong, considered as a single entity, has superhumanly broad knowledge. It has a fairly short turn-around time for getting somewhat thoughtful answers—possibly more consistently short turn-around time than any of us could manage alone.
An entity such as this one might be highly capable in some narrowly focused ways (indeed, it could be purpose-built for one goal—the goal of reducing the risk of the earth being paperclipped) while being utterly incapable in many other ways, and posing almost no threat to the earth or wider society.
Building a purportedly-Friendly general-purpose recursive self-improving process, on the other hand, means a risk that you’ve created something that will diverge on the Nth self-improvement cycle and become unfriendly. By explicitly going for general-purpose, no-human-dependencies, and indefinitely self-improvable, you’re building in exactly the same elements that you suspect are dangerous.
This is a fairly obvious point that becomes more complicated in a larger scope. Being charitable, you seem to be implying
where FAI-attempt means “we built and deployed an AGI that we thought was Friendly”. If our FAI efforts are the only thing that causally affects Fail, then your implication might be correct. But if we take into account all other AGI research, we need more detail. Assume FAI researchers have no inherent advantage over AGI researchers (questionable). Then we basically have
So in these terms, what would it mean for an FAI attempt to be riskier?
Eliezer has argued at considerable length that P(Fail | AGI & ~FAI) is very close to 1. So under these assumptions, the odds of a FAI failure must be higher than the odds of non-FAI AGI being created in order to successfully argue that FAI is more dangerous than the alternative. Do you have any objection to my assumptions and derivation, or do you believe that P(Fail | FAI) > P(AGI | ~FAI)?
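A hedged reconstruction of the comparison being made, in the comment’s own notation (my paraphrase of the formulas it refers to, under the stated simplifying assumptions):

$$P(\text{Fail}\mid \text{FAI attempt}) \approx P(\text{Fail}\mid \text{FAI}), \qquad P(\text{Fail}\mid \text{no FAI attempt}) \approx P(\text{AGI}\mid \neg\text{FAI}) \cdot P(\text{Fail}\mid \text{AGI} \wedge \neg\text{FAI}).$$

With P(Fail | AGI & ~FAI) taken to be close to 1, the second quantity is roughly P(AGI | ~FAI), so the FAI attempt is the riskier option only if P(Fail | FAI) > P(AGI | ~FAI), which is the inequality asked about above.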
Can you send me a model? I think my objection is to the binariness of the possible strategies node, but I’m not sure how to express that best in your model.
Suppose there are N projects in the world, each of which might almost-succeed and so each of which is an existential risk.
The variable that I can counterfactually control is my actions. The variable that we can counterfactually control is our actions. Since we’re conversing in persuasive dialog, it is reasonable to discuss what strategies we might take to best reduce existential risk.
Suppose that we distinguish between “safety strategies” and “singleton strategies”.
Singleton strategies are explicitly going for fast, general-purpose power and capability, with as many stacks of iterated exponential growth in capability as the recursive self-improvement engineers can manage. It seems obvious to me that if we embarked on a singleton strategy, even with the best of intentions, there are now N+1 AGI projects, each increasing existential risk, and our best intentions might not outweigh that increase.
Safety strategies would involve attempting to create entities (e.g. human teams, human/software amalgams, special-purpose software) which are explicitly limited and very unlikely to be generally powerful compared to the world at large. They would try to decrease existential risk both directly (e.g. build tools for the AGI projects that reduce the chance of the AGI projects going wrong) and indirectly, by not contributing to the problem.
No, sorry, the above comment was just my attempt to explain my objection as unambiguously as possible.
Yes, but your “N+1” hides some important detail: Our effective contribution to existential risk diminishes as N grows, while our contribution to safer outcomes stays constant or even grows (in the case that our work has a positive impact on someone else’s “winning” project).
Since you were making the point that attempting to build Friendly AGI contributes to existential risk, I thought it fair to factor out other actions. The two strategies you outline above are entirely independent, so they should be evaluated separately. I read you as promoting the latter strategy independently when you say:
The choice under consideration is binary: Attempt a singleton or don’t. Safety strategies may also be worthwhile, but I need a better reason than “they’re working toward the same goal” to view them as relevant to the singleton question.
If a superintelligence is able to find a way to reliably prevent the emergence of a rival, a preventable existential risk, or sufficiently undesirable actions, then by all means it can do that instead.
By the way, the critical distinction is that with AGI, you are automating the whole decision-making cycle, while other kinds of tools only improve on some portion of the cycle under human control or anyway with humans somewhere in the algorithm.
If what you say about consciousness is true, and not just a bunch of baloney, what are the actual implications of that? Does your theory make any testable predictions? What are the practical consequences for the building of an AI, or indeed for anything?
You have to make sure your AI’s utility function isn’t susceptible to Jedi mind tricks.
Not being constrained by that mundane mathematical physics stuff also opens up all sorts of new opportunities for a path to superintelligence.
This seems to be an extraordinary claim, but it doesn’t seem anyone else has commented on this. Admittedly this is somewhat tangential to the main point, but I wonder if this is the common educated view on the subject.
My own general expectation was that self-replicating nanotechnology would take several decades at the low end and possibly more than a hundred years, in the absence of an AI singularity.
Um… last I checked, self-replicating nanotechnology had been here for quite a while. It’s called bacteria. Although they haven’t managed to kill the planet so far.
They came close.
Under the same definition, all life—including us humans—is based on self-replicating nanotechnology. Bacteria are nothing special.
I was thinking of “diamondoid mechanosynthesis”. I don’t see twenty more years being required to make free-living nanomechanical replicators based on DMS.
You don’t? And what about the experts? (Or is nobody confident enough to stick their neck out and make a prediction?)
The topic is somewhat controversial in that DMS and its products so far only exist in simulation. The only “experts” are people like Freitas, Merkle, and Drexler who have devoted years of their lives to design and modeling of DMS. So the very first issue, if you’re an outsider, is whether the DMS research community’s expertise pertains to anything real. The criticism is less prominent these days, there are a few more chemists from outside that core group taking an interest, and perhaps the zeitgeist inside the broader discipline of chemistry is friendlier to the concept. But it remains controversial.
Back in 1994, Drexler said we were ten to twenty years away from advanced nanotechnology. I’m not aware of these principal researchers making comparable predictions lately. But I suspect that, given their somewhat marginal position, they prefer to limit their claims to matters which in principle are computationally and physically verifiable. Though in this interview Freitas does agree with the interviewer that the design of an “ecophage … should be rather obvious to anyone who is “skilled in the art”.” I’m not a DMS expert, but I’ve studied the literature, and I agree that at least one type of replicator (the one I’ve thought the most about) looks dangerously easy to make (if you can call a decade or two of R&D “easy”).
It’s a rephrase by Steven C. Vetter of what he understood Drexler said, requoted by John K Clark partly out of context, so it was probably a prediction of some other event instead. But yeah, the idea is probably stretched too thin to seek a consensus opinion.
I have substituted a link to Vetter’s original communication.
It’s all well and good to say that we (neuroscientists) should endeavour to understand consciousness better before proceeding to the creation of an AI, but how realistic do you really believe this approach to be? Mike Vassar pointed out that once we have advanced enough technology to emulate a significant number of neurons in a brain circuit, we’d also have good enough technology to create an AGI. If you’re arguing for some kind of ‘quantum tensor factor,’ and need quantum-level emulations of the brain for consciousness, AGIs will have been generated long before we’ve even put a dent in identifying the ineffable essence of consciousness. This is not to say you are wrong, just that what you ask is impossible.
I am more optimistic than you are about science achieving the sort of progress I called for, in time to be relevant. For one thing, this is largely about basic concepts. One or two reconceptualizations, as big as what relativity did to time, and the topic of consciousness will look completely different. That conceptual progress is a matter of someone achieving the right insights and communicating them. (Evidently my little essay on quantum monadology isn’t the turning point, or my LW karma wouldn’t be sinking the way it is...) The interactions between progress in neuroscience, classical computing, quantum computing, and programming are complicated, but when it comes to solving the ontological problem of consciousness in time for the Singularity, I’d say the main factor is the rate of conceptual progress regarding consciousness, and that’s internal to the field of consciousness studies.
More engaging, less whining, please.
Hey, it was a joke.
To help readers distinguish between self-deprecating jokes and whining, the Internet has provided us with a palette of emoticons. I recommend ”;-)” for this particular scenario.
Well, it may be that once we actually know that much more about the brain’s wiring, that additional knowledge will be enough to help untangle some of the mysteriousness of consciousness.
One way of figuring out what the brain is, is to ask what purposes the brain fulfils for evolution. Apart from high-speed control (low-speed control can be done by means other than neurons), you can also view the brain as an enabler for speeding evolution, by adapting to body changes. Imagine if evolution had to wait for a mutation in the sensing/control system as well as a change in body plan before it could make a body modification. Eyes would have a hard time coming into existence, as there would be nothing to process the information they provided in a useful fashion.
So you can see the brain as a system to figure out what body it is in, what it should be doing, and how it should be processing information, based on feedback about the usefulness of its actions (criteria which don’t change much in evolutionary time, such as food, sex, status and children*). Evolutionary usefulness is indicated by pleasure and other such signals. Yet we are not constrained to try and maximize these signals; it is just that behaviours and goal-seeking parts of the mind that happen to do things that produce these signals get reinforced. We can choose to avoid getting usefulness signals, as we do when we choose not to take addictive drugs.
I’m not sure if you get any closer to figuring out the consciousness problem. But it might be a start.
The evolutionary nature of the brain is shown perhaps most by how we find baby mammals cute and wish to take care of them. The part of the brain that deals with kids doesn’t know what type of body it is going to end up in, so it is indiscriminate in making young-looking mammals cute.
Many species prefer to eat things that look and taste like what their parents fed them, regardless of what that is. When a group of these creatures lands in a new environment, they’ll choose foods randomly; those that chose well will succeed, and the population will evolve a healthy diet in just a few generations, without genetics being involved at all.
While the idea of evolving the ability to evolve faster might be made to work, it needs to be spelled out carefully, lest it attribute foresight to evolution.
Ordinarily you have trait X and you say it increases fitness and goes to fixation in a population, but it’s less obvious how this works with the trait of evolving faster… which is not to say that such a thing is impossible. But you might need to invoke differing long-term survival of large groups of species, or something...
Nerve cells most likely evolved for a different purpose: high-speed communication. Adaptivity of this network improves fitness because even when you are in one body you don’t know how big it is (it grows), so you need to send different signals for different-sized bodies. Also, if you can link in extant sensors used for chemotactic or phototactic behaviour and use this information in the high-speed network without having to re-evolve the behaviours, then you can gain fitness advantages.
I’m saying it smooths out the curves of the search space that evolution is moving in. Rather than having discontinuous jumps between the fitness of (lack of eye, no information processing for eye) and (light sensor, genetic adaption for processing the information from the light sensor), you get the step of (light sensor, some system that can do something with the information) in between. Getting both together is unlikely.
In this respect it plays a similar role to hox genes. Getting symmetrical legs for locomotion (or wings for flight) is unlikely unless you have a modular system.
Yes, this kind of selection for general learning ability is known as the Baldwin effect.
True, it fits the definition, as long as you allow “change in the environment” to be a change in a different gene.
Sex seems to fit the bill here. Clades which reproduce sexually are able to evolve more rapidly in response to changing environments, and the trait of sexual reproduction becomes established in the biota.
This might explain the maintenance of the trait better than how it came to arise in the first place… but maybe that’s good enough.
I would recommend the 2nd law entropy maximization approach, which attempts to account for instances of increased genetic success by their increase in the rate of entropy generation.
Looking through the Citizendium Life article footnotes, I found this article (footnote 28) that I’m reading now, which does so. It also explores the thermodynamic role that perception plays (in mammals and other organisms), which looks to be a promising piece of the puzzle of consciousness. The article sounds flaky at first, but it’s clearly looking for (what we would call) a reductionist explanation of the purposefulness of complex organisms.
Very confusing. Although if I squint I can sorta see the shades of my own current “procedural” subjectivist stance and worries about the lack of good conceptual foundation for preference, I’m pretty sure the similarity is all smoke and mirrors.
“The truly fast way to produce a human-relative ideal moral agent is to create an AI with the interim goal of inferring the “human utility function” (but with a few safeguards built in, so it doesn’t, e.g., kill off humanity while it solves that sub-problem),”
That is three-laws-of-robotics-ism, and it won’t work. There’s no such thing as a safe superintelligence that doesn’t already share our values.
Surely there can be such super-intelligences: Imagine a (perhaps autistic) IQ-200 guy who just wants to stay in his room and play with his paperclips. He doesn’t really care about the rest of the world, he doesn’t care about extending his intelligence further, and the rest of the world doesn’t quite care about his paperclips. Now replace the guy with an AI with the same values: it’s quite super-intelligent already, but it’s still safe (in the sense that objectively it poses no threat, other than the fact that the resources it uses playing with its paperclips could be used for something else); I have no problem scaling its intelligence much further and leaving it just as benign.
Of course, once it’s super-intelligent (quite a bit earlier, in fact), it may be very hard or impossible for us to determine that it’s safe — but then again, the same is true for humans, and quite a few of the billions of existing and past humans are or have been very dangerous.
The difference between “X can’t be safe” and “X can’t be determined to be safe” is important; the first means “probability we live, given X, is zero”, and the other means “probability we live, given X, is strictly less than one”.
If you want to integrate the phenomenal into your ontology, is there any reason you’ve stopped short of phenomenalism?
EDIT: Not sarcasm—quite serious.
(Phenomenalism defined.)
Phenomenalism (whether solipsistic or multi-person) doesn’t explain where phenomena come from or why they have their specific forms. If you can form a causal model of how the thing which experiences appearances is induced to have those experiences, you may as well do so. From an ontological perspective, you could say it’s phenomenalism which stops short of providing an explanation.
What worries me about this tack is that I’m sufficiently clever to realize that in conducting a vast and complex research program to empirically test humanity to determine a global reflectively consistent utility function, I will be changing the utility trade-offs of humanity.
So I might as well make sure that I conduct my mass studies in such a way as to ensure that the outcome is both correct and makes the second, much longer (essentially infinitely longer) phase of my functioning easier to perform.
So said AI would determine and then forever follow exactly what humanity’s hidden utility function is. But there is no guarantee that this is a particularly friendly scenario.
Immovable scepticism here. Mathematical physics may be too low-level to be directly useful in all instances, but it is rather useful at describing things without appealing to magic.
Fundamental disagreement or no, it would make an interesting stand alone article. Is it really an important part of this one? I find it distracts me significantly from whatever actual minority view you are trying to express.