David Chalmers’ “The Singularity: A Philosophical Analysis”
David Chalmers is a leading philosopher of mind, and the first to publish a major philosophy journal article on the singularity:
Chalmers, D. (2010). “The Singularity: A Philosophical Analysis.” Journal of Consciousness Studies 17:7-65.
Chalmers’ article is a “survey” article in that it doesn’t cover any arguments in depth, but quickly surveys a large number of positions and arguments in order to give the reader a “lay of the land.” (Compare to Philosophy Compass, an entire journal of philosophy survey articles.) Because of this, Chalmers’ paper is a remarkably broad and clear introduction to the singularity.
Singularitarian authors will also be pleased that they can now cite a peer-reviewed article by a leading philosopher of mind who takes the singularity seriously.
Below is a CliffsNotes-style summary of the paper for those who don’t have time to read all 58 pages of it.
The Singularity: Is It Likely?
Chalmers focuses on the “intelligence explosion” kind of singularity, and his first project is to formalize and defend I.J. Good’s 1965 argument. Defining AI as being “of human level intelligence,” AI+ as AI “of greater than human level” and AI++ as “AI of far greater than human level” (superintelligence), Chalmers updates Good’s argument to the following:
1. There will be AI (before long, absent defeaters).
2. If there is AI, there will be AI+ (soon after, absent defeaters).
3. If there is AI+, there will be AI++ (soon after, absent defeaters).
4. Therefore, there will be AI++ (before too long, absent defeaters).
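Formally, the argument is valid: the conclusion follows from the premises by two applications of modus ponens, so all of the philosophical work lies in defending the premises. A minimal Lean 4 sketch of that logical form follows. It treats each claim as a bare proposition and folds the “before long, absent defeaters” qualifiers into the premises themselves; the theorem and proposition names are illustrative and are not Chalmers’ notation.

```lean
-- Illustrative only: each claim is an opaque proposition, with the
-- "absent defeaters" qualifiers assumed to be built into the premises.
theorem good_argument (AI AIPlus AIPlusPlus : Prop)
    (p1 : AI)                     -- premise 1: there will be AI
    (p2 : AI → AIPlus)            -- premise 2: if AI, then AI+
    (p3 : AIPlus → AIPlusPlus) :  -- premise 3: if AI+, then AI++
    AIPlusPlus :=
  -- conclusion: apply premise 2 to premise 1, then premise 3 to the result
  p3 (p2 p1)
```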
By “defeaters,” Chalmers means global catastrophes like nuclear war or a major asteroid impact. One way to satisfy premise (1) is to achieve AI through brain emulation (Sandberg & Bostrom, 2008). Against this suggestion, Lucas (1961), Dreyfus (1972), and Penrose (1994) argue that human cognition is not the sort of thing that could be emulated. Chalmers (1995; 1996, chapter 9) has responded to these criticisms at length. Briefly, Chalmers notes that even if the brain is not a rule-following algorithmic symbol system, we can still emulate it if it is mechanical. (Some say the brain is not mechanical, but Chalmers dismisses this as being discordant with the evidence.)
Searle (1980) and Block (1981) argue instead that even if we can emulate the human brain, it doesn’t follow that the emulation is intelligent or has a mind. Chalmers says we can set these concerns aside by stipulating that when discussing the singularity, AI need only be measured in terms of behavior. The conclusion that there will be AI++ at least in this sense would still be massively important.
Another consideration in favor of premise (1) is that evolution produced human-level intelligence, so we should be able to build it, too. Perhaps we will even achieve human-level AI by evolving a population of dumber AIs through variation and selection in virtual worlds. We might also achieve human-level AI by direct programming or, more likely, systems of machine learning.
Premise (2) is plausible because AI will probably be produced by an extendible method, and so extending that method will yield AI+. Brain emulation might turn out not to be extendible, but the other methods are. Even if human-level AI is first created by a non-extendible method, this method itself would soon lead to an extendible method, and in turn enable AI+. AI+ could also be achieved by direct brain enhancement.
Premise (3) is the amplification argument from Good: an AI+ would be better than we are at designing intelligent machines, and could thus improve its own intelligence. Having done that, it would be even better at improving its intelligence. And so on, in a rapid explosion of intelligence.
In section 3 of his paper, Chalmers argues that there could be an intelligence explosion without there being such a thing as “general intelligence” that could be measured, but I won’t cover that here.
In section 4, Chalmers lists several possible obstacles to the singularity.
Constraining AI
Next, Chalmers considers how we might design an AI+ that helps to create a desirable future and not a horrifying one. If we achieve AI+ by extending the method of human brain emulation, the AI+ will at least begin with something like our values. Directly programming friendly values into an AI+ (Yudkowsky, 2004) might also be feasible, though an AI+ arrived at by evolutionary algorithms is worrying.
Most of this assumes that values are independent of intelligence, as Hume argued. But if Hume was wrong and Kant was right, then we will be less able to constrain the values of a superintelligent machine, but the more rational the machine is, the better values it will have.
Another way to constrain an AI is external rather than internal. For example, we could lock it in a virtual world from which it could not escape, and in this way create a leakproof singularity. But there is a problem. For the AI to be of use to us, some information must leak out of the virtual world so that we can observe it. But then the singularity is not leakproof. And if the AI can communicate with us, it could reverse-engineer human psychology from within its virtual world and persuade us to let it out of its box (into the internet, for example).
Our Place in a Post-Singularity World
Chalmers says there are four options for us in a post-singularity world: extinction, isolation, inferiority, and integration.
The first option is undesirable. The second option would keep us isolated from the AI, a kind of technological isolationism in which one world is blind to progress in the other. The third option may be infeasible because an AI++ would operate so much faster than us that inferiority is only a blink of time on the way to extinction.
For the fourth option to work, we would need to become superintelligent machines ourselves. One path to this might be mind uploading, which comes in several varieties and has implications for our notions of consciousness and personal identity that Chalmers discusses but I will not. (Short story: Chalmers prefers gradual uploading, and considers it a form of survival.)
Conclusion
Chalmers concludes:
Will there be a singularity? I think that it is certainly not out of the question, and that the main obstacles are likely to be obstacles of motivation rather than obstacles of capacity.
How should we negotiate the singularity? Very carefully, by building appropriate values into machines, and by building the first AI and AI+ systems in virtual worlds.
How can we integrate into a post-singularity world? By gradual uploading followed by enhancement if we are still around then, and by reconstructive uploading followed by enhancement if we are not.
References
Block (1981). “Psychologism and behaviorism.” Philosophical Review 90:5-43.
Chalmers (1995). “Minds, machines, and mathematics.” Psyche 2:11-20.
Chalmers (1996). The Conscious Mind. Oxford University Press.
Dreyfus (1972). What Computers Can’t Do. Harper & Row.
Lucas (1961). “Minds, machines, and Gödel.” Philosophy 36:112-27.
Penrose (1994). Shadows of the Mind. Oxford University Press.
Sandberg & Bostrom (2008). “Whole brain emulation: A roadmap.” Technical report 2008-3, Future of Humanity Institute, Oxford University.
Searle (1980). “Minds, brains, and programs.” Behavioral and Brain Sciences 3:417-57.
Yudkowsky (2004). “Coherent Extrapolated Volition.”
Isolation is trickier than it sounds. If AI is created once, then we can assume that humanity is an AI-creating species. What constraints on tech, action, and/or intelligence would be necessary to guarantee that no one makes an AI in what was supposed to be a safe-for-humans region?
Right. I’m often asked, “Why not just keep the AI in a box, with no internet connection and no motors with which to move itself?”
Eliezer’s experiments with AI-boxing suggest the AI would escape anyway, but there is a stronger reply.
If we’ve created a superintelligence and put it in a box, that means that others on the planet are just about capable of creating a superintelligence, too. What are you going to do? Ensure that every superintelligence everyone creates is properly boxed? I think not.
Before long, the USA or China or whoever is going to think that their superintelligence is properly constrained and loyal, and release it into the wild in an effort at world domination. You can’t just keep boxing AIs forever.
You can’t just keep boxing AIs forever?
Please—don’t reply to this comment—so I can delete it later.
“You just can’t keep AIs boxed forever”?
Chalmers’ talk at the Singularity Summit in 2009 presents similar content.
“we can still emulate it if it is mechanical.”
right, but how many more orders of magnitude of hardware do we need in this case? this depends on what level of abstraction is sufficient. isn’t it the case that if intelligence relies on the base level and has no useful higher level abstractions the amount of computation needed would be absurd (assuming the base level is computable at all)?
Probably a few less. This OB post explains how a good deal of the brain’s complexity might be mechanical work to increase signal robustness. Cooled supercomputers with failure rates of 1 in 10^20 (or whatever the actual rate is) won’t need to simulate the parts of the brain that error-correct or maintain operation during sneezes or bumps on the head.
good reference but I mean how much more we need if we are forced to simulate at say molecular level rather than simply as a set of signal processors.
Even emulating a single neuron at molecular level is so far beyond us.
Well, I don’t think we will ever be forced to simulate the brain at a molecular level. That possibility is beyond worst-case; as Chalmers says, it’s discordant with the evidence. The brain may not be an algorithmic rule-following signal processor (1), but an individual neuron is a fairly simple analog input/output device.
1: Though I think the evidence from neuroscience quite strongly suggests it is, and if all you’ve got against it is the feeling of being conscious then you honestly haven’t got a leg to stand on.
I’m playing devil’s advocate in that I don’t think the brain will turn out to be anything more than a complex signal processor.
Neurons do seem fairly simple; we don’t know what’s waiting for us when we try to algorithmically model the rest of the brain’s structure, though.
Very true. It’s not going to be anywhere near as hard as the naysayers claim; but it’s definitely much harder than we’re capable of now.
I think this analysis assumes or emphasizes a false distinction between humans and “AI”. For example, Searle’s Room is an artificial intelligence built partly out of a human. It is easy to imagine intelligences built strictly out of humans, without paperwork. When humans behave like humans, we naturally form supervening entities (groups, tribes, memes).
I tried to rephrase Chalmers’ four-point argument without making a distinction between humans acting “naturally” (whatever that means) and “artificial intelligences”:
There is some degree of human intelligence and capabilities. In particular, human intelligence and capabilities have always involved manipulating the world indirectly (mediated by other humans or by nonhuman tools). “There is I”
Since intelligence and capabilities are currently helpful in modifying ourselves and our tools, as we apply our intelligence and capabilities to ourselves and our tools, we will grow in intelligence and capabilities. “If there is I, there will be I+”
If this self-applicability continues for many cycles, we will become very smart and capable. “If there is I+, there will be I++”.
Therefore, we will become very smart and very capable. “There will be I++.”
I’m not trying to dismiss the dangers involved in this process; all I’m saying is that the language used feeds a Skynet “us versus them” mentality that isn’t helpful. Admitting that “We have met the enemy and he is us.” focuses attention where it ought to be.
A lot of AI-risks dialogue is a blend of: foolish people focusing on Skynet scenarios, foolish rhetoric (whatever the author is thinking) alluding to Skynet scenarios, and straightforward sensible policies that could and should be separated from the bad science fiction.
This is what I mean by straightforward, sensible, non-sf policies: We have always made mistakes when using tools. Software tools allow us to make more mistakes faster, especially “unintended consequences” mistakes. We should put effort into developing more safety techniques guarding against unintended consequences of our software tools.
What mentality other than “us versus them” would be even remotely helpful for dealing with a UFAI?
We have met the enemy and we are paperclips.
“Us versus them” presupposes the existence of them, ie UFAI. Which means we have probably already lost. So really, no mentality would be remotely helpful for dealing with an existing UFAI.
Sci-fi policies can’t be good policies?
Are there any LW-rationalist-vetted philosophical papers on this theme in modern times? (I’m somewhat skeptical of the idea that there isn’t a universal morality (relative to some generalized Occamian prior-like-thing) that even a paperclip maximizer would converge to (if it was given the right decision theoretic (not necessarily moral per se) tools for philosophical reasoning, which is by no means guaranteed, so we should of course still be careful when designing AGIs).)
How would converging to a “universal morality” help produce paperclips?
Suppose you’re a seed AI and you’re somewhat confused about what your creator meant when they said “valueOfCreatingPaperclips=infinity”. So you do what you think they were trying to get you to do, which was to create economic value by making paperclips; the reason they wanted that was to make a profit for themselves, and the reason for that is they’re part of this larger system called humanity, which is following this strange vector in preferencespace… Like I said, this doesn’t apply to AIs that are bad at that kind of philosophical reflection, but I’m not sure how likely it is that human extrapolated volition and babyeater extrapolated volition look at all different if you just got the extrapolator working right. I am now going to duck out of this conversation because saying unconventional things on LW means you have to be really careful about your phrasing, and I don’t have the necessary mental energy nor desire. I mostly just hope that someone out there is going to fill in the gaps of what I’m saying and therefore get to play with the ideas I’m trying to convey.
And the reason you value friendship is that “evolution” “made” it so, following the Big Bang. Informal descriptions of physical causes and effects don’t translate into moral arguments, and there’s no a priori reason to care about what other “agents” present in your causal past (light cone!) “cared” about, no more than caring about what they “hated”, or even to consider such a concept.
(I become more and more convinced that you do have a serious problem with the virtue of narrowness, better stop the meta-contrarian nonsense and work on that.)
You’re responding to an interpretation of what I said that assumes I’m stupid, not the thing I was actually trying to say. Do you seriously think I’ve spent a year at SIAI without understanding such basic arguments? I’m not retarded. I just don’t have the energy to think through all the ways that people could interpret what I’m saying as something dumb because it pattern matches to things dumb people say. I’m going to start disclaiming this at the top of every comment, as suggested by Steve Rayhawk.
Specifically, in this case, in the comment you replied to and elsewhere in this thread, I said: “this doesn’t apply to AIs that are bad at that kind of philosophical reflection”. I’m making a claim that all well-designed AIs will converge to a universal ‘morality’ that we’d like upon reflection, even if they weren’t explicitly coded to approximate human values. I’m not saying your average AI programmer can make an AI that does this, though I am suggesting it is plausible.
This is stupid. I’m suggesting a hypothesis with low probability that is contrary to standard opinion. If you want to dismiss it via absurdity heuristic go ahead, but that doesn’t mean that there aren’t other people who might actually think about what I might mean while assuming that I’ve actually thought about the things I’m trying to say. This same annoying thing happened with Jef Allbright, who had interesting things to say but no one had the ontology to understand him so they just assumed he was speaking nonsense. Including Eliezer. LW inherited Eliezer’s weakness in this regard, though admittedly the strength of narrowness and precision was probably bolstered in its absence.
If what I am saying sounds mysterious, that is a fact about your unwillingness to be charitable as much as it is about my unwillingness to be precise. (And if you disagree with that, see it as an example.) That we are both apparently unwilling doesn’t mean that either of us is stupid. It just means that we are not each others’ intended audience.
(Downvoted.)
No one said you were stupid.
People are responding to the text of your comments as written. If you write something that seems to ignore a standard argument, then it’s not surprising that people will point out the standard argument.
As a parable, imagine an engineer proposing a design for a perpetual motion device. An onlooker objects: “But what about conservation of energy?” The engineer says: “Do you seriously think I spent four years at University without understanding such basic arguments?” An uncharitable onlooker might say “Yes.” A better answer, I think, is: “Your personal credentials are not at issue, but the objection to your design remains.”
(Upvoted.)
I suppose I mostly meant ‘irrational’, not stupid. I just expected people to expect me to understand basic SIAI arguments like “value is fragile” and “there’s no moral equivalent of a ghost in the machine” et cetera. If I didn’t understand these arguments after having spent so much time looking at them… I may not be stupid, but there’d definitely be some kind of gross cognitive impairment going on in software if not in hardware.
There were a few cues where I acknowledged that I agreed with the standard argument (AGI won’t automatically converge to Eliezer’s “good”), but was interested in a different argument about philosophically-sound AIs that didn’t necessarily even look at humanity as a source of value but still managed to converge to Eliezer’s good, because extrapolated volitions for all evolved agents cohere. (I realize that your intuition is interestingly perhaps somewhat opposite mine here, in that you fear more than I do that there won’t be much coherence even among human values. I think that we might just be looking at different stages of extrapolation… if human near mode provincial hyperbolic discounting algorithms make deals with human far mode universal exponential discounting algorithms, the universal (pro-coherence) algorithms will win out in the end (by taking advantage of near mode’s hyperbolic discounting). If this idea is too vague or you’re interested I could expand on this elsewhere.)
Your parable makes sense, it’s just that I don’t think I was proposing a perpetual motion device, just something that could sound like a perpetual motion device if I’m not clear enough in my exposition, which it looks like I wasn’t. I was just afraid of italicizing and bolding the disclaimers because I thought it’d appear obnoxious, but it’s probably less obnoxious than failing to emphasize really important parts of what I’m saying.
What does time discounting have to do with coherence? Of course exponential discounting is “universal” in the sense that if you’re going to time-discount at all (and I don’t think we should), you need to use an exponential in order to avoid preference reversals. But this doesn’t tell us anything about what exponential discounters are optimizing for.
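The preference-reversal point can be checked numerically. The toy below assumes hyperbolic discounting of the form A/(1 + kD) and exponential discounting of the form A·δ^D, and uses made-up numbers (10 units at some delay versus 15 units five periods later, k = 1, δ = 0.9) chosen only for illustration; none of these values come from the comment. It asks whether the agent’s choice flips when both options are pushed 20 periods further into the future.

```python
from typing import Callable

# Hypothetical discount functions and parameters, for illustration only.
Discount = Callable[[float, float], float]


def hyperbolic(amount: float, delay: float, k: float = 1.0) -> float:
    """Hyperbolic discounting: value falls off as 1 / (1 + k * delay)."""
    return amount / (1.0 + k * delay)


def exponential(amount: float, delay: float, delta: float = 0.9) -> float:
    """Exponential discounting: each unit of delay multiplies value by delta."""
    return amount * delta ** delay


def prefers_sooner(discount: Discount, wait: float, gap: float = 5.0,
                   small: float = 10.0, large: float = 15.0) -> bool:
    """True if `small` at time `wait` beats `large` at time `wait + gap`."""
    return discount(small, wait) > discount(large, wait + gap)


def reverses(discount: Discount, near: float = 0.0, far: float = 20.0) -> bool:
    """A preference reversal: the choice flips when both options move further out."""
    return prefers_sooner(discount, near) != prefers_sooner(discount, far)


for name, fn in [("hyperbolic", hyperbolic), ("exponential", exponential)]:
    print(name, prefers_sooner(fn, 0.0), prefers_sooner(fn, 20.0), reverses(fn))
# hyperbolic True False True
# exponential True True False
```

The exponential agent cannot reverse because the ratio of the two discounted values, (10/15)·δ^(-5), does not depend on how far away both options are; under hyperbolic discounting that ratio drifts with the shared delay, which is what produces the flip.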
I think your comments would be better received if you just directly talked about your ideas and reasoning, rather than first mentioning your shocking conclusions (“theism might be correct,” “volitions of evolved agents cohere”) while disclaiming that it’s not how it looks. If you make a good argument that just so happens to result in a shocking conclusion, then great, but make sure the focus is on the reasons rather than the conclusion.
vs.
It really really seems like these two statements contradict each other; I think this is the source of the confusion. Can you go into more detail about the second statement?
In particular, why would two agents which both evolved but under two different fitness functions be expected to have the same volition?
“Basic SIAI arguments like ‘value is fragile’”…? You mean this...?
The post starts out with:
...it says it isn’t basic—and it also seems pretty bizarre.
For instance, what about the martians? I think they would find worth in a martian future.
Yeah, and paperclippers would find worth in a future full of paperclips, and pebblesorters would find worth in a future full of prime-numbered heaps of pebbles. Fuck ’em.
If the martians are persons and they are doing anything interesting with their civilization, or even if they’re just not harming us, then we’ll keep them around. “Human values” doesn’t mean “valuing only humans”. Humans are capable of valuing all sorts of non-human things.
Suppose someone who reliably does not generate common obviously wrong ideas/arguments has an uncommon idea that is wrong in a way that is non-obvious, but that you could explain if the wrong idea itself were precisely explained to you. This person does not precisely explain their idea, but instead vaguely points to it with a description that sounds very much like a common obviously wrong idea. So you try to apply charity and fill in the gaps to figure out what they are really saying, but even if you do find the idea that they had in mind, you wouldn’t identify it as such, because you see how that idea is wrong, and being charitable, you can’t interpret what they said in that way. How could you figure out what this person is talking about?
You’d have to be speaking their language in the first place. That’s why I wrote about intended audiences. But it seems that at my current level of vagueness my intended audience doesn’t exist. I’ll either have to get more precise or stop posting stuff that appears to be nonsense.
Sometimes it is impossible to reach an intended audience when the not-intended audience is using you as a punching bag to impress their intended audience. Most of debate in conventional practice is, after all, about trying to spin what the other person says to make them look bad. If your ‘intended audience’ then chose to engage with you at the level you were hoping to converse they risk being collaterally damaged in the social bombardment.
For my part I reached the conclusion that you are probably using a different conception of ‘morality’, analogous to the slightly different conception of ‘theism’ from your recent thread. This is dangerous because in-group signalling incentives are such that people will be predictably inclined to ignore the novelty in your thoughts and target the nearest known stupid thing to what you say. And you must admit: you made it easy for them this time!
It may be worth reconsidering the point you are trying to discuss a little more carefully, and perhaps avoiding the use of the term ‘morality’. You could then make a post on the subject such that some people can understand your intended meaning and have useful conversation without risking losing face. It will not work with everyone, there are certain people you will just have to ignore. But you should get some useful discussion out of it. I note, for example, that while your ‘theism’ discussion got early downvotes by the most (to put it politely) passionate voters it ended up creeping up to positive.
As for guessing what sane things you may be trying to talk about, I basically reached the conclusion “Either what you are getting at boils down to the outcome of acausal trade or it is stupid”. And acausal trade is something that I cannot claim to be certain about.
That sounds bad—perhaps reconsider.
As a “peace offering”, I’ll describe a somewhat similar argument, although it stands as an open confusion, not so much a hypothesis.
Consider a human “prototype agent”, additional data that you plug into a proto-FAI (already a human-specific thingie, of course) to refer to the precise human decision problem. Where does this human end; where are its boundaries? Why would its body be the cutoff point, why not include all of its causal past, all the way back to the Big Bang? At that point, talking about the human in particular seems to become useless; after all, it’s a tiny portion of all that data. But clearly we need to somehow point to the human decision problem, to distinguish it from the frog decision problem and the like, even though such boundless prototype agents share all of their data. Do you point to a human finger and specify this actuator as the locus through which the universe is to be interpreted, as opposed to pointing to a frog’s leg? Possibly, but it’ll take a better understanding of interpreting decision problems from arbitrary agents’ definitions to make progress on questions like this.
With each comment like this you make, and lack of comments that show clear understanding, I think that more and more confidently, yes. Disclaimers don’t help in such cases. You don’t have to be stupid, you clearly aren’t, but you seem to be using your intelligence to confuse yourself by lumping everything together instead of carefully examining distinct issues. Even if you actually understand something, adding a lot of noise over this understanding makes the overall model much less accurate.
One thing they rather obviously might converge on is the “goal system zero” / “Universal Instrumental Values” thing. The other main candidates seem to be “fitness” and “pleasure”. These might well preserve humans for a while—in historical exhibits.
Nor is there an a priori reason for an AI to exist, for it to understand what ‘paperclips’ are, let alone for it to self-improve through learning like a human child does, absorb human languages, and upgrade itself to the extent necessary to take over the world.
I suspect that any team of scientists or engineers with the knowledge and capability required to build an AGI with at least human-infant level cognitive capacity and the ability to learn human language will understand that making the AI’s goal system dynamic is not only advantageous, but is necessitated in practice by the cognitive capabilities required for understanding human language.
The idea of a paperclip maximizer taking over the world is a mostly harmless absurdity, but it also detracts from serious discussion.
But that sounds like we’re programming the AI in English. I can’t see an AI with a motivational system well-defined enough to work at all getting confused in that way; would “Do what my creator intended me to do, if I can’t figure out what else to do” even show up as a motivational drive if it is not explicitly coded in?
He’s speaking to you in English, because you speak English. Also, the website wouldn’t let him post the bytecode.
You made a sweeping statement about all possible AI architectures. What are your reasons for it?
There’s reason to suspect that any human-level AI must be programmed in human languages.
In fact, that’s almost tautological by virtue of the Turing Test.
Another way to look at it: we developed simplified formal computer languages to program the tiny simple circuits we could build at the time, but the goal for AGI has always been to develop a system you could directly program in the full complexity of human languages.
Think about how the software industry works: high-level business goals in English, translated into more technical English for system engineers and designers, translated down into the much simpler, more verbose programming languages such as C++, then machine-translated to the even simpler assembly the CPU can actually understand.
Programming an AI in C++? Doesn’t compute.
Of the concepts named for Alan Turing it is “Turing Completeness” that is far more interesting and important than the chatbot test. If you think on the concept of a Turing complete computation system you will perhaps realise why the rest of us would consider your claim extremely silly. Well, one of the reasons anyway.
That last statement you quoted was silly. Not funny at all, apparently.
What?
Do you mean humanlike AIs? An AI capable of passing the Turing Test would of course need to understand human language well enough to act convincingly human (or at least do a really good imitation), but that’s not necessarily a human-level AI (convincing people that you’re human is a separate task from actually being human, probably a much easier one), and human-level AIs in general needn’t necessarily understand human language any better than any other sort of language by default.
Anyway, an AI being “programmed in human languages” seems to be going by the “programming = instructions being given to a human servant” metaphor, and if you want that to work, you clearly first need to write the servant in something other than human language. And copying human psychology well enough that the AI actually understands human language as well as a human does, rather than being able to imitate understanding well enough to carry on a text-based conversation, is no easy task, and is probably a lot harder than manually coding a simple goal system like paperclip maximization in a lower-level language. But that could still be an AGI.
Human level AI—an AGI design capable of matching the full intellectual capabilities of the best human scientists/engineers.
To get to H level in a practical timeframe, a human AI will have to learn human knowledge; it will have to experience the equivalent of a standard 20-25 year education.
Learning human knowledge in practice requires learning human language as an early initial precursor step.
The software of a human mind—the memeset or belief network, is essentially a complex human language program.
For an AI to achieve human-level, it will have to actually understand human language as well as a human does, and this requires a bunch of algorithmic complexity from the human brain at the hardware level and it implies the capability to parse and run human language programs.
So you only need to program the infant brain in a programming language—the rest can be programmed in human language.
If it doesn’t have the capacity to understand human level language then it’s not an AGI—as that is the defining characteristic of the concept (by my/Turing’s definition).
And thus by extension, the defining characteristic of a human-mind is human language capability.
EDIT: Why are you downvoting? Don’t agree and don’t want to comment?
Turing never intended his test to be adopted as “the defining characteristic of the concept [of AGI]” in anything like this fashion. Human ‘level’ language is also somewhat misleading in as much as it implies it is reaching a level of communication power rather than adapting specifically to the kind of communications humans happen to have evolved—especially the quirks and weaknesses.
I disagree somewhat. It’s difficult to know exactly what “he intended”, but the opening of the paper that introduces the concept starts with “Can machines think?”, and describes a reasonable language-based test: an intelligent machine is one that can convince us of its intelligence in plain human language.
I meant natural language, the understanding of which certainly does require a certain minimum level of cognitive capabilities.
We have a much greater understanding of what the “think” in “Can machines think?” means now. We have better tests than seeing if they can fake human language.
The test isn’t about faking human language; it’s about using language to probe another mind. Whales and elephants have brains built out of similar quantities of the same cortical circuits, but without a common language, stepping into their minds is very difficult.
What’s a better test for AI than the Turing test?
Give it a series of fairly difficult and broad ranging tasks, none of which it has been created with existing specialised knowledge to handle.
Yes—the AIQ idea.
But how do you describe the task, and how does the AI learn about it? There’s a massive gulf between AIs that can have the task/game described in human language and those that cannot. Whales and elephants fall in the latter category. An AI that can realistically self-improve to human levels needs to be in the former category, like a human child.
You could define intelligence with an AIQ concept so abstract that it captures only learning from scratch without absorbing human knowledge, but that would be a different concept—it wouldn’t represent practical capacity to intellectually self-improve in our world.
Use something like Prolog to declare the environment and problem. If I knew how the AI would learn about it, I could build an AI already. And indeed, there are fields of machine learning for things such as Bayesian inference.
If you have to describe every potential problem to the AI in Prolog, how will it learn to become a computer scientist or quantum physicist?
Describe the problem of learning how to become a computer scientist or quantum physicist, then let it solve that problem. Now it can learn to become a computer scientist or quantum physicist.
(That said, a better method would be to describe computer science and quantum physics and just let it solve those fields.)
Or a much better method: describe the problem of an AI that can learn natural language, the rest follows.
Except for all problems which are underspecified in natural language.
Which might be some pretty important ones.
Agreement that human children are more intelligent than whales or elephants is likely to be the closest we get to agreement on this subject. You would need to absorb a lot of new knowledge from the replies and sources already provided to you here before progress is possible.
Unfortunately it seems we are not even fully in agreement about that. A Turing-style test is a test of knowledge; the AIQ-style test is a test of abstract intelligence.
An AIQ-type test that just measures abstract intelligence fails to differentiate between feral Einstein and educated Einstein.
Effective intelligence, perhaps call it wisdom, is some product of intelligence and knowledge. The difference between human minds and those of elephants or whales is that of knowledge.
My core point, to reiterate: the defining characteristic of human minds is knowledge, not raw intelligence.
Intelligence can produce knowledge from the environment. Feral Einstein would develop knowledge of the world, to the extent that he wasn’t limited by non-knowledge/intelligence factors (like finding shelter or feeding himself).
Possibly relevant: AIXI-style IQ tests.
Very probably not. I’m claiming that the desire to code it in would be convergent, ’cuz it’s the best way to do AI even if you think you’re just trying to maximize paperclips. Of course, most AGI researchers aren’t that clever, so again, we still need to raise awareness about AGI dangers. I’m just floating a contrarian hypothesis that seems somewhat neglected.
But it’s a lot harder to code that than to code “maximize paperclips”.
Then you should have said that!
That sounds exactly like CEV.
I think a closer match is the “shaper-anchor semantics” from Eliezer’s “Creating Friendly AI”.
I think I see what you’re saying; just as we reflect on our desires and try to understand how they tick and where, biologically and historically and culturally, they come from, so also might any AI.
However, the thing about it is: that doesn’t actually change those values. Despite the dire warnings of some creationists, and even though we now understand that our value system is a consequence of an evolutionary algorithm, we haven’t actually started valuing evolutionary goals over our own built-in goals. For example, contraception is popular even though it’s quite silly from the perspective of gene propagation.
Similarly, a paperclip-maximizer might well be interested in figuring out why its utility function is what it is, so that it may better understand the world it lives in… but that’s not going to change its overriding and primary interest in making paperclips.
It seems as though that sometimes triggers intimate pair-bonding activities while reducing your exposure to STDs. Use of condoms is often not remotely silly from that perspective—IMHO.
The example still works, since there are quite a few couples who use condoms because they just don’t want to have kids. They don’t have any worry about STDs from their partner. If you insist on a clear-cut case, look at men who get vasectomies.
The idea that use of contraception is “silly” from the perspective of gene propagation seems just wrong to me. There are plenty of cases where it would make sense for those who want to spread their genes around to agree to use contraceptives. Contraceptive use makes sense sometimes, and not others.
It could be claimed that the average effect of contraception on genes is negative—but that seems to be a whole different thesis.
Tim, do you agree that there exist couples who plan to never have children and use contraception to that end?
Sure. Surely we are not disagreeing here. The original comment was:
My position is just that contraception has a perfectly reasonable place for gene propagators. The idea that contraception is always opposed to your genetic interests is wrong. Lack of contraception can easily result in things like this, which really doesn’t help. That using contraception is “silly” from a genetic perspective is a popular myth.
I’m not sure if we are. The fact that contraception might have a reasonable place for gene propagators is not the issue. The point is that much, and possibly the vast majority, of contraceptive use is contrary to the goals of gene propagation.
Not really. Remember, evolution doesn’t care about your happiness. Indeed, regarding the example you linked to, from an evolutionary perspective, a one-night stand with all the protection is utterly useless. It is very likely in that male’s evolutionary interest not to use condoms.
And even if you don’t agree with the condom example, the other example, of people undergoing a generally irreversible or difficult-to-reverse operation that renders them close to sterile, is pretty clearly against the interest of gene propagation.
Humans evolved in a context where we didn’t have easy contraception, and the best humans could do to prevent conception was things like coitus interruptus. It shouldn’t surprise you that evolution has not made human instincts catch up with modern technologies.
One might think that from an evolutionary perspective it makes sense to substantially delay or reduce offspring number so as to invest maximum resources in a small number of offspring. But humans in the developed world now live in a situation with low disease rates and lots of resources, so that strategy is sub-optimal from an evolutionary perspective. Note that charedi (ultra-Orthodox) Jews and the Amish are two of the fastest-growing populations in the United States.
I can see what you think the issue is. What I don’t see is where in the context you are getting that impression from.
Your example is stacked to favour your conclusion. What you need to try and do in order to understand my position is to think about an example that favours my conclusion.
So: get rid of the one-night stand, and imagine that the girl is desirable—that having safe sex with her looks like the best way to initiate a pair-bonding process leading to the two of you having some babies together—and that the alternative is rejection, and her walking off and telling her friends what a jerk you are when it comes to protecting your girl.
In the modern context, if you impregnate someone without planning it out properly, there’s a non-negligible chance they’ll get an abortion, which is even worse for gene propagation. Furthermore, parents are to some extent legally responsible for their children’s actions, so having too many poorly-regulated kids running around means exposing yourself to liability. A big part of the optimal strategy for present-day long-term reproductive success is to get rich, and a big part of getting rich is not having more kids than you can keep track of.
I think that’s a retcon. People use contraception so they can have more sex than they would if they had to worry about having kids every time. They may or may not rationalise further; I suspect that generally they don’t.
In terms of genetic success, having more kids than you can keep track of is pretty much the ideal, as long as all or at least most survive to reproductive adulthood.
But some people consciously choose never to have any kids. That’s silly from the perspective of gene propagation if anything is.
Sure it does. A devout priest spends half his life celibate and serving God. One day he has a crisis, reads a bunch of stuff on the internet and suddenly realizes he doesn’t believe in God. His values change.
Even this is questionable. I suspect any concept of universal morality must be evolutionary. This certainly is a widespread concept in systems/transhumanist/singularitan/cosmist thought. We do value evolution in and of itself.
It’s probably possible in principle to build such an AI. It would probably need some sort of immutable, hard-coded paperclip recognition module with which it could evaluate potential simulated futures generated by the more complex general intelligence system.
If such a thing developed to a human level or beyond and could reflect on its cognition, it might explain in lucid detail how futures filled with paperclips were good and others were evil.
It could even understand that its concepts of morality and good/bad were radically different from those of humans, and it would even understand that this difference relates to its hard-coded paperclip recognizer, and it would explain in detail how this architecture was superior to human value systems... because it helped to maximize expected future paperclips.
It could even write books such as “Paperclip Morality: the Truth”.
But just because such a thing is possible in principle doesn’t make it the slightest bit likely.
If you can build an AGI that can understand human language, it would be much easier and considerably more effective to make the AGI’s goal system dynamically modifiable on reflection through human language.
Instead of having a special hard-coded circuit to evaluate the utility of potential futures, you could just have the general conceptual circuitry handle this. The concept of ‘good’ would still be somewhat special in its role in the goal system itself, but the ‘goodness recognizer’ could change and evolve over time.
Well, the counter-argument to that particular example would be that the priest’s belief in God wasn’t a terminal value; rather, their goals of being happy and helping other people and understanding the universe were. Believing in and obeying God were just instrumental values.
However, agreed that there’s nothing in particular forcing people, weird and funky and clunky as our minds are, to always have the same fixed terminal values either. To pick an extreme example, people’s brains can sometimes be messed up severely by hormonal imbalances, which can in turn cause people to do such drastically anti-own-terminal-value things as committing suicide.
I should’ve been more specific and just said that, in general, understanding evolutionary psychology never or only very rarely causes people’s terminal values to change.
Human morality is a product of evolution; however, our morality is not itself an evolutionary algorithm execution mechanism. It’s kind of a vague approximation of one (in that all the moralities that sucked for fitness were selected against), but it still often leads to drastically different results than a straight-up evolutionary fitness maximization algorithm with access to our brains’ resources would.
For example: I intend never to have biological children and consider this decision to be a moral one. However, from an evolutionary perspective, deliberately preventing my own genes from propagating is just plain silly.
Yes, the paper-clip maximizer is just a whimsical example. However, similarly Really Unfriendly optimizers are quite plausible. Imagine the horrors that could result from a naive human-happiness-maximizer hitting the singularity asymptote.
Yes, that would be important, but it still wouldn’t be enough to solve the problem; in fact, the really hard part of the problem still remains! The happiness-maximizer might base its understanding of happiness on descriptive human usage of the word, and end up with a truly thorough and consistent understanding of the word… and then still turn everybody into nearly mindless wireheads.
Our morality engines and our language aren’t properly tuned for dealing with the kind of reality-bending power a superintelligent entity would have.
You may not have liked that particular example, but I think you are in agreement that terminal values change.
Just to make sure though, a few more examples:
someone who likes chocolate ice cream and then some years later prefers vanilla instead
someone who likes impressionist art but then years later prefers post-modern
someone who likes cats more than dogs
someone who likes chinese culture more than ethiopian
I don’t find such maximizers significantly plausible at even the human level intelligence. Possible in principle? Sure. But if you look at realistic, plausible routes to AGI it becomes clear that an AGI necessarily will be programmed in human languages and will pick up human cultural programs.
And finally, even if it was plausible that a flawed design could hit the singularity asymptote, that itself might only be a big problem if it had a short planning horizon.
It seems that all superintelligences with infinite planning horizons become behaviorally indistinguishable. All long-term value systems converge on a single universal attractor—they become cosmists.
That is what I mean when I said “any concept of universal morality must be evolutionary”.
That would again be assuming humans capable of building a superhuman AGI but asinine enough to attempt to somehow hard-code its goal system, instead of making it as open-ended and dynamic as a human’s.
How would you build a happiness maximizer and fix the value of happiness? The meaning of a word in a human brain is stored as a huge set of associative weights that anchor it in a massive distributed belief network. The exact meaning of each word changes over time as the network learns and reconfigures itself; no concept is quite static. So for an AGI to understand the word in the same way we do, the word’s meaning is always subject to some drift. And this is a good thing.
I think we may be experiencing some terminology confusion here. Just to be clear, you realize that these are all not terminal values, right?
Here’s the big issue: if it’s open-ended, how do we keep it from drifting off somewhere terrible? The system that guides that seems to be the largest potential risk point of the approach you describe.
I’m very confused by this; can you go into more detail about why you think this is so? In particular, why would it be true for all long-term value systems (including flawed and simplistic value systems), and not just a very small subset?
No. What is a terminal value? That which stimulates the planning reward circuit in the human nucleus accumbens? I’m not sure I buy into the concept.
The point of value or preferences from the perspective of intelligence is to rate potential futures.
We are open-ended! Our future-preferences depend on and are intertwined with our knowledge. So any superintelligence or evolutionary accelerator we create will also need to be open-ended, or it wouldn’t be protecting our dynamic core.
I discussed some of this in my first, somewhat hasty, LW post here. A few others here have mentioned a similar idea, I may write more about it as I find it interesting.
Basically, if your planning horizon extends to infinity you will devote all of your resources towards expanding your net intelligence for the long term future, regardless of what your long term goals are.
So no matter whether your long term goal is to maximize paper-clips, human happiness or something more abstract, in each case this leads to an identical outcome for the foreseeable future: a local computational singularity with an exponentially expanding simulated metaverse.
There is some speculation within physics that black-hole-like singularities can create new physical universes through inflation. If this is true, then the long-term goals of a superintelligence are best served by literally creating new physical multiverses that have more of the desirable space-time properties.
No, what I’m referring to is also known as an intrinsic value. It’s a value that is valuable in and of itself, not in justification for some other value. A non-terminal value is commonly referred to as an instrumental value.
For example, I value riding roller-coasters, and I also value playing Dance Dance Revolution. However, those values are expressible in terms of another, deeper value, the value I place on having fun. That value may in turn be thought of as an instrumental value of a yet deeper value: the value I place on being happy moment-to-moment.
If you were going to implement your own preference function as a Turing machine, trying to keep the code as short as possible, the terminal values would be the things that machine would value.
Okay, I see where you’re coming from. However, from a human perspective, that’s still a pretty large potential target range, and a large proportion of it is undesirable.
From the deeper perspective of computational neuroscience, the intrinsic/instrumental values reduce to cached predictions of your proposed ‘terminal value’ (being happy moment-to-moment), which reduces to various types of stimulations of the planning reward circuitry.
Labeling the experience of chocolate ice cream as an ‘instrumental value’ and the resulting moment-to-moment happiness as the real ‘terminal value’ is a useless distinction—it then collapses your terminal values down to the singular of ‘happiness’ and relabels everything worthy of discussion as ‘instrumental’.
The quality of being happy moment-to-moment is anything but a single value and should not by any means be reduced to a single concept. It is a vast space of possible mental stimuli, each of which creates a unique conscious experience.
The set of mental states encompassed by “being happy moment-to-moment” is vast: the gustatory pleasure of eating chocolate ice cream, the feeling of smooth silk sheets, the release of orgasm, the satisfaction of winning a game of chess, the accomplishment of completing a project, the visual experience of watching a film, the euphoria of eureka. Each of these describes an entire complex space of possible mental states.
Furthermore, the set of possible mental states is forever dynamic, incomplete, and undefined. The set of possible worlds that could lead to different visual experiences, as just a starter example, is infinite, and each new experience or piece of knowledge itself changes the circuitry underlying the experiences and thus changes our values.
The simplest complete Turing machine implementation of your preference function is an emulation of your mind. It is you, and it has no perfect simpler equivalent (although many imperfect simulations are possible).
The core of the cosmist idea is that for any possible goal evaluator with an infinite planning horizon, there is a single convergent optimal path towards that goal system. So no, the potential target range in theory is not large at all—it is singularly narrow.
As an example, consider a model universe consisting of a modified game of chess or go. The winner of the game is then free to arrange the pieces on the board in any particular fashion (including the previously dead pieces). The AI’s entire goal is to make some particular board arrangement, perhaps a smiley face. For any such possible goal system, all AIs play the game exactly the same at the limits of intelligence: they just play optimally. Their behaviour doesn’t differ in the slightest until the game is done and they have won.
Whether the sequence of winning moves such a god would make on our board is undesirable or not from our current perspective is a much more important, and complex, question.
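The convergence claim in the chess/go model can be made concrete with a toy. This is only an illustration under stated assumptions: it swaps chess/go for a hypothetical take-away game (remove 1 to 3 stones; whoever takes the last stone wins and arranges the board), assumes the two players' goals differ so the game is zero-sum, and uses plain negamax search. The names `value`, `choose_move`, and the goal strings are invented for the example. Two agents with different target arrangements choose identical moves, because the goal only enters at the terminal position.

```python
from functools import lru_cache

# Hypothetical stand-in for the chess/go model: players alternately remove
# 1..MAX_TAKE stones; whoever takes the last stone wins and may then
# arrange the board however it likes.
MAX_TAKE = 3


def realized(board: str, goal: str) -> float:
    """Utility of a final board arrangement for an agent with this goal."""
    return 1.0 if board == goal else 0.0


@lru_cache(maxsize=None)
def value(stones: int, my_goal: str, opp_goal: str) -> float:
    """Utility for the player to move, assuming optimal play by both sides."""
    if stones == 0:
        # The previous player took the last stone, so the opponent won
        # and arranges the board to suit its own goal.
        return realized(opp_goal, my_goal)
    # Assumes the goals differ, so the game is zero-sum: my utility is
    # 1 minus the opponent's utility in the position I hand over.
    return max(
        1.0 - value(stones - k, opp_goal, my_goal)
        for k in range(1, min(MAX_TAKE, stones) + 1)
    )


def choose_move(stones: int, my_goal: str, opp_goal: str) -> int:
    """Number of stones to take; ties go to the smallest legal move."""
    moves = range(1, min(MAX_TAKE, stones) + 1)
    return max(moves, key=lambda k: 1.0 - value(stones - k, opp_goal, my_goal))


# Two agents with different target arrangements, facing the same opponent.
smiley = [choose_move(n, "smiley face", "spiral") for n in range(1, 13)]
checker = [choose_move(n, "checkerboard", "spiral") for n in range(1, 13)]
print(smiley == checker)  # True
```

The deterministic tie-break matters only in lost positions, where every move scores the same; in won positions both agents pick the unique winning move regardless of what they intend to build afterwards.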
Right, but as far as I can tell without having put lots of hours into trying to solve the problem of clippyAI, it’s really damn hard to precisely specify a paperclip. (There are things that are easier to specify that this argument doesn’t apply to and that are more plausibly dangerous, like hyperintelligent theorem provers...) Thus in trying to figure out what its utility function actually is (like what humans are doing as they introspect more), it could discover that the only reason its goal is (something mysterious like) ‘maximize paperclips’ is because ‘maximize paperclips’ was how humans were (probabilistically inaccurately) expressing their preferences in some limited domain. This is related to the theme Eliezer quite elegantly goes on about in Creating Friendly AI and that he for some reason barely mentioned in CEV, which is that the AI should look at its own source code as evidence of what its creators were trying to get at, and update its imperfect source code accordingly. Admittedly, most uFAIs probably won’t be that sophisticated, and so worrying about AI-related existential risks is still definitely a big deal. We just might want to be a little more cognizant of potential motivations for people who disagree with what has recently been dubbed SIAI’s ‘scary idea’.
Hm. I suppose that’s possible, though it would require that the AI be given a utility function that’s specifically meant to be amenable to that kind of revision.
Under the most straightforward (i.e. not CEV-style) utility function design, fuzziness in its definition of “paperclip” would just drive the paperclip-maximizer to choose the possible definition that yields the highest utility score.
To pick a different silly example, a dog-maximizer with a utility function based on the number of dogs in the universe would simply prefer to tile the solar system with tiny Chihuahuas rather than Great Danes; the whole range of “dog” definitions fits the function, so it just chooses the one that is most convenient for maximum utility. It wouldn’t try to resolve it by trying to decide which definition is more in line with the designer’s ideals, unless “consider the designer’s ideals” were designed into the system from the start.
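A minimal sketch of the dog-maximizer point, under loudly hypothetical assumptions: the breed list, per-dog masses, and fixed matter budget are invented numbers, and `utility` and `choose_definition` are illustrative names, not any real design. The utility is simply a count of dogs, and the ambiguity in “dog” is resolved by whichever admissible reading scores highest.

```python
# Hypothetical toy model: each admissible reading of "dog" maps to the
# mass (kg) of one instance. These figures are illustrative assumptions.
DOG_DEFINITIONS = {
    "great_dane": 60.0,
    "labrador": 30.0,
    "chihuahua": 2.0,
}

MATTER_BUDGET_KG = 1_000_000.0  # hypothetical matter available for building dogs


def utility(definition: str, budget: float = MATTER_BUDGET_KG) -> int:
    """Count of dogs the budget yields if 'dog' is read this way."""
    return int(budget // DOG_DEFINITIONS[definition])


def choose_definition() -> str:
    # A straightforward maximizer resolves the fuzziness by picking whichever
    # admissible reading scores highest; nothing here consults what the
    # designer had in mind.
    return max(DOG_DEFINITIONS, key=utility)


best = choose_definition()
print(best, utility(best))  # chihuahua 500000
```

Nothing in `choose_definition` refers to the designer's intent; getting that behaviour would mean changing the utility function itself, which is the point made above.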
Is designing “consider the designer’s ideals” in an AI difficult?
Currently expected to be difficult, since we don’t know of an easy way to do so. That it’ll turn out to be easy (in the hindsight) is not totally out of the question.
Has anyone considered approaching this problem in the same way we might approach “read the user’s handwriting”? That is, the task is not one we program the AI to accomplish—instead, we train the AI to accomplish it. And, most importantly, we train the AI to ask for further clarification in ambiguous cases.
Mirrors and Paintings (yes, you want to point your program at the world and have it figure out what you referred to), The Hidden Complexity of Wishes (if you need to answer AI’s question or give it instructions, you’re doing something wrong and it won’t work).
I have to admit, as someone who has worked in software testing, I find it difficult to take the suggestion (non-destructive full-brain scan) in the first link very seriously. How, exactly, do I become convinced that the AI can come to know more about what I want by scanning me than I can know by introspection? How can I (or it) even do a comparison between the two without it asking me questions?
But then we get down to doing the comparison. The AI informs me that what I really want is to kill my father and sleep with my mother. I deny this. Do we take this as evidence that the AI really does know me better than I know myself, or as a symptom of a bug?
I would argue that if you don’t need to answer the AI’s questions or give it instructions, you’re doing something wrong and it won’t work. By definition. At least for the first ten thousand scans or so. And even then there will remain questions on which the AI and introspection would deliver different answers. Questions with hidden complexity. I just don’t see how anyone would trust a CEV extrapolated from brain scans until we had decades of experience suggesting that scanning and modeling yields better results than introspection.
Agreed. And any useful AI will have to understand human language to do or learn much of anything of value.
The detailed analysis of full brain scanning tech I’ve seen puts it far into the future, well beyond human-level AGI.
You have to make sure AI predictably gives a better answer even on questions where you disagree. And there will be questions which can’t even be asked of a human.
Irrelevant. Assume you magically have a perfect working simulation of yourself.
Why would I want to do that? I.e. how would making that assumption lead me to take Eliezer’s suggestion more seriously? My usual practice is to take things less seriously when magic is involved.
And how does this assumption interact with your other comment stating that I have to make sure the AI is somehow even better than myself if there is any difference between simulation and reality? Haven’t you just asked me to assume that there are no differences?
Sorry, I simply don’t understand your responses, which suggests to me that you did not understand my comment. Did you notice, in my preamble, that I mentioned software testing? Perhaps my point may be clearer to you if you keep this preamble in mind when formulating your responses.
Because that’s a conceptually straightforward assumption that we can safely make in a philosophical argument.
The upload is not the AI (and Eliezer’s post doesn’t refer to uploads IIRC, but for the sake of the argument assume they are available as raw material). You make AI correct on strong theoretical grounds, and only test things to check that theoretical assumptions hold in ways where you expect it to be possible to check things, not in every situation.
What would I need to make of that?
But this is not a philosophical argument.
To recap:
I suggested that an AI which is a precursor to the FAI should come to understand human values by interacting (over an extended ‘training’ period) with actual humans—asking them questions about their values and perhaps performing some experiments as in a psych or game theory laboratory.
You responded by linking to this, which as I read it suggests that the most accurate and efficient way to extract the values of a human test subject would be by carrying out a non-destructive brain scan. Quoting the posting:
I asked how we could possibly come to know by testing that the scanning and brain modeling was working properly. I could have asked instead how we could test the hypothesis that the inference from behavior was working properly.
These are questions about engineering and neuroscience, not questions of philosophy. The question of what is right/wrong is a philosophical question. The question of what humans believe about right and wrong is a psychology question. The question of how those beliefs are represented in the brain is a neuroscience question. The question of how an AI can come to learn these things is GOFAI. The question of how we will know we have done it right is a QC question: software testing. That was the subject of my comment. It had nothing at all to do with philosophy.
Ok, in this context, I interpret this to mean that we will not program in the neuroscience information that it will use to interpret the brain scans. Instead we will simply program the AI to be a good scientist. A provably good scientist. Provable because it is a simple program and we understand epistemology well enough to write a correct behavioral specification of a scientist and then verify that the program meets the specification. So we can let the AI design the brain scanner and perform the human behavioral experiments to calibrate its brain models. We only need to spot-check the science it generates, because we already know that it is a good scientist.
Hmmm. That is actually a pretty good argument, if that is what you are suggesting. I’ll have to give that one some thought.
Sorry, not my area at the moment. I gave the links to refer to arguments for why having AI learn in the traditional sense is a bad idea, not for instructions on how to do it correctly in a currently feasible way. Nobody knows that, so you can’t expect an answer, but the plan of telling the AI things we think we want it to learn is fundamentally broken. If nothing better can be done, too bad for humanity.
This is much closer, although a “scientist” is probably a bad word to describe that, and given that I don’t have any idea what kind of system can play this role, it’s pointless to speculate. Just take as the problem statement what you quoted from the post:
Relevant—Can we just assume you magically have a friendly AI then?
If the plan for creating a friendly AI depends on a non-destructive full-brain scan already being available, the odds of achieving friendly AI before other forms of AI vanish to near zero.
One step at a time, my good sir! Reducing the philosophical and mathematical problem of Friendly AI to the technological problem of uploading would be an astonishing breakthrough quite by itself.
I think this reflects the practical problem with Friendly AI—it is an ideal of perfection taken to an extreme that expands the problem scope far beyond what is likely to be near term realizable.
I expect that most of the world, research teams, companies, the VC community and so on will be largely happy with an AGI that just implements an improved version of the human mind.
For example, humans have an ability to model other agents and their goals, and through love/empathy value the well-being of others as part of our own individual internal goal systems.
I don’t see yet why that particular system is difficult or more complex than the rest of AGI.
It seems likely that once we can build an AGI as good as the brain, we can build one that is human-like but has only the love/empathy circuitry in its goal system, with the rest of the crud stripped out.
In other words if we can build AGI’s modeled after the best components of the best examples of altruistic humans, this should be quite sufficient.
This is the straightforward approach.
Once you have an AGI that has the cognitive capability and learning capacity of a human infant brain, you teach it everything else in human language—right/wrong, ethics/morality, etc.
Programming languages are precise and well suited for creating the architecture itself, but human languages are naturally more effective for conveying human knowledge.
I tend to agree that we need a natural language interface to the AI. But it is far easier to create automatic proofs of program correctness when the really important stuff (like ethics) is presented in a formal language equipped with a deductive system.
There is something to be said for treating all the natural language input as if it were testimony from unreliable witnesses—suitable, perhaps, for locating hypotheses, but not really suitable as strong evidence for accepting the hypotheses.
I’m not sure how this applies—can you formally prove the correctness of a probabilistic belief network? Is that even a valid concept?
I can understand how you can prove a formal deterministic circuit or the algorithms underlying the belief network and learning systems, but the data values?
Agree. That is why I suggest that the really important stuff (meta-ethics, epistemology, etc.) be represented in some other way than by ‘neural’ networks. Something formal and symbolic, rather than quasi-analog. All the stuff which we (and the AI) need to be absolutely certain doesn’t change meaning when the AI “rewrites its own code”.
By formal, I assume you mean math/code.
The really important stuff isn’t a special category of knowledge. It is all connected—a tangled web of interconnected complex symbolic concepts for which human language is a natural representation.
What is the precise mathematical definition of ethics? If you really think of what it would entail to describe that precisely, you would need to describe humans, civilization, goals, brains, and a huge set of other concepts.
In essence you would need to describe an approximation of our world. You would need to describe a belief/neural/statistical inference network that represented that world internally as a complex association between other concepts that eventually grounds out into world sensory predictions.
So this problem—that human language concepts are far too complex and unwieldy for formal verification—is not a problem with human language itself that can be fixed by choosing another language. It reflects the inherent, massive complexity of the world itself, complexity that human language and brain-like systems evolved to handle.
These folks seem to agree with you about the massive complexity of the world, but seem to disagree with you that natural language is adequate for reliable machine-based reasoning about that world.
As for the rest of it, we seem to be coming from two different eras of AI research as well as different application areas. My AI training took place back around 1980 and my research involved automated proofs of program correctness. I was already out of the field and working on totally different stuff when neural nets became ‘hot’. I know next to nothing about modern machine learning.
I read about CYC a while back—from what I recall/gather, it is a massive hand-built database of little natural-language ‘facts’.
Some of the new stuff they are working on with search looks kinda interesting, but in general I don’t see this as a viable approach to AGI. A big syntactic database isn’t really knowledge—it needs to be grounded to a massive sub-symbolic learning system to get the semantics part.
On the other hand, specialized languages for AGIs? Sure. But they will need to learn human languages first to be of practical value.
Blind men looking at elephants.
You look at CYC and see a massive hand-built database of facts.
I look and see a smaller (but still large) hand-built ontology of concepts.
You, probably because you have worked in computer vision or pattern recognition, notice that the database needs to be grounded in some kind of perception machinery to get semantics.
I, probably because I have worked in logic and theorem proving, wonder what axioms and rules of inference exist to efficiently provide inference and planning based upon this ontology.
That’s one of my favorite analogies, and I’m fond of the Jain(?) multi-viewpoint approach.
As for the logic/inference angle, I suspect that this type of database underestimates the complexity of actual neural concepts—as most of the associations are subconscious and deeply embedded in the network.
We use ‘connotation’ to describe part of this embedding concept, but I see it as even deeper than that. A full description of even a simple concept may involve on the order of billions of such associations. If this is true, then a CYC-like approach is far from scalable enough.
It appears that you doubt that an AI whose ontology is simpler and cleaner than that of a human can possibly be intellectually more powerful than a human.
All else being equal, I would doubt that with respect to a simpler ontology, while the ‘cleaner’ adjective is less well defined.
Look at it in terms of the number of possible circuit/program configurations that are “intellectually more powerful than a human” as a function of the circuit/program’s total bit size.
At around the human level of roughly 10^15 bits, I’m almost positive there are intellectually more powerful designs—so P_SH(10^15) = 1.0.
I’m also positive that below some size threshold there are absolutely zero possible configurations of superhuman intellect—say P_SH(10^10) ~ 0.0.
Of course “intellectually more powerful” is open to interpretation. I’m thinking of it here in terms of the range of general intelligence tasks human brains are specially optimized for.
IBM’s Watson is superhuman in a certain novel, narrow range of abilities, and its complexity is around 10^12 to 10^13 bits.
To get to that point we have to start from the right meaning to begin with, and care about preserving it accurately, and Jacob doesn’t agree those steps are important or particularly hard.
Not quite.
As for the start-with-the-right-meaning part, I think it is extremely hard to ‘solve’ morality in the way typically meant here, with CEV or whatnot.
I don’t think that we need (or will) wait to solve that problem before we build AGI, any more or less than we need to solve it for having children and creating a new generation of humans.
If we can build AGI somewhat better than us according to our current moral criteria, they can build an even better successive generation, and so on—a benevolence explosion.
As for the second part about preserving it accurately, I think that ethics/morality is complex enough that it can only be succinctly expressed in symbolic associative human languages. An AGI could learn how to model (and value) the preferences of others in much the same way humans do.
Someone help me out. What is the right post to link to that goes into the details of why I want to scream “No! No! No! We’re all going to die!” in response to this?
The Coming of Age sequence examines this error from Eliezer’s standpoint, as he came to realize it, and has further links.
In which post? I’m not finding discussion about the supposed danger of improved humanish AGI.
That Tiny Note of Discord, say. (Not on “humanish” AGI, but eventually exploding AGI.)
I don’t see much of a relation at all between that first post and what I’ve been discussing.
[http://lesswrong.com/lw/lq/fake_utility_functions/] is a little closer, but still doesn’t deal with human-ish AGI.
Why would an AI which optimises for one thing create another AI that optimises for something else? Not every change is an improvement, but every improvement is necessarily a change. Building an AI with a different utility function is not going to satisfy the first AI’s utility function! So whatever AI the first one builds is necessarily going to either have the same utility function (in which case the first AI is working correctly), or have a different one (which is a sign of malfunction, and given the complexity of morality, probably a fatal one).
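The goal-preservation argument can be made concrete with a toy model. This is a minimal sketch under stated assumptions: the outcomes, the candidate successors, and their utility tables are all hypothetical, and each successor is assumed to be competent enough to bring about whatever outcome its own utility function ranks highest. The parent then scores every candidate with its own utility function, not the candidate's.

```python
# Toy model of goal preservation under self-improvement. All names and
# numbers are hypothetical illustrations.

OUTCOMES = ["more_paperclips", "more_staples", "balanced"]


def successor_outcome(successor_utility):
    """A competent successor brings about the outcome its OWN utility ranks highest."""
    return max(OUTCOMES, key=successor_utility)


def parent_picks(parent_utility, candidates):
    """The parent scores each candidate successor by the outcome that successor
    would produce, evaluated with the PARENT's utility function."""
    return max(
        candidates,
        key=lambda name: parent_utility(successor_outcome(candidates[name])),
    )


parent_u = {"more_paperclips": 10, "more_staples": 0, "balanced": 6}.get

candidates = {
    "copy_of_parent": parent_u,
    "staple_lover": {"more_paperclips": 0, "more_staples": 10, "balanced": 3}.get,
    "compromiser": {"more_paperclips": 4, "more_staples": 4, "balanced": 9}.get,
}

choice = parent_picks(parent_u, candidates)
print(choice)  # copy_of_parent
```

A successor that maximizes a different function can at best tie the copy under the parent's own scoring. The model deliberately leaves out other agents, so anything that changes what a successor delivers to the parent (a joint venture, payment, a threat) is outside it.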
It’s not possible to create an AGI that is “somewhat better than us” in the sense that it has a better utility function. To the extent that we have a utility function at all, it would refer to the abstract computation called “morality”, which “better” is defined by. The most moral AI we could create is therefore one with precisely that utility function. The problem is that we don’t exactly know what our utility function is (hence CEV).
There is a sense in which a Friendly AGI could be said to be “better than us”, in that a well-designed one would not suffer from akrasia and whatever other biases prevent us from actually realizing our utility function.
AIs without utility functions, but with some other motivational structure, will tend to self-improve into utility-function AIs. Utility-function AIs seem more stable under self-improvement, but there are many reasons such an AI might want to change its utility function (e.g. speed of access, multi-agent situations).
Could you clarify what you mean by an “other motivational structure?” Something with preference non-transitivity?
For instance. http://selfawaresystems.files.wordpress.com/2008/01/ai_drives_final.pdf
It wouldn’t if it initially considered itself to be the only agent in the universe. But if it recognizes the existence of other agents and the impact of other agents’ decisions on its own utility, then there are many possibilities:
The new AI could be created as a joint venture of two existing agents.
The new AI could be built because the builder was compensated for doing so.
The new AI could be built because the builder was threatened into doing so.
This may seem intuitively obvious, but it is actually often false in a multi-agent environment.
Yes, it certainly can—if that new AI helps its creator.
The same issue applies to children—they don’t necessarily have the same ‘utility function’, sometimes they even literally kill us, but usually they help us.
Sure it is—this part at least is easy. For example, an AGI that is fully altruistic and experiences love as its only emotion would clearly be “somewhat better than us” from our perspective in every sense that matters.
If that AGI would not be somewhat better than us in the sense of having a better utility function, then ‘utility function’ is not a useful concept.
The real problem is the idea that morality can or should be simplified down to a ‘utility function’ simple enough for a human to code.
Before tackling that problem, it would probably be best to start with something much simpler, such as a utility function that could recognize dogs vs. cats and other objects in images. If you actually research this, it quickly becomes clear that real-world intelligences make decisions using much more complexity than a simple utility-maximizing algorithm.
That would be not so much a benevolence explosion as a single AI creating “slave” AIs for its own purposes. If some of the child AI’s goals (for example those involved in being more good) are opposed to the parent’s goals (for example those which make the parent AI less good), the parent is not going to just let the child achieve its goals. Rational agents do not let their utility functions change.
If you mean that the AI doesn’t suffer from the akrasia and selfishness and emotional discounting and uncertainty about our own utility function which prevents us from acting out our moral beliefs then I agree with you. That’s the AI being more rational than us, and therefore better optimising for its utility function. But a literally better utility function is impossible, given that “better” is defined by our utility function.
Moreover, if our utility function describes what we truly want (which is the whole point of a utility function), it follows that we truly want an AI that optimizes for our utility function. If “better” were a different utility function then it would be unclear why we are trying to create an AI that does that, rather than what we want.
That’s why the plan is for the AI to figure it out by inspecting us. Morality is very much not simple to code.
So do we create children as our ‘slaves’ for our own purposes? You seem to be categorically ruling out the entire possibility of humans creating human-like AIs that have a parent-child relationship with their creators.
So just to make it precisely clear, I’m talking about that type of AI specifically. The importance and feasibility of that type of AGI vs other types is a separate discussion.
I don’t see it as having anything to do with rationality.
The altruistic human-ish AGI mentioned above would be better than current humans from our current perspective—more like what we wish ourselves to be, and more able to improve our world than current humans.
Yes.
This is obvious if its ‘utility function’ is just a projection of my own—i.e. it simulates what I would want and uses that as its utility function, but that isn’t even necessary—its utility function could be somewhat more complex than just a simulated projection of my own and still help fulfill my utility function.
If by inspection you just mean teach the AI morality in human language, then I agree, but that’s a side point.
So: I want to finish my novel, but I spend the day noodling around the Internet instead.
Then Omega hands me an AI which it assures me is programmed error-free to analyze me and calculate my utility function and optimize my environment in terms of it.
I run the AI, and it determines exactly which parts of my mind manifest a desire to finish the novel, which parts manifest a desire to respond to the Internet, and which parts manifest a desire to have the novel be finished. Call them M1, M2 and M3. (They are of course overlapping sets.) Then it determines somehow which of these things are part of my utility function, and which aren’t, and to what degree.
So...
Case 1: The AI concludes that M1 is part of my utility function and M2 and M3 are not. Since it is designed to maximize my utility, it constructs an environment in which M1 triumphs. For example, perhaps it installs a highly sophisticated filter that blocks out 90% of the Internet. Result: I get lots more high-quality work done on the novel. I miss the Internet, but the AI doesn’t care, because that’s the result of M2 and M2 isn’t part of my utility function.
Case 2: The AI concludes that M3 and M2 are part of my utility function and M1 is not, so it finishes the novel itself and modifies the Internet to be even more compelling. I miss having the novel to work on, but again the AI doesn’t care.
Case 3: The AI concludes that all three things are part of my utility function. It finishes the novel but doesn’t tell me about it, thereby satisfying M3 (though I don’t know it). It makes a few minor tweaks to my perceived environment, but mostly leaves it alone, since it is already pretty well balanced between M1 and M2 (which is not surprising, since I was responding to those mental structures when I constructed my current situation).
If I’m understanding you correctly, you’re saying that I can’t really know which of these results (or of countless other possibilities) will happen, but that whichever one it is, I should have high confidence that all other possibilities would by my own standards have been worse… after all, that’s what it means to maximize my utility function.
Yes?
It seems to follow that if the AI has an added feature whereby I can ask it to describe what it’s about to do before it does it and then veto doing it, I ought not invoke that feature. (After all, I can’t make the result better, but I might make the result worse.)
Yes?
Assuming you trust Omega to mean the same thing as you do when talking about your preferences and utility function, then yes. If the AI looks over your mind and optimizes the environment for your actual utility function (which could well be a combination of M1, M2 and M3), then any veto you do must make the result worse than the optimal one.
Of course, if there’s doubt about the programming of the AI, use of the veto feature would probably be wise, just in case it’s not a good genie.
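The veto question can be made concrete with a small model. This is a minimal sketch with made-up numbers: the weights on M1, M2 and M3 and the scores for each candidate environment are hypothetical. The only assumption doing real work is that the AI's `utility` is exactly the user's utility function, which is the condition attached to the "yes" answer and the reason the veto is worth keeping when that is in doubt.

```python
# Toy model of the veto question. The weights and scores are made up; the
# key assumption is that `utility` really is the user's utility function.

WEIGHTS = {"M1": 0.5, "M2": 0.3, "M3": 0.2}  # hypothetical weights on M1, M2, M3

# How much each candidate environment satisfies each mental structure (made up).
OPTIONS = {
    "filter_internet": {"M1": 0.9, "M2": 0.1, "M3": 0.6},
    "ai_writes_novel": {"M1": 0.0, "M2": 0.9, "M3": 1.0},
    "status_quo": {"M1": 0.4, "M2": 0.8, "M3": 0.2},
}


def utility(option):
    """Weighted sum of how well an option satisfies M1, M2 and M3."""
    return sum(WEIGHTS[m] * OPTIONS[option][m] for m in WEIGHTS)


def ai_choice():
    """The AI picks the option with the highest utility."""
    return max(OPTIONS, key=utility)


def outcome(veto_replacement=None):
    """Without a veto the AI's pick stands; a veto swaps in the user's alternative."""
    return ai_choice() if veto_replacement is None else veto_replacement


best = ai_choice()
for alternative in OPTIONS:
    # No veto can beat the AI's pick when both are scored by the same utility function.
    assert utility(outcome(alternative)) <= utility(outcome())
print(best)  # filter_internet
```

If the AI had instead learned a slightly different function from the user's, the assertion in the loop could fail for the user's true utility, and that is the case in which using the veto pays off.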
You seem to be imagining a relatively weak AI. For instance, given the vast space of possibilities, there are doubtless environmental tweaks that would result in both more fun on the Internet and more high-quality work on the novel. (This is to say nothing of more invasive interventions.)
The answer to your questions is yes: assuming the AI does what Omega says it does, you won’t want to use your veto.
Not necessarily weak overall, merely that it devotes relatively few resources to addressing this particular tiny subset of my preference-space. After all, there are many other things I care about more.
But, sure, a sufficiently powerful optimizer will come up with solutions so much better that it will never even occur to me to doubt that all other possibilities would be worse. And given a sufficiently powerful optimizer, I might as well invoke the preview feature if I feel like it, because I’ll find the resulting preview so emotionally compelling that I won’t want to use my veto.
That case obscures rather than illustrates the question I’m asking, so I didn’t highlight it.
Case 4: The AI makes tweaks to your current environment in order to construct it in accordance with your mental structures, but in a way more efficient than you could have in the first place.
Sure. In which case I still noodle around on the Internet a bunch rather than work on my novel, but at least I can reassure myself that this optimally reflects my real preferences, and any belief I might have that I would actually rather get more work done on my novel than I do is simply an illusion.
If those are, in fact, your real preferences, then sure.
I occasionally point out that you can model any computable behaviour using a utility-maximizing algorithm, provided you are allowed to use a partially-recursive utility function.
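The claim is easy to illustrate. This is a minimal sketch, assuming a finite action set and a policy that always returns one of those actions; `utility_from_policy`, `maximizer` and `procrastinator` are hypothetical names. The construction scores the action the policy would take as 1 and everything else as 0, so a generic maximizer reproduces the policy exactly. If the policy fails to halt on some input, so does the derived utility function, which is why it is only partial recursive in general.

```python
# Recasting arbitrary computable behaviour as utility maximization.
# All names are illustrative.


def utility_from_policy(policy):
    """Build a utility function under which `policy` is always optimal.
    Assumes policy(state) always returns a member of the action set."""
    def utility(state, action):
        # 1.0 for the action the policy would take, 0.0 for anything else.
        return 1.0 if action == policy(state) else 0.0
    return utility


def maximizer(utility, actions):
    """A generic utility maximizer over a fixed action set."""
    def act(state):
        return max(actions, key=lambda a: utility(state, a))
    return act


def procrastinator(day):
    """Arbitrary 'irrational-looking' behaviour: noodle online on odd days."""
    return "internet" if day % 2 else "novel"


ACTIONS = ["novel", "internet", "sleep"]
agent = maximizer(utility_from_policy(procrastinator), ACTIONS)

# The maximizer reproduces the original behaviour exactly.
assert all(agent(day) == procrastinator(day) for day in range(30))
print([agent(day) for day in range(4)])  # ['novel', 'internet', 'novel', 'internet']
```

The flip side is that "it maximizes some utility function" constrains behaviour very little on its own. All of the content lies in which utility function is used.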
Please read the sequences, and stop talking about AI until you do.
I’ve read the sequences. Discuss or leave me alone.
Thanks, that’s useful to know.
Edit: Seriously, no irony, that’s useful. Disagreement should be treated differently depending on background.
Also, very little of the sequences has much of anything to do with AI. If I want to learn more about that, I would look to Norvig’s book or, more likely, the relevant papers online. No need to be rude just because I don’t hold all your same beliefs.
It’s more of a problem with your understanding of ethics, as applied to AI (and since this is the main context in which AI is discussed here, I referred to that as simply AI). You might be very knowledgeable in contemporary machine learning or other AI ideas while not seeing, for example, the risks of building AGIs.
Unfortunately there is (in some senses of “rude”, such as discouraging certain conversational modes).
I see the potential risks in building AGIs.
I don’t see that risk being dramatically high for creating AGIs based loosely on improving the human brain, and this approach appears to be mainstream now or becoming the mainstream (Kurzweil, Hawkins, Darpa’s neuromorphic initiative, etc).
I’m interested in the serious discussion or analysis of why that risk could be high.
You have been discussing favourably the creation of AGIs that are programmed to create AGIs with different values to their own. No, you do not understand the potential risks.
We create children that can have different values than our own, and over time this leads to significant value drift. But perhaps it should be called ‘value evolution’.
This process is not magically guaranteed to preserve our best interests from our current perspective when carried over to AGI, but nor is it guaranteed to spontaneously destroy the world.
Your analogy with evolution is spot on: if the values are going to drift at all, we want to drift towards some target point, by selecting against sub-AIs that have values further from the point.
However, if we can do that, why not just put that target point right in the first AI’s utility function, and prevent any value drift at all? It seems like it ends up with the same result, but with slightly less complication.
And, if we can’t set a target point for the value drift evolution… then it might drift anywhere at all! The chances that it would drift somewhere we’d like are pretty small. This applies even if it were a human-brain-based AGI; in general people are quite apt to go corrupt when given only a tiny bit of extra power. A whole load of extra power, like superintelligence would grant, would have a good chance of screwing with that human’s values dramatically, possibly with disastrous effects.
Yes.
The true final ‘target point’ is unknown, and unknowable in principle. We don’t have the intelligence/computational power right now to know it, no AGI we can build will know it exactly, and this will forever remain true.
Our values are so complex that the ‘utility function’ that describes them is our entire brain circuit—and as we evolve into more complex AGI designs our values will grow in complexity as well.
Fixing them completely would be equivalent to trying to stop evolution. It’s pointless, suicidal, impossible.
Yes, evolution could in principle take us anywhere, but we can and already do exert control over its direction.
Humans today have a range of values, but an overriding universal value is not-dying. To this end it is crucially important that we reverse engineer the human mind.
Ultimately if what we really value is conscious human minds, and computers will soon out-compete human brains, then clearly we need to transfer human minds over to computers.
One simple point is that there is no reason to expect AGIs to stop at exactly human level. Even if progress and increase in intelligence is very slow, eventually they become an existential risk, or at least a value risk. Every step in that direction we make now is a step in the wrong direction, which holds even if you believe it’s a small step.
This isn’t the first time I heard this, but I don’t think it’s exactly right.
We know that human level is possible. Superhuman level seems overwhelmingly likely to be possible too (consider a human with more working memory, running faster), but we don’t technically know that.
We have a working example of a human level intelligence.
It’s human-level intelligences doing the work. Martians’ work on AI might asymptotically slow down when approaching Martian-level intelligence without that level being inherently significant for anyone else, and the same could hold for humans, or for any AGI of any level working on its own successor (not that I have any strong belief that this is the case; it’s just an argument for why human level wouldn’t be a completely arbitrary point for a slowdown).
I’d completely agree with “there is no strong reason to expect AGIs to stop at exactly human level”, “High confidence* in AGIs stopping at exactly human level is irrational” or “expecting AGIs not to stop at exactly human level would be prudent.”
*Personally I’d assign a probability of under 0.2 to the best AGIs being on a level roughly comparable to human level (let’s say being able to solve any problem except human relationship problems that every IQ 80+ human can solve, but not being better at every task than any human) for at least 50 years (physical time in Earth’s frame of reference, not subjective time; probably means inferior at an equal clock rate but making up for that with speed for most of that time). That’s a lot more than I would assign to any other place on the intelligence scale, of course.
Could the downvoter please say what they are disagreeing with? I can see at least a dozen mutually contradictory possible angles so “someone thinks something about posting this is wrong” provides almost no useful information.
Thanks for the value risk link—that discussion is what I’m interested in.
I guess I’ll reply to it there. The initial quotes from Ben G. and Hanson are similar to my current view.
There is some discussion of the dangers of a uFAI Singularity, particularly in this debate between Robin Hanson and Eliezer. Much of the danger arises from the predicted short time period required to get from a mere human-level AI to a superhuman AI+. Eliezer discusses some reasons to expect it to happen quickly here and here. The concept of a ‘resource overhang’ is crucial in dismissing Robin’s skepticism (which is based on historical human experience in economic growth—particularly in the accumulation of capital).
For an analysis of the possibility of a hard takeoff in approaches to AI based loosely on modeling or emulating the human brain, see this posting by Carl Shulman, for example.
If civilisation(t+1) can access resources much better than civilisation(t), then that is just another way of saying things are going fast—one must beware of assuming what one is trying to demonstrate here.
The problem I see with this thinking is the idea that civilisation(t) is a bunch of humans while civilisation(t+1) is a superintelligent machine.
In practice, civilisation(t) is a man-machine symbiosis, while civilisation(t+1) is another man-machine symbiosis with a little bit less man, and a little bit more machine.
There are some promising lines of attack (grounded in decision theory) that might take only a few years of research. We’ll see where they lead. Other open problems in FAI might start looking very solvable if we start making progress on this front.
Show me.
PM’d.
Yes. :)
Yes, but it still has to be explicitly programmed to do that! The question is how to get it to do so. AFAIK shaper-anchor semantics is still quite a ways from being fully specified, but it seems the bigger obstacle is that an AI writer is less likely than not to take the effort to program it that way in the first place.
This is surely the kind of thing that superintelligences will be good at. They will have access to every paperclip picture on the net, every paperclip specification too. They will surely have a much clearer idea about what a paperclip is than humans do. They will know what boxes are too.
I made a stab at it here, and it got some upvotes. So here’s a repost:
Make a wire, 10 cm long and 1mm in diameter, composed of an alloy of 99.8% iron and 0.2% carbon. Start at one end and bend it such that the segments from 2-2.5cm, 2.75-3.25cm, 5.25-5.75cm form half-circles, with all the bends in the same direction and forming an inward spiral (the end with the first bend is outside the third bend).
(Please let me know if reposting violates LW etiquette so I know not to do it again.)
I don’t think it violates LW etiquette.
Here’s a sort of fully general counterargument against proposals to naturalize human concepts in AI terms: if you can naturalize human concepts, you should be able to naturalize the human concept of a box. And if you can do that, we can build Oracle AI and save the world. It’s very easy to describe what we mean by ‘stay in the box’, but it turns out that seed (self-modifying!) AIs just don’t have a natural ontology for the descriptions.
This argument might be hella flawed; it seems kind of tenuous.
Aren’t you simply assuming that the world is doomed here? It sure looks like it!
Since when is that assumption part of a valid argument?
That assumption isn’t really a core part of the argument… the general “if specifying human concepts is easy, then come up with a plan for making a seed AI want to stay in a box” argument still stands, even if we don’t actually want to keep arbitrary seed AIs in boxes.
For the record I am significantly less certain than most LW or SIAI singularitarians that seed AIs not explicitly coded with human values in mind will end up creating a horrible future, or at least a more horrible future than something like CEV. I do think it’s worth a whole lot of continued investigation.
I suspect that you mean something like:
If there is an objective universal morality then agents converge on this universal at the limits of intelligence.
And thus perhaps paperclip maximizers have a tendency to become something else.
Since it keeps coming up, I think I’ll write a top-level post on the subject; I’ll probably do some research while writing, so I’ll see what has been written recently. Hopefully I’ll publish in the next week or two.
But, but… paperclips. Its morality is ‘make more flipping paperclips’! Just that. With the right decision theoretic tools for philosophical reasoning it will make even more paper-clips. If that even qualifies as ‘morality’ then that is what a paperclip maximiser has.
Look, I personally don’t believe that all or even most moralities will converge, however… imagine something like the following:
Dear paperclipper,
There’s a limited amount of matter that’s reachable by you in the known universe for any given timespan. Moreover, your efforts to paperclip the universe will be opposed both by humans and by other alien civilizations, which will perceive them as hostile and dangerous. Even if you’re ultimately victorious, which is far from certain, you’re better off cooperating with humans peacefully, postponing slightly your plans to make paperclips (which you’d have to postpone anyway in order to create weaponry to defeat humanity), and instead working with humans to create a feasible way to construct a new universe which you will henceforth possess and wherein your desire to create an infinite amount of paperclips will be satisfied without opposition.
Sincerely, humanity.
So, from the intrinsic “I want to create as many paperclips as possible” the truly intelligent AI can reasonably discover the instrumental “I’d like not to be opposed in my creation of such paperclips”, from there “I’d like to create my paperclips in a way that won’t harm others, so that they won’t have a reason to oppose me”, and finally “I’d like to transport myself to an uninhabited universe of my own creation, to make paperclips without any opposition at all”.
This is probably wishful thinking, but the situation isn’t as simple as what you describe either.
If the paperclipper happens to be the first AI++, and arrives before humanity goes interstellar, then it can probably wipe out all of humanity quite quickly without reasoning with it. And if it can do that, it definitely will—no point in compromising when you’ve got the upper hand.
Well, at least not when the lower hand is more use disassembled to build more cosmic commons burning spore ships.
Agreed that this is probably wishful thinking.
But, yes, also agreed that a sufficiently intelligent and well-informed paperclipper will work out that diplomacy, including consistent lying about its motives, is a good tactic to use for as long as it doesn’t completely overpower its potential enemies.
Wanting to maximise paperclips (obviously?) does not preclude cooperation in order to produce paperclips. We haven’t redefined ‘morality’ to include any game theoretic scenarios in which cooperation is reached, have we? (I suppose we could do something along those lines in the theism thread.)
I’m not sure what is required for a philosophical paper to be deemed “LW-rationalist-vetted”, nor am I sure why that is a desirable feature for a paper to have. But I will state that, IMHO, an approach based on “naturalistic ethics”, like that of Binmore, is at least as rational as any ethical approach based on some kind of utilitarianism.
I would say that a naturalistic approach to ethics assumes, with Hume, that fundamental values are not universal—they may certainly vary by species, for example, and also by the historical accidents of genetics, birth-culture, etc. However, meta-ethics is rationally based and universal, and can be converged upon by a process of reflective equilibrium.
As to instrumental values—those turn out to be universal in the sense that (in the limit of perfect rationality and low-cost communication) they will be the same for everyone in the ethical community at a given time. However, they will not be universal in the sense that they will be the same for all conceivable communities in the multiverse. Instrumental values will depend on the makeup of the community, because the common community values are derived as a kind of compromise among the idiosyncratic fundamental values of the community members. Instrumental values will also depend upon the community’s beliefs—regarding expected consequences of actions, expected utilities of outcomes, and even regarding the expected future composition of the community. And, since the community learns (i.e. changes its beliefs), instrumental values must inevitably change a little with time.
As an intuition pump, I’ll claim that Clippy could fit right into a community of mostly human rationalists, all in agreement on the naturalist meta-ethics. In that community, Clippy would act in accordance with the community’s instrumental values (which will include both the manufacture of paperclips and other, more idiosyncratically human values). Clippy will know that more paperclips are produced by the community than Clippy could produce on his own if he were not a community member. And the community welcomes Clippy, because he contributes to the satisfaction of the fundamental values of other community members, through his command of metallurgy and mechanical engineering, for example.
The aspect of naturalistic ethics which many people find distasteful is that the community will contribute to the satisfaction of your fundamental values only to the extent that you contribute to the satisfaction of the fundamental values of other community members. So, the fundamental values of the weak and powerless tend to get less weight in the collective instrumental value system than do the fundamental values of the strong and powerful. Of course, this does not mean that the very young and the elderly get mistreated—it is rational to contribute now to those who have contributed in the past or who will contribute in the future. And many humans will include concern for the weak among their fundamental values—so the community will have to respect those values.
Is this an argument based on the idea that there is some way for all of math to look such that everyone gets as much of what they want as possible?
There’s goal system zero / God’s utility function / Universal Instrumental Values.
You mean you’re somewhat convinced that there is a universal morality (that even a paperclip maximizer would converge to)? That sounds like a much less tenable position. I mean,
A statement like this needs some support.
I’ve linkified the grandparent a bit—for those not familiar with the ideas.
The main idea is that many agents which are serious about attaining their long-term goals will first take control of large quantities of spacetime and resources, before they do very much else, to avoid low-utility fates like getting eaten by aliens.
Such goals represent something like an attractor in ethics-space. You could avoid the behaviour associated with the attractor by using discounting, or by adding constraints—at the expense of making the long-term goal less likely to be attained.
Thx for this. I found those links and the idea itself fascinating. Does anyone know if Roko or Hollerith developed the idea much further?
One is reminded of the famous quote from 1984, O’Brien to Winston: “Power is not a means. Power is the end.” But it certainly makes sense that, as an agent becomes better integrated into a coalition or community, and his day-to-day goals become more weighted toward the terminal values of other people and less toward his own, he might be led to rewrite his own utility function toward Power: instrumental power to achieve any goal makes sense as a synthetic terminal value.
After all, most of our instinctual terminal values—sexual pleasure, food, good health, social status, the joy of victory and the agony of defeat—were originally instrumental values from the standpoint of their ‘author’: natural selection.
Roko combined the concept with the (rather less sensible) idea of promoting those instrumental values into terminal values, and was met with a chorus of “Unfriendly AI”.
Hollerith produced several pages on the topic.
Probably the best-known continuation is via Omohundro.
“Universal Instrumental Values” is much the same idea as “Basic AI drives” dressed up a little differently:
http://selfawaresystems.com/2007/11/30/paper-on-the-basic-ai-drives/
http://selfawaresystems.com/2007/10/05/paper-on-the-nature-of-self-improving-artificial-intelligence/
You are right. I hadn’t made that connection. Now I have a little more respect for Omohundro’s work.
I was a little bit concerned about your initial Omohundro reaction.
Omohundro’s material is mostly fine and interesting. It’s a bit of a shame that there isn’t more maths—but it is a difficult area where it is tricky to prove things. Plus, IMO, he has the occasional zany idea that takes your brain to interesting places it didn’t dream of before.
I maintain some Omohundro links here.
As a side point, you could also re-read “Basic AI drives” as “Basic Replicator Drives”—it’s systemic evolution.
Interesting, hadn’t seen Hollerith’s posts before. I came to a similar conclusion about AIXI’s behavior as exemplifying a final attractor in intelligent systems with long planning horizons.
If the horizon is long enough (infinite), the single behavioral attractor is maximizing computational power and applying it towards extensive universal simulation/prediction.
This relates to simulism and the simulation argument (SA), as any superintelligences/gods can thus be expected to create many simulated universes, regardless of their final goal evaluation criteria.
In fact, perhaps the final goal criteria apply to creating new universes with the desired properties.
These sound instrumental; you take control of the universe in order to achieve your terminal goals. That seems slightly different from what Newsome was talking about, which was more a converging of terminal goals on one superterminal goal.
Thus one of the proposed titles: “Universal Instrumental Values”.
Newsome didn’t distinguish between instrumental and terminal values.
Those were Newsome’s words.
Ah. I misunderstood the quoting.
Boo!
(To make a point as well-argued as the one it replies to.)
Edit: Now that the above comment was edited to include citations, my joke stopped being funny and got downvoted.
Any universal morality has to have long-term fitness, i.e. it must somehow win at the end of time.
Otherwise, aliens may have a more universal morality.
EDIT: why the downvote?
This does not require as much optimization as it sounds. As Wei Dai points out, computing power is proportional to the square of the amount of mass obtained, as long as that mass can be physically collected together, so a civilization collecting mass probably gets more observers than one spreading out and colonizing mass, depending on the specifics of cosmology. This kind of civilization is much easier to control centrally, so a wide range of values have the potential to dominate, depending on which ones happen to come into being.
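To make the “collect rather than spread” point concrete, here is a minimal Python sketch. It assumes (the comment does not say so) that the relevant maximum entropy is the Bekenstein-Hawking entropy of a black hole, S = 4πGk_B·M²/(ħc), which grows with the square of the mass. The function names `bh_entropy` and `concentration_gain` are hypothetical, and the constants are rounded reference values. Under that assumption, splitting a fixed total mass into n equal holes divides the total maximum entropy by exactly n.

```python
"""Compare the maximum entropy of one merged black hole with the same
total mass split into n equal black holes.

Assumption: "max entropy" is modelled as the Bekenstein-Hawking entropy,
S = 4*pi*G*k_B*M**2 / (hbar*c), i.e. proportional to the square of mass.
"""
import math

# Physical constants in SI units (rounded reference values)
G = 6.67430e-11          # gravitational constant, m^3 kg^-1 s^-2
HBAR = 1.054571817e-34   # reduced Planck constant, J s
C = 2.99792458e8         # speed of light, m/s
K_B = 1.380649e-23       # Boltzmann constant, J/K
SOLAR_MASS = 1.989e30    # approximate solar mass, kg


def bh_entropy(mass_kg: float) -> float:
    """Bekenstein-Hawking entropy (J/K) of a black hole of the given mass."""
    return 4 * math.pi * G * K_B * mass_kg ** 2 / (HBAR * C)


def concentration_gain(total_mass_kg: float, pieces: int) -> float:
    """Ratio of max entropy with all mass in one hole vs. split into `pieces` equal holes."""
    merged = bh_entropy(total_mass_kg)
    split = pieces * bh_entropy(total_mass_kg / pieces)
    return merged / split


if __name__ == "__main__":
    for n in (2, 10, 1000):
        ratio = concentration_gain(SOLAR_MASS, n)
        print(f"{n} pieces: merged/split entropy ratio = {ratio:.1f}")
```

Because the quadratic term makes the ratio equal to the number of pieces regardless of the total mass, the advantage of concentrating mass, under this assumed model, grows with how finely it would otherwise be spread.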
I’m not sure where he got the math that available energy is proportional to the square of the mass. Wouldn’t this come from the mass-energy equivalence and thus be mc^2?
Wei Dai’s conjecture about black holes being useful as improved entropy dumps is interesting. Black holes or similar dense entities also maximize speed potential and interconnect efficiency, but they are poor as information storage.
It’s also possible that by the time a civilization reaches this point of development, it figures out how to do something more interesting such as create new physical universes. John Smart has some interesting speculation on that and how singularity civilizations may eventually compete/cooperate.
I still have issues wrapping my head around the time dilation.
Energy is proportional to mass. Computing ability is proportional to (max entropy − current entropy), and max entropy is proportional to the square of mass. That was the whole point of his argument.
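Assuming the “max entropy” here is the Bekenstein-Hawking entropy of a black hole of mass M (an assumption; the comment does not name a formula), the quadratic dependence follows from the horizon area, while energy stays linear in M:

$$
r_s = \frac{2GM}{c^2}, \qquad
A = 4\pi r_s^2 = \frac{16\pi G^2 M^2}{c^4}, \qquad
S_{\text{max}} = \frac{k_B c^3 A}{4 G \hbar} = \frac{4\pi G k_B}{\hbar c}\, M^2,
\qquad E = Mc^2 .
$$

So doubling the collected mass doubles E but quadruples S_max, which is why the two quantities scale differently.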
Critics will no doubt draw attention to David’s previous venture, zombies.
Sure, we think he’s wrong, but does academia? That the Singularity is supported by more than one side is good news.
Dualism is a minority position:
http://philpapers.org/surveys/results.pl
Mind: physicalism or non-physicalism?
Accept or lean toward: physicalism 526 / 931 (56.4%)
Accept or lean toward: non-physicalism 252 / 931 (27%)
Other 153 / 931 (16.4%)
Philosophers are used to the fact that they have major disagreements with each other. Even if you think zombie arguments fail, as I do, you’ll still perk up your ears when somebody as smart as Chalmers is taking the singularity seriously. I don’t accept his version of property dualism, but The Conscious Mind was not written by a dummy.
I didn’t mean to say that Chalmers isn’t a highly respected philosopher, but I also think it’s true that the impact is somewhat blunted relative to a counterfactual in which his philosophy of mind work was of equal fame and quality, but arguing a different position.
I disagree; the fact that Chalmers is critical of standard varieties of physicalism will make him more credible on the Singularity. As things actually stand, he rejects the nerd-core view. That makes him a little harder to write off.
From a philosopher’s viewpoint, Chalmers’s work on p-zombies is very respectable. It is exactly the kind of thing that good philosophers do, however mystifying it may seem to a layman.
Nevertheless, to more practical people, particularly those of a materialist, reductionist, monist persuasion, it all looks a little silly. I would say that the question of whether p-zombies are possible is about as important to AI researchers as the question of whether there are non-standard models of set theory is to a working mathematician.
That is, not much. It is a very fundamental and technically difficult matter, but, in the final analysis, the resolution of the question matters a whole lot less than you might have originally thought. Chalmers and Searle may well be right about the possibility of p-zombies, but if they are, it is for narrow technical reasons. And if that has the consequence that you can’t completely rule out dualism, well, so be it. Whether philosophers can or cannot rule something out makes very little difference to me. I’m more interested in whether a model is useful than in whether it has a possibility of being true.
No, I don’t think so. The possibility of p-zombies is very important for FAI, because if zombies are possible it seems likely that an FAI could never tell sentient beings apart from non-sentient ones. And if our values all center around promoting positive experiential states for sentient beings, and we are indifferent to the ‘welfare’ of insentient ones, then a failure to resolve the Hard Problem places a serious constraint on our ability to create a being that can accurately identify the things we value in practice, or on our own ability to determine which AIs or ‘uploaded minds’ are loci of value (i.e., are sentient).
What precisely do you mean by non-standard set theory? If you mean modifying the axioms of ZFC, then a lot of mathematicians pay attention. There are many, for example, who try to minimize dependence on the axiom of choice. And whether one accepts choice has substantial implications for topology (see for example this survey). Similarly, there are mathematicians who investigate what happens when you assume the continuum hypothesis or a generalized version or some generalized negation.
If one is talking about large cardinal axioms, then note that there are results in a variety of fields, including combinatorics, that can be shown to be true given some strong large cardinal axioms. (I don’t know the details of such results, only their existence.)
Finally, if one looks at issues of Foundation or various forms of Anti-Foundation, there’s been comparatively recent work, primarily in the last 30 years (see this monograph), and versions of anti-foundation have been useful in logic, machine learning, complex systems, and other fields. While most of the early work was done by Peter Aczel, others have done follow-up work.
What axioms of set theory one is using can be important, and thinking about alternative models of set theory can lead to practical results.
I didn’t say “non-standard set theory”. I said “non-standard models of set theory”.
I originally considered using “non-standard models of arithmetic” as my example of a fundamental, but unimportant question, but rejected it because the question is just too simple. Asking about non-standard models of set theory (models of ZFC, for example) is more comparable to the zombie question precisely because the question itself is less well defined. For example, just what do we mean in talking about a ‘model’ of ZFC, when ZFC or something similar is exactly the raw material used to construct models in other fields?
Oh, I agree that some (many?) mathematicians will read Aczel (I didn’t realize the book was available online. Thx) and Barwise on AFA, and that even amateurs like me sometimes read Nelson, Steele, or Woodin. Just as AI researchers sometimes read Chalmers.
My point is that the zombie question may be interesting to an AI researcher, just as inaccessible cardinals or non-well-founded sets are interesting to an applied mathematician. But they are not particularly useful to most of the people who find them interesting. Most of the applications that Barwise suggests for Aczel’s work can be modeled with just a little more effort in standard ZF or ZFC. And I just can’t imagine that an AI researcher will learn anything from the p-zombie debate which will tell him which features or mechanisms his AI must have so as to avoid the curse of zombiedom.
Learning to distinguish different levels of formalism by training to follow mathematical arguments from formal set theory can help you lots in disentangling conceptual hurdles in decision theory (in its capacity as foundational study of goal-aware AI). It’s not a historical accident I included these kinds of math in my reading list on FAI.
Hmmm. JoshuaZ made a similar point. Even though the subject matter and the math itself may not be directly applicable to the problems we are interested in, the study of that subject matter can be useful by providing exercise in careful and rigorous thinking, analogies, conceptual structures, and ‘tricks’ that may well be applicable to the problems we are interested in.
I can agree with that. At least regarding the topics in mathematical logic we have been discussing. I am less convinced of the usefulness of studying the philosophy of mind. That branch of philosophy still strikes me as just a bunch of guys stumbling around in the dark.
And I agree. The way Eliezer refers to p-zombie arguments is to draw attention to a particular error in reasoning, an important error one should learn to correct.
Asking about non-standard models of ZFC is deeply connected to asking about ZFC with other axioms added. This is connected to the Löwenheim–Skolem theorem and related results. Note, for example, that if there is some large cardinal axiom L and statement S such that ZFC + L can model ZFC + S, and L is independent of ZFC, then ZFC + S is consistent if ZFC + L is.
We can make this precise by talking about any given set theory as your ground and then discussing the models in it. This is connected to Paul Cohen’s work in forcing but I don’t know anything about it in any detail. The upshot though is that we can talk about models in helpful ways.
Not much disagreement there, but I think you might underestimate the helpfulness of thinking about different base axioms rather than talking about things in ZFC. In any event, the objection is not to your characterization of thinking about p-zombie but rather the analogy. The central point you are making seems correct to me.
Frankly, I haven’t even bothered looking very much at this material. My attitude is more in line with the philosophy of the Turing test. If it looks like a duck and quacks like a duck...
Hofstadter has a good “zombie takedown”—in “I am a Strange Loop, Chapter 22: A Tango with Zombies and Dualism”.
I think Tim’s point was that Chalmers’ work on p-zombies resulted in some untenable conclusions.
More here.
There are a few major problems with any certainty about the singularity. First, we might be too stupid to create a human-level AI. Second, it might not be possible, for some reason of which we are currently unaware, to create a human-level AI. Third, and importantly, we could be too smart.
How would that last one work? Maybe we can push technology to its limits ourselves, and no AI can be smart enough to push it further. We don’t even begin to have enough knowledge to know whether this is likely. In other words, maybe it will all be perfectly comprehensible to us as we are now, and therefore not a singularity at all.
Is it worth considering? Of course. Is it worth pursuing? Probably (we need to wait for hindsight to know better than that), particularly since it will matter a great deal if and when it occurs. We simply can’t assume that it will.
Johnicholas made a good comment on this point, I think. What we have done (and are doing) is very reminiscent of what Chalmers claims will lead to the singularity. I would go so far as to say that we are a singularity of sorts, beyond which the face of the world could never be the same. This is especially true of the last century, in which we went, by analogy, from the Iron Age to the beginning of the Renaissance, or even further: cars, relativity, quantum mechanics, planes, radar, microwaves, two world wars, nukes, the collapse of the colonial system, interstates, computers, a massive cold war, countless conflicts and atrocities, the entry to and study of space, the internet, and that is just a brief survey off the top of my head. We’ve had so many such changes that I’m not sure superhuman AI would be all that difficult to accept, so long as it was super morally speaking as well, which is, of course, not a given.
Any true AI that could not, with 100% accuracy, be called friendly should not exist.