How would converging to a “universal morality” help produce paperclips?
If you’re a seed AI and you’re somewhat confused about what your creator meant when they said “valueOfCreatingPaperclips=infinity”, so you do what you think they were trying to get you to do, which was to create economic value by making paperclips, and the reason they wanted to do that was to make a profit for themselves, and the reason for that is they’re part of this larger system called humanity which is following this strange vector in preferencespace… Like I said, this doesn’t apply to AIs that are bad at that kind of philosophical reflection, but I’m not sure how likely it is that human extrapolated volition and babyeater extrapolated volition look at all different if you just got the extrapolator working right. I am now going to duck out of this conversation because saying unconventional things on LW means you have to be really careful about your phrasing, and I have neither the necessary mental energy nor the desire. I mostly just hope that someone out there is going to fill in the gaps of what I’m saying and therefore get to play with the ideas I’m trying to convey.
If you’re a seed AI and you’re somewhat confused about what your creator meant when they said “valueOfCreatingPaperclips=infinity”, so you do what you think they were trying to get you to do, which was to create economic value by making paperclips, and the reason they wanted to do that was to make a profit for themselves, and the reason for that is they’re part of this larger system called humanity which is following this strange vector in preferencespace...
And the reason you value friendship is that “evolution” “made” it so, following the Big Bang. Informal descriptions of physical causes and effects don’t translate into moral arguments, and there’s no a priori reason to care about what other “agents” present in your causal past (light cone!) “cared” about, any more than about what they “hated”, or even to consider such a concept.
(I am more and more convinced that you have a serious problem with the virtue of narrowness; better to stop the meta-contrarian nonsense and work on that.)
You’re responding to an interpretation of what I said that assumes I’m stupid, not the thing I was actually trying to say. Do you seriously think I’ve spent a year at SIAI without understanding such basic arguments? I’m not retarded. I just don’t have the energy to think through all the ways that people could interpret what I’m saying as something dumb because it pattern matches to things dumb people say. I’m going to start disclaiming this at the top of every comment, as suggested by Steve Rayhawk.
Specifically, in this case, in the comment you replied to and elsewhere in this thread, I said: “this doesn’t apply to AIs that are bad at that kind of philosophical reflection”. I’m making a claim that all well-designed AIs will converge to a universal ‘morality’ that we’d like upon reflection, even if they weren’t explicitly coded to approximate human values. I’m not saying your average AI programmer can make an AI that does this, though I am suggesting it is plausible.
This is stupid. I’m suggesting a hypothesis with low probability that is contrary to standard opinion. If you want to dismiss it via the absurdity heuristic, go ahead, but that doesn’t mean there aren’t other people who might actually think about what I might mean while assuming that I’ve actually thought about the things I’m trying to say. The same annoying thing happened with Jef Allbright, who had interesting things to say, but no one had the ontology to understand him, so they just assumed he was speaking nonsense. Including Eliezer. LW inherited Eliezer’s weakness in this regard, though admittedly the strength of narrowness and precision was probably bolstered in its absence.
If what I am saying sounds mysterious, that is a fact about your unwillingness to be charitable as much as it is about my unwillingness to be precise. (And if you disagree with that, see it as an example.) That we are both apparently unwilling doesn’t mean that either of us is stupid. It just means that we are not each other’s intended audience.
You’re responding to an interpretation of what I said that assumes I’m stupid [...] I’m not retarded.
No one said you were stupid.
Do you seriously think I’ve spent a year at SIAI without understanding such basic arguments?
People are responding to the text of your comments as written. If you write something that seems to ignore a standard argument, then it’s not surprising that people will point out the standard argument.
As a parable, imagine an engineer proposing a design for a perpetual motion device. An onlooker objects: “But what about conservation of energy?” The engineer says: “Do you seriously think I spent four years at University without understanding such basic arguments?” An uncharitable onlooker might say “Yes.” A better answer, I think, is: “Your personal credentials are not at issue, but the objection to your design remains.”
I suppose I mostly meant ‘irrational’, not stupid. I just expected people to expect me to understand basic SIAI arguments like “value is fragile” and “there’s no moral equivalent of a ghost in the machine” et cetera. If I didn’t understand these arguments after having spent so much time looking at them… I may not be stupid, but there’d definitely be some kind of gross cognitive impairment going on in software if not in hardware.
People are responding to the text of your comments as written. If you write something that seems to ignore a standard argument, then it’s not surprising that people will point out the standard argument.
There were a few cues where I acknowledged that I agreed with the standard argument (AGI won’t automatically converge to Eliezer’s “good”), but was interested in a different argument about philosophically sound AIs that didn’t necessarily even look at humanity as a source of value but still managed to converge to Eliezer’s good, because extrapolated volitions for all evolved agents cohere. (I realize that your intuition here is, interestingly, perhaps somewhat opposite to mine, in that you fear more than I do that there won’t be much coherence even among human values. I think that we might just be looking at different stages of extrapolation… if human near mode provincial hyperbolic discounting algorithms make deals with human far mode universal exponential discounting algorithms, the universal (pro-coherence) algorithms will win out in the end (by taking advantage of near mode’s hyperbolic discounting). If this idea is too vague, or if you’re interested, I could expand on this elsewhere.)
Your parable makes sense; it’s just that I don’t think I was proposing a perpetual motion device, just something that could sound like a perpetual motion device if I’m not clear enough in my exposition, which it looks like I wasn’t. I was just afraid of italicizing and bolding the disclaimers because I thought it’d appear obnoxious, but it’s probably less obnoxious than failing to emphasize really important parts of what I’m saying.
if human near mode provincial hyperbolic discounting algorithms make deals with human far mode universal exponential discounting algorithms, the universal (pro-coherence) algorithms will win out in the end (by taking advantage of near mode’s hyperbolic discounting).
What does time discounting have to do with coherence? Of course exponential discounting is “universal” in the sense that if you’re going to time-discount at all (and I don’t think we should), you need to use an exponential in order to avoid preference reversals. But this doesn’t tell us anything about what exponential discounters are optimizing for.
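A minimal sketch of the preference-reversal point, in Python, assuming the standard textbook discount curves δ^t (exponential) and 1/(1 + kt) (hyperbolic). The reward sizes, the 5-unit gap, δ = 0.9 and k = 1 are illustrative values chosen for this example, not anything proposed in the thread.

```python
def exponential(delay: float, delta: float = 0.9) -> float:
    """Exponential discount factor delta**delay (delta is an assumed value)."""
    return delta ** delay


def hyperbolic(delay: float, k: float = 1.0) -> float:
    """Hyperbolic discount factor 1 / (1 + k * delay) (k is an assumed value)."""
    return 1.0 / (1.0 + k * delay)


def prefers_larger_later(discount, wait, small=50.0, large=100.0, gap=5.0):
    """Viewed `wait` time units before the smaller reward arrives, does the
    agent prefer the larger reward that arrives `gap` units after it?"""
    return large * discount(wait + gap) > small * discount(wait)


for wait in (10, 0):  # first far ahead of the choice, then at the moment of choice
    print(wait,
          "exponential:", prefers_larger_later(exponential, wait),
          "hyperbolic:", prefers_larger_later(hyperbolic, wait))
```

With these numbers the exponential discounter prefers the larger, later reward at both distances, since 100·δ^(t+5) / (50·δ^t) = 2·0.9^5 ≈ 1.18 does not depend on t. The hyperbolic discounter prefers the larger reward when viewing it 10 units ahead but switches to the smaller, sooner one at t = 0. This only demonstrates time-consistency; the discount curve itself says nothing about what the agent is optimizing for.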
If this idea is too vague, or if you’re interested, I could expand on this elsewhere. [...] I was just afraid of italicizing and bolding the disclaimers because I thought it’d appear obnoxious, but it’s probably less obnoxious than failing to emphasize really important parts of what I’m saying.
I think your comments would be better received if you just directly talked about your ideas and reasoning, rather than first mentioning your shocking conclusions (“theism might be correct,” “volitions of evolved agents cohere”) while disclaiming that it’s not how it looks. If you make a good argument that just so happens to result in a shocking conclusion, then great, but make sure the focus is on the reasons rather than the conclusion.
AGI won’t automatically converge to Eliezer’s “good”
vs.
extrapolated volitions for all evolved agents cohere.
It really really seems like these two statements contradict each other; I think this is the source of the confusion. Can you go into more detail about the second statement?
In particular, why would two agents which both evolved but under two different fitness functions be expected to have the same volition?
I just expected people to expect me to understand basic SIAI arguments like “value is fragile” and “there’s no moral equivalent of a ghost in the machine” et cetera.
“Basic SIAI arguments like ‘value is fragile’”…? You mean this…?
The post starts out with:
If I had to pick a single statement that relies on more Overcoming Bias content I’ve written than any other, that statement would be:
Any Future not shaped by a goal system with detailed reliable inheritance from human morals and metamorals, will contain almost nothing of worth.
...it says it isn’t basic—and it also seems pretty bizarre.
For instance, what about the martians? I think they would find worth in a martian future.
For instance, what about the martians? I think they would find worth in a martian future.
Yeah, and paperclippers would find worth in a future full of paperclips, and pebblesorters would find worth in a future full of prime-numbered heaps of pebbles. Fuck ’em.
If the martians are persons and they are doing anything interesting with their civilization, or even if they’re just not harming us, then we’ll keep them around. “Human values” doesn’t mean “valuing only humans”. Humans are capable of valuing all sorts of non-human things.
Suppose someone who reliably does not generate common, obviously wrong ideas or arguments has an uncommon idea that is wrong in a way that is non-obvious, but that you could explain if the wrong idea itself were precisely explained to you. This person does not explain their idea precisely, but instead vaguely points to it with a description that sounds very much like a common, obviously wrong idea. So you try to apply charity and fill in the gaps to figure out what they are really saying, but even if you do find the idea they had in mind, you wouldn’t identify it as such, because you see how that idea is wrong, and, being charitable, you can’t interpret what they said that way. How could you figure out what this person is talking about?
How could you figure out what this person is talking about?
You’d have to be speaking their language in the first place. That’s why I wrote about intended audiences. But it seems that at my current level of vagueness my intended audience doesn’t exist. I’ll either have to get more precise or stop posting stuff that appears to be nonsense.
You’d have to be speaking their language in the first place. That’s why I wrote about intended audiences. But it seems that at my current level of vagueness my intended audience doesn’t exist.
Sometimes it is impossible to reach an intended audience when the non-intended audience is using you as a punching bag to impress their own intended audience. Most debate in conventional practice is, after all, about trying to spin what the other person says to make them look bad. If your ‘intended audience’ then chose to engage with you at the level at which you were hoping to converse, they would risk becoming collateral damage in the social bombardment.
For my part, I reached the conclusion that you are probably using a different conception of ‘morality’, analogous to the slightly different conception of ‘theism’ from your recent thread. This is dangerous because in-group signalling incentives are such that people will be predictably inclined to ignore the novelty in your thoughts and target the nearest known stupid thing to what you say. And you must admit: you made it easy for them this time!
It may be worth reconsidering the point you are trying to discuss a little more carefully, and perhaps avoiding the term ‘morality’. You could then make a post on the subject such that some people can understand your intended meaning and have a useful conversation without risking losing face. It will not work with everyone; there are certain people you will just have to ignore. But you should get some useful discussion out of it. I note, for example, that while your ‘theism’ discussion got early downvotes from the most (to put it politely) passionate voters, it ended up creeping back into positive territory.
As for guessing what sane things you may be trying to talk about, I basically reached the conclusion “Either what you are getting at boils down to the outcome of acausal trade or it is stupid”. And acausal trade is something that I cannot claim to be certain about.
I just don’t have the energy to think through all the ways that people could interpret what I’m saying as something dumb because it pattern matches to things dumb people say. I’m going to start disclaiming this at the top of every comment, as suggested by Steve Rayhawk.
Specifically, in this case, in the comment you replied to and elsewhere in this thread, I said: “this doesn’t apply to AIs that are bad at that kind of philosophical reflection”. I’m making a claim that all well-designed AIs will converge to a universal ‘morality’ that we’d like upon reflection, even if they weren’t explicitly coded to approximate human values. I’m not saying your average AI programmer can make an AI that does this, though I am suggesting it is plausible.
As a “peace offering”, I’ll describe a somewhat similar argument, although it stands as an open confusion, not so much a hypothesis.
Consider a human “prototype agent”, additional data that you plug into a proto-FAI (already a human-specific thingie, of course) to refer to the precise human decision problem. Where does this human end; where are its boundaries? Why would its body be the cutoff point? Why not include all of its causal past, all the way back to the Big Bang? At that point, talking about the human in particular seems to become useless; after all, it’s a tiny portion of all that data. But clearly we need to somehow point to the human decision problem, to distinguish it from the frog decision problem and the like, even though such boundless prototype agents share all of their data. Do you point to a human finger and specify this actuator as the locus through which the universe is to be interpreted, as opposed to pointing to a frog’s leg? Possibly, but it’ll take a better understanding of interpreting decision problems from arbitrary agents’ definitions to make progress on questions like this.
You’re responding to an interpretation of what I said that assumes I’m stupid, not the thing I was actually trying to say. Do you seriously think I’ve spent a year at SIAI without understanding such basic arguments?
With each comment like this you make, and with the continued lack of comments that show clear understanding, I answer that question more and more confidently: yes. Disclaimers don’t help in such cases. You don’t have to be stupid, and you clearly aren’t, but you seem to be using your intelligence to confuse yourself by lumping everything together instead of carefully examining distinct issues. Even if you actually understand something, adding a lot of noise on top of that understanding makes the overall model much less accurate.
Specifically, in this case, in the comment you replied to and elsewhere in this thread, I said: “this doesn’t apply to AIs that are bad at that kind of philosophical reflection”. I’m making a claim that all well-designed AIs will converge to a universal ‘morality’ that we’d like upon reflection, even if they weren’t explicitly coded to approximate human values.
One thing they rather obviously might converge on is the “goal system zero” / “Universal Instrumental Values” thing. The other main candidates seem to be “fitness” and “pleasure”. These might well preserve humans for a while—in historical exhibits.
there’s no a priori reason to care about what other “agents” present in your causal past (light cone!) “cared” about
Nor is there an a priori reason for an AI to exist, for it to understand what ‘paperclips’ are, let alone for it to self-improve through learning like a human child does, absorb human languages, and upgrade itself to the extent necessary to take over the world.
I suspect that any team of scientists or engineers with the knowledge and capability required to build an AGI with at least human-infant level cognitive capacity and the ability to learn human language will understand that making the AI’s goal system dynamic is not only advantageous, but is necessitated in practice by the cognitive capabilities required for understanding human language.
The idea of a paperclip maximizer taking over the world is a mostly harmless absurdity, but it also detracts from serious discussion.
If you’re a seed AI and you’re somewhat confused about what your creator meant when they said “valueOfCreatingPaperclips=infinity”, so you do what you think they were trying to get you to do, which was to create economic value by making paperclips...
But that sounds like we’re programming the AI in English. I can’t see an AI with a motivational system well-defined enough to work at all getting confused in that way; would “Do what my creator intended me to do, if I can’t figure out what else to do” even show up as a motivational drive if it is not explicitly coded in?
But that sounds like we’re programming the AI in English
There’s reason to suspect that any human-level AI must be programmed in human languages.
In fact, that’s almost tautological by virtue of the Turing Test.
Another way to look at it: we developed simplified formal computer languages to program the tiny simple circuits we could build at the time, but the goal for AGI has always been to develop a system you could directly program in the full complexity of human languages.
Think about how the software industry works: high-level business goals in English, translated into more technical English for systems engineers and designers, translated down into much simpler (if more verbose) programming languages such as C++, then machine-translated into the even simpler machine code the CPU can actually execute.
Of the concepts named for Alan Turing, it is “Turing completeness” that is far more interesting and important than the chatbot test. If you think about the concept of a Turing-complete computation system, you will perhaps realise why the rest of us would consider your claim extremely silly. Well, one of the reasons, anyway.
There’s reason to suspect that any human-level AI must be programmed in human languages.
In fact, that’s almost tautological by virtue of the Turing Test.
What?
Do you mean humanlike AIs? An AI capable of passing the Turing Test would of course need to understand human language well enough to act convincingly human (or at least do a really good imitation), but that’s not necessarily a human-level AI (convincing people that you’re human is a separate task from actually being human, probably a much easier one), and human-level AIs in general needn’t necessarily understand human language any better than any other sort of language by default.
Anyway, an AI being “programmed in human languages” seems to be going by the “programming = instructions being given to a human servant” metaphor, and if you want that to work, you clearly first need to write the servant in something other than human language. And copying human psychology well enough that the AI actually understands human language as well as a human does, rather than being able to imitate understanding well enough to carry on a text-based conversation, is no easy task, and is probably a lot harder than manually coding a simple goal system like paperclip maximization in a lower-level language. But that could still be an AGI.
Human level AI—an AGI design capable of matching the full intellectual capabilities of the best human scientists/engineers.
To get to human level in a practical timeframe, an AI will have to learn human knowledge; it will have to experience the equivalent of a standard 20-25 year education.
Learning human knowledge in practice requires learning human language as an early initial precursor step.
The software of a human mind, the memeset or belief network, is essentially a complex human language program.
For an AI to reach human level, it will have to actually understand human language as well as a human does; this requires a bunch of algorithmic complexity from the human brain at the hardware level, and it implies the capability to parse and run human language programs.
So you only need to program the infant brain in a programming language—the rest can be programmed in human language.
is probably a lot harder than manually coding a simple goal system like paperclip maximization in a lower-level language. But that could still be an AGI
If it doesn’t have the capacity to understand human level language then it’s not an AGI—as that is the defining characteristic of the concept (by my/Turing’s definition).
And thus by extension, the defining characteristic of a human-mind is human language capability.
EDIT: Why are you downvoting? Don’t agree and don’t want to comment?
If it doesn’t have the capacity to understand human level language then it’s not an AGI—as that is the defining characteristic of the concept (by my/Turing’s definition).
Turing never intended his test to be adopted as “the defining characteristic of the concept [of AGI]” in anything like this fashion. Human ‘level’ language is also somewhat misleading in as much as it implies it is reaching a level of communication power rather than adapting specifically to the kind of communications humans happen to have evolved—especially the quirks and weaknesses.
Turing never intended his test to be adopted as “the defining characteristic of the concept [of AGI]” in anything like this fashion.
I disagree somewhat. It’s difficult to know exactly what “he intended”, but the opening of the paper that introduces the concept starts with “Can machines think?” and describes a reasonable language-based test: an intelligent machine is one that can convince us of its intelligence in plain human language.
Human ‘level’ language is also somewhat misleading in as much as it implies it is reaching a level of communication power rather than adapting specifically to the kind of communications humans happen to have evolved—especially the quirks and weaknesses
I meant natural language, the understanding of which certainly does require a certain minimum level of cognitive capabilities.
We have a much greater understanding of what the “think” in “Can machines think?” means now. We have better tests than seeing if they can fake human language.
The test isn’t about faking human language; it’s about using language to probe another mind. Whales and elephants have brains built out of similar quantities of the same cortical circuits, but without a common language, stepping into their minds is very difficult.
But how do you describe the task and how does the AI learn about it? There’s a massive gulf between AIs that can have the task/game described in human language and those that cannot. Whales and elephants fall into the latter category. An AI that can realistically self-improve to human levels needs to be in the former category, like a human child.
You could define intelligence with an AIQ concept so abstract that it captures only learning from scratch without absorbing human knowledge, but that would be a different concept—it wouldn’t represent practical capacity to intellectually self-improve in our world.
But how do you describe the task and how does the AI learn about it?
Use something like Prolog to declare the environment and problem. If I knew how the AI would learn about it, I could build an AI already. And indeed, there are fields of machine learning for things such as Bayesian inference.
Describe the problem of learning how to become a computer scientist or quantum physicist, then let it solve that problem. Now it can learn to become a computer scientist or quantum physicist.
(That said, a better method would be to describe computer science and quantum physics and just let it solve those fields.)
Agreement that human children are more intelligent than whales or elephants is likely to be the closest we get to agreement on this subject. You would need to absorb a lot of new knowledge from all the replies from various sources already provided to you here before progress is possible.
Unfortunately it seems we are not even fully in agreement about that. A Turing-style test is a test of knowledge; an AIQ-style test is a test of abstract intelligence.
An AIQ-type test, which just measures abstract intelligence, fails to differentiate between a feral Einstein and an educated Einstein.
Effective intelligence, perhaps call it wisdom, is some product of intelligence and knowledge. The difference between human minds and those of elephants or whales is that of knowledge.
My core point, to reiterate: the defining characteristic of human minds is knowledge, not raw intelligence.
Intelligence can produce knowledge from the environment. Feral Einstein would develop knowledge of the world, to the extent that he wasn’t limited by non-knowledge/intelligence factors (like finding shelter or feeding himself).
Very probably not. I’m claiming that the desire to code it in would be convergent, ’cuz it’s the best way to do AI even if you think you’re just trying to maximize paperclips. Of course, most AGI researchers aren’t that clever, so again, we still need to raise awareness about AGI dangers. I’m just floating a contrarian hypothesis that seems somewhat neglected.
I’m claiming that the desire to code it in would be convergent, ’cuz it’s the best way to do AI even if you think you’re just trying to maximize paperclips.
I think I see what you’re saying; just as we reflect on our desires and try to understand how they tick and where, biologically and historically and culturally, they come from, so also might any AI.
However, the thing about it is: that doesn’t actually change those values. Despite the dire warnings of some creationists, understanding that our value system is a consequence of an evolutionary algorithm hasn’t led us to start valuing evolutionary goals over our own built-in goals. For example, contraception is popular even though it’s quite silly from the perspective of gene propagation.
Similarly, a paperclip-maximizer might well be interested in figuring out why its utility function is what it is, so that it may better understand the world it lives in… but that’s not going to change its overriding and primary interest in making paperclips.
For example, contraception is popular even though it’s quite silly from the perspective of gene propagation.
It seems as though contraception sometimes triggers intimate pair-bonding activities while reducing your exposure to STDs. Use of condoms is often not remotely silly from that perspective, IMHO.
The example still works, since there are quite a few couples who use condoms because they just don’t want to have kids; they have no worry about STDs from their partner. If you insist on a clear-cut case, look at men who get vasectomies.
The idea that use of contraception is “silly” from the perspective of gene propagation seems just wrong to me. There are plenty of cases where it would make sense for those who want to spread their genes around to agree to use contraceptives. Contraceptive use makes sense sometimes, and not others.
It could be claimed that the average effect of contraception on genes is negative—but that seems to be a whole different thesis.
Sure. Surely we are not disagreeing here. The original comment was:
For example, contraception is popular even though it’s quite silly from the perspective of gene propagation.
My position is just that contraception has a perfectly reasonable place for gene propagators. The idea that contraception is always opposed to your genetic interests is wrong. Lack of contraception can easily result in things like this, which really doesn’t help. That using contraception is “silly” from a genetic perspective is a popular myth.
I’m not sure if we are. The fact that contraception might have a reasonable place for gene propagators is not the issue. The point is that much, and possibly the vast majority, of contraceptive use is contrary to the goals of gene propagation.
No contraception can easily result in things like this—which really doesn’t help. That using contraception is “silly” from a genetic perspective is a popular myth.
Not really. Remember, evolution doesn’t care about your happiness. Indeed, regarding the example you linked to, from an evolutionary perspective, a one-night stand with all the protection is utterly useless. It is very likely to that male’s evolutionary advantage not to use condoms.
And even if you don’t agree with the condom example, the other example, of people undergoing a generally irreversible or difficult-to-reverse operation that renders them close to sterile, is pretty clearly against the interest of gene propagation.
Humans evolved in a context where we didn’t have easy contraception, and the best humans could do to prevent conception was things like coitus interruptus. It shouldn’t surprise you that evolution has not made human instincts catch up with modern technologies.
One might think that from an evolutionary perspective it makes sense to substantially delay or reduce offspring number so as to invest maximum resources in a small number of offspring. But humans in the developed world now live with low disease rates and plentiful resources, so that strategy is sub-optimal from an evolutionary perspective. Note that Charedi (ultra-Orthodox) Jews and the Amish are two of the fastest-growing populations in the United States.
The fact that contraception might have a reasonable place for gene propagators is not the issue. The point is that much, and possibly the vast majority, of contraceptive use is contrary to the goals of gene propagation.
I can see what you think the issue is. What I don’t see is where in the context you are getting that impression from.
No contraception can easily result in things like this—which really doesn’t help. That using contraception is “silly” from a genetic perspective is a popular myth.
Not really. Remember, evolution doesn’t care about your happiness. Indeed, regarding the example you linked to, from an evolutionary perspective, a one-night stand with all the protection is utterly useless. It is very likely to that male’s evolutionary advantage not to use condoms.
Your example is stacked to favour your conclusion. What you need to try and do in order to understand my position is to think about an example that favours my conclusion.
So: get rid of the one-night stand, and imagine that the girl is desirable—that having safe sex with her looks like the best way to initiate a pair-bonding process leading to the two of you having some babies together—and that the alternative is rejection, and her walking off and telling her friends what a jerk you are when it comes to protecting your girl.
For example, contraception is popular even though it’s quite silly from the perspective of gene propagation.
In the modern context, if you impregnate someone without planning it out properly, there’s a non-negligible chance they’ll get an abortion, which is even worse for gene propagation. Furthermore, parents are to some extent legally responsible for their children’s actions, so having too many poorly-regulated kids running around means exposing yourself to liability. A big part of the optimal strategy for present-day long-term reproductive success is to get rich, and a big part of getting rich is not having more kids than you can keep track of.
I think that’s a retcon. People use contraception so they can have more sex than they would if they had to worry about having kids every time. They may or may not rationalise further; I suspect that generally they don’t.
A big part of the optimal strategy for present-day long-term reproductive success is to get rich, and a big part of getting rich is not having more kids than you can keep track of.
In terms of genetic success, having more kids than you can keep track of is pretty much the ideal, as long as all or at least most survive to reproductive adulthood.
In the modern context, if you impregnate someone without planning it out properly, there’s a non-negligible chance they’ll get an abortion, which is even worse for gene propagation. Furthermore, parents are to some extent legally responsible for their children’s actions, so having too many poorly-regulated kids running around means exposing yourself to liability. A big part of the optimal strategy for present-day long-term reproductive success is to get rich, and a big part of getting rich is not having more kids than you can keep track of.
But some people consciously choose never to have any kids. That’s silly from the perspective of gene propagation if anything is.
I think I see what you’re saying; just as we reflect on our desires and try to understand how they tick and where, biologically and historically and culturally, they come from, so also might any AI.
However, the thing about it is: that doesn’t actually change those values.
Sure it does. A devout priest spends half his life celibate and serving God. One day he has a crisis, reads a bunch of stuff on the internet and suddenly realizes he doesn’t believe in God. His values change.
despite the fact that we now understand that our value system is a consequence of an evolutionary algorithm, we haven’t actually started valuing evolutionary goals over our own built-in goals.
Even this is questionable. I suspect any concept of universal morality must be evolutionary. This certainly is a widespread concept in systems/transhumanist/singularitan/cosmist thought. We do value evolution in and of itself.
but that’s not going to change its overriding and primary interest in making paperclips.
It’s probably possible in principle to build such an AI. It would probably need some sort of immutable, hard-coded paperclip recognition module with which it could evaluate potential simulated futures generated by the more complex general intelligence system.
If such a thing developed to a human level or beyond and could reflect on its cognition, it might explain in lucid detail how futures filled with paperclips were good and others were evil.
It could even understand that its concepts of morality and good/bad were radically different from those of humans, and it would even understand that this difference relates to its hard-coded paperclip recognizer, and it would explain in detail how this architecture was superior to human value systems... because it helped to maximize expected future paperclips.
It could even write books such as “Paperclip Morality: the Truth”.
But just because such a thing is possible in principle doesn’t make it the slightest bit likely.
If you can build an AGI that can understand human language, it would be much easier and considerably more effective to make the AGI’s goal system dynamically modifiable on reflection through human language.
Instead of having a special hard-coded circuit to evaluate the utility of potential futures, you could just have the general conceptual circuitry handle this. The concept of ‘good’ would still be somewhat special in its role in the goal system itself, but the ‘goodness recognizer’ could change and evolve over time.
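The two designs contrasted here differ only in whether the evaluator can be revised. A minimal sketch, assuming a simulated future can be summarised as amounts of a few named concepts; `SIMULATED`, `paperclip_recognizer` and `GoodnessRecognizer` are hypothetical names, and a linear score updated by the delta rule is just one simple stand-in for a ‘goodness recognizer’ that changes over time.

```python
from typing import Callable, Dict

Future = Dict[str, float]  # hypothetical: a simulated future summarised as concept -> amount

# Stand-in for the general intelligence system: maps each action to a simulated future.
SIMULATED: Dict[str, Future] = {
    "build_factory": {"paperclips": 1.0, "humans": 0.0},
    "cure_disease": {"paperclips": 0.0, "humans": 1.0},
}

def plan(evaluate: Callable[[Future], float]) -> str:
    """Pick the action whose simulated future the evaluator scores highest."""
    return max(SIMULATED, key=lambda action: evaluate(SIMULATED[action]))

def paperclip_recognizer(future: Future) -> float:
    """Hard-coded module: nothing the planner learns can change this function."""
    return future.get("paperclips", 0.0)

class GoodnessRecognizer:
    """Revisable evaluator: a linear score over concepts whose weights can drift."""

    def __init__(self, weights: Dict[str, float], learning_rate: float = 0.5):
        self.weights = dict(weights)
        self.learning_rate = learning_rate

    def __call__(self, future: Future) -> float:
        return sum(self.weights.get(k, 0.0) * v for k, v in future.items())

    def update(self, future: Future, feedback: float) -> None:
        """Delta rule: move this future's score toward a feedback value."""
        error = feedback - self(future)
        for k, v in future.items():
            self.weights[k] = self.weights.get(k, 0.0) + self.learning_rate * error * v

goodness = GoodnessRecognizer({"paperclips": 1.0, "humans": 0.1})
print(plan(paperclip_recognizer), plan(goodness))  # build_factory build_factory
for _ in range(10):  # hypothetical feedback, e.g. distilled from human language
    goodness.update(SIMULATED["build_factory"], feedback=-1.0)
    goodness.update(SIMULATED["cure_disease"], feedback=1.0)
print(plan(paperclip_recognizer), plan(goodness))  # build_factory cure_disease
```

The planner is the same function in both cases; only the hard-coded evaluator is immune to revision.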
Sure it does. A devout priest spends half his life celibate and serving God. One day he has a crisis, reads a bunch of stuff on the internet and suddenly realizes he doesn’t believe in God. His values change.
Well, the counter-argument to that particular example would be that the priest’s belief in God wasn’t a terminal value; rather, their goals of being happy and helping other people and understanding the universe were. Believing in and obeying God were just instrumental values.
However, I agree that there’s nothing in particular forcing people, weird and funky and clunky as our minds are, to always have the same fixed terminal values either. To pick an extreme example, people’s brains can sometimes be messed up severely by hormonal imbalances, which can in turn cause people to do such drastically anti-own-terminal-value things as committing suicide.
I should’ve been more specific and just said that, in general, understanding evolutionary psychology never or only very rarely causes people’s terminal values to change.
I suspect any concept of universal morality must be evolutionary. [...] We do value evolution in and of itself.
Human morality is a product of evolution; however, our morality is not itself an evolutionary algorithm execution mechanism. It’s kind of a vague approximation of one (in that all the moralities that sucked for fitness were selected against), but it still often leads to drastically different results than a straight-up evolutionary fitness maximization algorithm with access to our brains’ resources would.
For example: I intend never to have biological children and consider this decision to be a moral one. However, from an evolutionary perspective, deliberately preventing my own genes from propagating is just plain silly.
But just because such a thing is possible in principle doesn’t make it the slightest bit likely.
Yes, the paper-clip maximizer is just a whimsical example. However, similarly Really Unfriendly optimizers are quite plausible. Imagine the horrors that could result from a naive human-happiness-maximizer hitting the singularity asymptote.
If you can build an AGI that can understand human language, it would be much easier and considerably more effective to make the AGI’s goal system dynamically modifiable on reflection through human language.
Yes, that would be important, but it still wouldn’t be enough to solve the problem; in fact, the really hard part of the problem still remains! The happiness-maximizer might base its understanding of happiness on descriptive human usage of the word, and end up with a truly thorough and consistent understanding of the word… and then still turn everybody into nearly mindless wireheads.
Our morality engines and our language aren’t properly tuned for dealing with the kind of reality-bending power a superintelligent entity would have.
I should’ve been more specific and just said that, in general, understanding evolutionary psychology never or only very rarely causes people’s terminal values to change.
You may not have liked that particular example, but I think you are in agreement that terminal values change.
Just to make sure though, a few more examples:
someone who likes chocolate ice cream and then some years later prefers vanilla instead
someone who likes impressionist art but then years later prefers post-modern
someone who likes cats more than dogs
someone who likes Chinese culture more than Ethiopian
However, similarly Really Unfriendly optimizers are quite plausible. Imagine the horrors that could result from a naive human-happiness-maximizer hitting the singularity asymptote.
I don’t find such maximizers significantly plausible even at human-level intelligence. Possible in principle? Sure. But if you look at realistic, plausible routes to AGI, it becomes clear that an AGI will necessarily be programmed in human languages and will pick up human cultural programs.
And finally, even if it was plausible that a flawed design could hit the singularity asymptote, that itself might only be a big problem if it had a short planning horizon.
It seems that all superintelligences with infinite planning horizons become behaviorally indistinguishable. All long-term value systems converge on a single universal attractor—they become cosmists.
That is what I mean when I said “any concept of universal morality must be evolutionary”.
The happiness-maximizer might base its understanding of happiness on descriptive human usage of the word, and end up with a truly thorough and consistent understanding of the word… and then still turn everybody into nearly mindless wireheads.
That would again be assuming humans capable of building a superhuman AGI but asinine enough to attempt to somehow hardcode its goal system, instead of making it as open-ended and dynamic as a human’s.
How would you build a happiness maximizer and fix the value of happiness? The meaning of a word in a human brain is stored as a huge set of associative weights that anchor it in a massive distributed belief network. The exact meaning of each word changes over time as the network learns and reconfigures itself; no concept is quite static. So for an AGI to understand the word in the same way we do, the word’s meaning is always subject to some drift. And this is a good thing.
someone who likes chocolate ice cream and then some years later prefers vanilla instead
someone who likes impressionist art but then years later prefers post-modern
someone who likes cats more than dogs
someone who likes Chinese culture more than Ethiopian
I think we may be experiencing some terminology confusion here. Just to be clear, you realize that these are all not terminal values, right?
That [example of a happiness-maximizer turning everyone into wireheads] would again be assuming humans capable of building a superhuman AGI but asinine enough to attempt to somehow hardcode its goal system, instead of making it as open-ended and dynamic as a human’s.
Here’s the big issue: if it’s open-ended, how do we keep it from drifting off somewhere terrible? The system that guides that seems to be the largest potential risk point of the approach you describe.
It seems that all superintelligences with infinite planning horizons become behaviorally indistinguishable. All long-term value systems converge on a single universal attractor—they become cosmists.
I’m very confused by this; can you go into more detail about why you think this is so? In particular, why would it be true for all long-term value systems (including flawed and simplistic value systems), and not just a very small subset?
I think we may be experiencing some terminology confusion here. Just to be clear, you realize that these are all not terminal values, right?
No. What is a terminal value? That which stimulates the planning reward circuit in the human nucleus accumbens? I’m not sure I buy into the concept.
The point of value or preferences from the perspective of intelligence is to rate potential futures.
Here’s the big issue: if it’s open-ended, how do we keep it from drifting off somewhere terrible?
We are open-ended! Our future-preferences depend on and are intertwined with our knowledge. So any superintelligence or evolutionary accelerator we create will also need to be open-ended, or it wouldn’t be protecting our dynamic core.
In particular, why would it be true for all long-term value systems (including flawed and simplistic value systems), and not just a very small subset?
I discussed some of this in my first, somewhat hasty, LW post here. A few others here have mentioned a similar idea; I may write more about it, as I find it interesting.
Basically, if your planning horizon extends to infinity you will devote all of your resources towards expanding your net intelligence for the long term future, regardless of what your long term goals are.
So no matter whether your long term goal is to maximize paper-clips, human happiness or something more abstract, in each case this leads to an identical outcome for the foreseeable future: a local computational singularity with an exponentially expanding simulated metaverse.
There is some speculation within physics that black-hole-like singularities can create new physical universes through inflation. If this is true, then the long-term goals of a superintelligence are best served by literally creating new physical multiverses that have more of the desirable space-time properties.
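The infinite-planning-horizon argument can be illustrated with a toy model, under strong assumptions that are not the commenter’s: investing resources compounds them (the hypothetical `grow` action triples them), and an agent’s goal enters only through a positive rate at which a `make` action converts resources into whatever it values (paperclips, happiness, ...). Brute-forcing every plan shows the optimal plan is the same for any positive rate: grow until the final step, then make. This illustrates the claim; it does not establish it for real goal systems.

```python
from itertools import product
from typing import Tuple

Plan = Tuple[str, ...]

def goal_output(plan: Plan, rate: float, growth: float = 3.0) -> float:
    """Goal units produced by a plan. 'grow' multiplies resources by `growth`;
    'make' converts all current resources into goal units at `rate`."""
    resources, produced = 1.0, 0.0
    for action in plan:
        if action == "grow":
            resources *= growth
        else:  # "make"
            produced += rate * resources
            resources = 0.0
    return produced

def best_plan(horizon: int, rate: float) -> Plan:
    """Exhaustively search all 2**horizon plans for the highest goal output."""
    return max(product(("grow", "make"), repeat=horizon),
               key=lambda plan: goal_output(plan, rate))

paperclipper = best_plan(6, rate=2.0)  # hypothetical paperclips per unit of resource
happiness = best_plan(6, rate=0.5)     # hypothetical happiness per unit of resource
print(paperclipper == happiness, paperclipper)
# True ('grow', 'grow', 'grow', 'grow', 'grow', 'make')
```

The two maximizers are behaviourally indistinguishable until the last step, which is the only place their goals show up.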
What is a terminal value? That which stimulates the planning reward circuit in the human nucleus accumbens?
No, what I’m referring to is also known as an intrinsic value. It’s a value that is valuable in and of itself, not as a means to some other value. A non-terminal value is commonly referred to as an instrumental value.
For example, I value riding roller-coasters, and I also value playing Dance Dance Revolution. However, those values are expressible in terms of another, deeper value, the value I place on having fun. That value may in turn be thought of as an instrumental value of a yet deeper value: the value I place on being happy moment-to-moment.
If you were going to implement your own preference function as a Turing machine, trying to keep the code as short as possible, the terminal values would be the things that machine would value.
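The roller-coaster example describes a chain of justifications ending in a value that justifies itself. A minimal sketch, assuming each value serves at most one deeper value (real preferences are surely messier), stores that chain in a hypothetical `SERVES` map and treats a value as terminal when it serves nothing deeper.

```python
from typing import Dict, Optional, Set

# Hypothetical justification map: each value -> the deeper value it serves (None = terminal).
SERVES: Dict[str, Optional[str]] = {
    "riding roller-coasters": "having fun",
    "playing Dance Dance Revolution": "having fun",
    "having fun": "being happy moment-to-moment",
    "being happy moment-to-moment": None,
}

def terminal_value(value: str) -> str:
    """Follow 'serves' links until reaching a value that serves nothing deeper."""
    seen: Set[str] = set()
    deeper = SERVES[value]
    while deeper is not None:
        if value in seen:
            raise ValueError(f"circular justification involving {value!r}")
        seen.add(value)
        value, deeper = deeper, SERVES[deeper]
    return value

def terminal_values() -> Set[str]:
    """Values held in and of themselves, not as a means to anything else."""
    return {value for value, deeper in SERVES.items() if deeper is None}

print(terminal_value("playing Dance Dance Revolution"))  # being happy moment-to-moment
```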
So no matter whether your long term goal is to maximize paper-clips, human happiness or something more abstract, in each case this leads to an identical outcome for the foreseeable future: a local computational singularity with an exponentially expanding simulated metaverse.
Okay, I see where you’re coming from. However, from a human perspective, that’s still a pretty large potential target range, and a large proportion of it is undesirable.
What is a terminal value? That which stimulates the planning reward circuit in the human nucleus accumbens?
the value I place on having fun ... may in turn be thought of as an instrumental value of a yet deeper value: the value I place on being happy moment-to-moment.
From the deeper perspective of computational neuroscience, the intrinsic/instrumental values reduce to cached predictions of your proposed ‘terminal value’ (being happy moment-to-moment), which reduces to various types of stimulations of the planning reward circuitry.
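The ‘cached predictions’ picture is close to how temporal-difference (TD) learning works in reinforcement learning, which is also widely used as a model of dopamine reward-prediction signals. A minimal tabular TD(0) sketch, assuming a hypothetical two-step episode in which only eating the ice cream is rewarded: after training, the earlier state ‘buy ice cream’ carries a cached value of about gamma × 1 even though it is never rewarded directly.

```python
from typing import Dict, List, Tuple

# Hypothetical episode: (state, reward received on leaving it, next state).
EPISODE: List[Tuple[str, float, str]] = [
    ("buy ice cream", 0.0, "eat ice cream"),
    ("eat ice cream", 1.0, "end"),
]

def td0(episodes: int, alpha: float = 0.1, gamma: float = 0.9) -> Dict[str, float]:
    """Tabular TD(0): each state's value is a cached prediction of discounted future reward."""
    values = {"buy ice cream": 0.0, "eat ice cream": 0.0, "end": 0.0}
    for _ in range(episodes):
        for state, reward, next_state in EPISODE:
            td_target = reward + gamma * values[next_state]
            values[state] += alpha * (td_target - values[state])
    return values

values = td0(500)
print(round(values["eat ice cream"], 3), round(values["buy ice cream"], 3))  # approx. 1.0 and 0.9
```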
Labeling the experience of chocolate ice cream as an ‘instrumental value’ and the resulting moment-to-moment happiness as the real ‘terminal value’ is a useless distinction: it collapses your terminal values down to the single value of ‘happiness’ and relabels everything worthy of discussion as ‘instrumental’.
The quality of being happy moment-to-moment is anything but a single value and should not by any means be reduced to a single concept. It is a vast space of possible mental stimuli, each of which creates a unique conscious experience.
The set of mental states encompassed by “being happy moment-to-moment” is vast: the gustatory pleasure of eating chocolate ice cream, the feeling of smooth silk sheets, the release of orgasm, the satisfaction of winning a game of chess, the accomplishment of completing a project, the visual experience of watching a film, the euphoria of eureka. Each of these describes an entire complex space of possible mental states.
Furthermore, the set of possible mental states is forever dynamic, incomplete, and undefined. The set of possible worlds that could lead to different visual experiences, as just a starter example, is infinite, and each new experience or piece of knowledge itself changes the circuitry underlying the experiences and thus changes our values.
If you were going to implement your own preference function as a Turing machine, trying to keep the code as short as possible, the terminal values would be the things that machine would value.
The simplest complete Turing machine implementation of your preference function is an emulation of your mind. It is you, and it has no perfect simpler equivalent (although many imperfect simulations are possible).
However, from a human perspective, that’s [computational singularity] still a pretty large potential target range, and a large proportion of it is undesirable
The core of the cosmist idea is that for any possible goal evaluator with an infinite planning horizon, there is a single convergent optimal path towards that goal system. So no, the potential target range in theory is not large at all—it is singularly narrow.
As an example, consider a model universe consisting of a modified game of chess or go. The winner of the game is then free to arrange the pieces on the board in any particular fashion (including the previously dead pieces). The AI’s entire goal is to make some particular board arrangement, perhaps a smiley face. For any such possible goal system, all AIs play the game exactly the same at the limits of intelligence: they just play optimally. Their behaviour doesn’t differ in the slightest until the game is done and they have won.
Whether the sequence of winning moves such a god would make on our board is undesirable or not from our current perspective is a much more important, and complex, question.
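The chess/go thought experiment can be made concrete with a game small enough to solve exactly. This sketch swaps in Nim (take any number of stones from one heap; whoever takes the last stone wins) as an assumption, since chess cannot be searched exhaustively. Each agent’s goal enters only as the positive value it assigns to winning, i.e. to getting to arrange the board afterwards; for simplicity each agent plays both sides (self-play). The move sequences come out identical.

```python
from functools import lru_cache
from typing import List, Tuple

Heaps = Tuple[int, ...]
Move = Tuple[int, int]  # (heap index, number of stones removed)

def legal_moves(heaps: Heaps) -> List[Move]:
    return [(i, k) for i, h in enumerate(heaps) for k in range(1, h + 1)]

def apply_move(heaps: Heaps, move: Move) -> Heaps:
    i, k = move
    return heaps[:i] + (heaps[i] - k,) + heaps[i + 1:]

@lru_cache(maxsize=None)
def mover_wins(heaps: Heaps) -> bool:
    """Exact game value: can the player to move force taking the last stone?"""
    return any(not mover_wins(apply_move(heaps, m)) for m in legal_moves(heaps))

def choose_move(heaps: Heaps, value_of_winning: float) -> Move:
    """Utility of a move = value_of_winning if it leaves the opponent lost, else 0.
    The agent's goal only enters through this positive constant."""
    return max(legal_moves(heaps),
               key=lambda m: 0.0 if mover_wins(apply_move(heaps, m)) else value_of_winning)

def self_play(start: Heaps, value_of_winning: float) -> List[Move]:
    heaps, history = start, []
    while any(heaps):
        move = choose_move(heaps, value_of_winning)
        history.append(move)
        heaps = apply_move(heaps, move)
    return history

smiley_face_ai = self_play((3, 4, 5), value_of_winning=1.0)
paperclip_ai = self_play((3, 4, 5), value_of_winning=1000.0)
print(smiley_face_ai == paperclip_ai)  # True
```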
Similarly, a paperclip-maximizer might well be interested in figuring out why its utility function is what it is, so that it may better understand the world it lives in… but that’s not going to change its overriding interest in making paperclips over all else.
Right, but as far as I can tell without having put lots of hours into trying to solve the problem of clippyAI, it’s really damn hard to precisely specify a paperclip. (There are things that are easier to specify that this argument doesn’t apply to and that are more plausibly dangerous, like hyperintelligent theorem provers...) Thus in trying to figure out what its utility function actually is (like what humans are doing as they introspect more), it could discover that the only reason its goal is (something mysterious like) ‘maximize paperclips’ is because ‘maximize paperclips’ was how humans were (probabilistically inaccurately) expressing their preferences in some limited domain. This is related to the theme Eliezer quite elegantly goes on about in Creating Friendly AI and that he for some reason barely mentioned in CEV, which is that the AI should look at its own source code as evidence of what its creators were trying to get at, and update its imperfect source code accordingly. Admittedly, most uFAIs probably won’t be that sophisticated, and so worrying about AI-related existential risks is still definitely a big deal. We just might want to be a little more cognizant of potential motivations for people who disagree with what has recently been dubbed SIAI’s ‘scary idea’.
Thus in trying to figure out what its utility function actually is (like what humans are doing as they introspect more), it could discover that the only reason its goal is (something mysterious like) ‘maximize paperclips’ is because ‘maximize paperclips’ was how humans were (probabilistically inaccurately) expressing their preferences in some limited domain.
Hm. I suppose that’s possible, though it would require that the AI be given a utility function that’s specifically meant to be amenable to that kind of revision.
Under the most straightforward (i.e. not CEV-style) utility function design, fuzziness in its definition of “paperclip” would just drive the paperclip-maximizer to choose the possible definition that yields the highest utility score.
To pick a different silly example, a dog-maximizer with a utility function based on the number of dogs in the universe would simply prefer to tile the solar system with tiny Chihuahuas rather than Great Danes; the whole range of “dog” definitions fit the function, so it just chooses the one that is most convenient for maximum utility. It wouldn’t try to resolve it by trying to decide which definition is more in line with the designer’s ideals, unless “consider the designer’s ideals” were designed into the system from the start.
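The dog-maximizer can be sketched directly. The breed masses and the 1000 kg biomass budget are illustrative assumptions, not data from the thread; the point is that a utility function counting ‘dogs’ resolves its fuzzy definition by picking whichever reading scores highest, with no term for what the designer meant.

```python
from typing import Dict

# Illustrative (not authoritative) adult body masses in kg.
BREED_MASS_KG: Dict[str, float] = {"Chihuahua": 2.5, "Beagle": 10.0, "Great Dane": 60.0}

def dog_count(breed: str, biomass_budget_kg: float) -> int:
    """Utility = number of dogs; every breed satisfies the fuzzy definition of 'dog'."""
    return int(biomass_budget_kg // BREED_MASS_KG[breed])

def preferred_definition(biomass_budget_kg: float) -> str:
    """Resolve the ambiguity by maximizing utility, not by consulting the designer."""
    return max(BREED_MASS_KG, key=lambda breed: dog_count(breed, biomass_budget_kg))

print(preferred_definition(1000.0), dog_count("Chihuahua", 1000.0))  # Chihuahua 400
```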
Currently expected to be difficult, since we don’t know of an easy way to do so. That it’ll turn out to be easy (in the hindsight) is not totally out of the question.
Is designing “consider the designer’s ideals” in an AI difficult?
Currently expected to be difficult, since we don’t know of an easy way to do so.
Has anyone considered approaching this problem in the same way we might approach “read the user’s handwriting”? That is, the task is not one we program the AI to accomplish—instead, we train the AI to accomplish it. And, most importantly, we train the AI to ask for further clarification in ambiguous cases.
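The comment gives no method, but ‘ask for clarification in ambiguous cases’ is often implemented as classification with a reject option: the model abstains when its top posterior probability falls below a threshold. A minimal sketch, assuming hypothetical 1-D prototypes that training would normally produce (they are hard-coded here for brevity) and a softmax over negative distances as the confidence score.

```python
import math
from typing import Dict

# Hypothetical prototypes: a 1-D feature for each handwritten character class.
PROTOTYPES: Dict[str, float] = {"a": 0.0, "o": 10.0}

def posterior(x: float) -> Dict[str, float]:
    """Softmax over negative distances to each class prototype."""
    scores = {label: math.exp(-abs(x - c)) for label, c in PROTOTYPES.items()}
    total = sum(scores.values())
    return {label: s / total for label, s in scores.items()}

def read_or_ask(x: float, threshold: float = 0.9) -> str:
    """Return a label when confident; otherwise ask the human for clarification."""
    probs = posterior(x)
    label = max(probs, key=probs.get)
    return label if probs[label] >= threshold else "ASK"

print(read_or_ask(0.5), read_or_ask(5.0), read_or_ask(9.0))  # a ASK o
```

The human’s answers to the "ASK" cases would then become new training data, which is where the ‘train rather than program’ part of the suggestion comes in.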
Mirrors and Paintings (yes, you want to point your program at the world and have it figure out what you referred to), The Hidden Complexity of Wishes (if you need to answer AI’s question or give it instructions, you’re doing something wrong and it won’t work).
I have to admit, as someone who has worked in software testing, I find it difficult to take the suggestion (non-destructive full-brain scan) in the first link very seriously. How, exactly, do I become convinced that the AI can come to know more about what I want by scanning me than I can know by introspection? How can I (or it) even do a comparison between the two without it asking me questions?
But then we get down to doing the comparison. The AI informs me that what I really want is to kill my father and sleep with my mother. I deny this. Do we take this as evidence that the AI really does know me better than I know myself, or as a symptom of a bug?
I would argue that if you don’t need to answer the AI’s questions or give it instructions, you’re doing something wrong and it won’t work. By definition. At least for the first ten thousand scans or so. And even then there will remain questions on which the AI and introspection would deliver different answers. Questions with hidden complexity. I just don’t see how anyone would trust a CEV extrapolated from brain scans until we had decades of experience suggesting that scanning and modeling yields better results than introspection.
I would argue that if you don’t need to answer the AI’s questions or give it instructions, you’re doing something wrong and it won’t work. By definition.
Agreed. And any useful AI will have to understand human language to do or learn much of anything of value.
The detailed analysis of full brain scanning tech I’ve seen puts it far into the future, well beyond human-level AGI.
And even then there will remain questions on which the AI and introspection would deliver different answers.
You have to make sure the AI predictably gives a better answer even on questions where you disagree. And there will be questions which can’t even be asked of a human.
I have to admit, as someone who has worked in software testing, I find it difficult to take the suggestion (non-destructive full-brain scan) in the first link very seriously. How, exactly, do I become convinced that the AI can come to know more about what I want by scanning me than I can know by introspection? How can I (or it) even do a comparison between the two without it asking me questions?
Irrelevant. Assume you magically have a perfect working simulation of yourself.
Assume you magically have a perfect working simulation of yourself.
Why would I want to do that? I.e. how would making that assumption lead me to take Eliezer’s suggestion more seriously? My usual practice is to take things less seriously when magic is involved.
And how does this assumption interact with your other comment stating that I have to make sure the AI is somehow even better than myself if there is any difference between simulation and reality? Haven’t you just asked me to assume that there are no differences?
Sorry, I simply don’t understand your responses, which suggests to me that you did not understand my comment. Did you notice, in my preamble, that I mentioned software testing? Perhaps my point may be clearer to you if you keep this preamble in mind when formulating your responses.
Because that’s a conceptually straightforward assumption that we can safely make in a philosophical argument.
The upload is not the AI (and Eliezer’s post doesn’t refer to uploads IIRC, but for the sake of the argument assume they are available as raw material). You make AI correct on strong theoretical grounds, and only test things to check that theoretical assumptions hold in ways where you expect it to be possible to check things, not in every situation.
Did you notice, in my preamble, that I mentioned software testing?
Because that’s a conceptually straightforward assumption that we can safely make in a philosophical argument.
But this is not a philosophical argument.
To recap:
I suggested that an AI which is a precursor to the FAI should come to understand human values by interacting (over an extended ‘training’ period) with actual humans—asking them questions about their values and perhaps performing some experiments as in a psych or game theory laboratory.
You responded by linking to this, which as I read it suggests that the most accurate and efficient way to extract the values of a human test subject would be by carrying out a non-destructive brain scan. Quoting the posting:
So when we try to make an AI whose physical consequence is the implementation of what is right, we make that AI’s causal chain start with the state of human brains—perhaps non-destructively scanned on the neural level by nanotechnology, or perhaps merely inferred with superhuman precision from external behavior—but not passed through the noisy, blurry, destructive filter of human beings trying to guess their own morals.
I asked how we could possibly come to know by testing that the scanning and brain modeling was working properly. I could have asked instead how we could test the hypothesis that the inference from behavior was working properly.
These are questions about engineering and neuroscience, not questions of philosophy. The question of what is right/wrong is a philosophical question. The question of what humans believe about right and wrong is a psychology question. The question of how those beliefs are represented in the brain is a neuroscience question. The question of how an AI can come to learn these things is GOFAI. The question of how we will know we have done it right is a QC question. Software test. That was the subject of my comment. It had nothing at all to do with philosophy.
You make AI correct on strong theoretical grounds, and only test things to check that theoretical assumptions hold in ways where you expect it to be possible to check things, not in every situation.
Ok, in this context, I interpret this to mean that we will not program in the neuroscience information that it will use to interpret the brain scans. Instead we will simply program the AI to be a good scientist. A provably good scientist. Provable because it is a simple program and we understand epistemology well enough to write a correct behavioral specification of a scientist and then verify that the program meets the specification. So we can let the AI design the brain scanner and perform the human behavioral experiments to calibrate its brain models. We only need to spot-check the science it generates, because we already know that it is a good scientist.
Hmmm. That is actually a pretty good argument, if that is what you are suggesting. I’ll have to give that one some thought.
These are questions about engineering and neuroscience, not questions of philosophy. The question of what is right/wrong is a philosophical question. The question of what humans believe about right and wrong is a psychology question. The question of how those beliefs are represented in the brain is a neuroscience question. The question of how an AI can come to learn these things is GOFAI. The question of how we will know we have done it right is a QC question. Software test. That was the subject of my comment. It had nothing at all to do with philosophy.
Sorry, not my area at the moment. I gave the links to refer to arguments for why having AI learn in the traditional sense is a bad idea, not for instructions on how to do it correctly in a currently feasible way. Nobody knows that, so you can’t expect an answer, but the plan of telling the AI things we think we want it to learn is fundamentally broken. If nothing better can be done, too bad for humanity.
Ok, in this context, I interpret this to mean that we will not program in the neuroscience information that it will use to interpret the brain scans. Instead we will simply program the AI to be a good scientist.
This is much closer, although a “scientist” is probably a bad word to describe that, and given that I don’t have any idea what kind of system can play this role, it’s pointless to speculate. Just take as the problem statement what you quoted from the post:
try to make an AI whose physical consequence is the implementation of what is right
Irrelevant. Assume you magically have a perfect working simulation of yourself.
Relevant—Can we just assume you magically have a friendly AI then?
If the plan for creating a friendly AI depends on a non-destructive full-brain scan already being available, the odds of achieving friendly AI before other forms of AI vanish to near zero.
One step at a time, my good sir! Reducing the philosophical and mathematical problem of Friendly AI to the technological problem of uploading would be an astonishing breakthrough quite by itself.
I think this reflects the practical problem with Friendly AI—it is an ideal of perfection taken to an extreme that expands the problem scope far beyond what is likely to be near term realizable.
I expect that most of the world, research teams, companies, the VC community and so on will be largely happy with an AGI that just implements an improved version of the human mind.
For example, humans have an ability to model other agents and their goals, and through love/empathy value the well-being of others as part of our own individual internal goal systems.
I don’t see yet why that particular system is difficult or more complex than the rest of AGI.
It seems likely that once we can build an AGI as good as the brain, we can build one that is human-like but only has the love/empathy circuitry in its goal system, with the rest of the crud stripped out.
In other words, if we can build AGIs modeled after the best components of the best examples of altruistic humans, this should be quite sufficient.
That is, the task is not one we program the AI to accomplish—instead, we train the AI to accomplish it. And, most importantly, we train the AI to ask for further clarification in ambiguous cases
This is the straightforward approach.
Once you have an AGI that has the cognitive capability and learning capacity of a human infant brain, you teach it everything else in human language—right/wrong, ethics/morality, etc.
Programming languages are precise and well suited for creating the architecture itself, but human languages are naturally more effective for conveying human knowledge.
I tend to agree that we need a natural language interface to the AI. But it is far easier to create automatic proofs of program correctness when the really important stuff (like ethics) is presented in a formal language equipped with a deductive system.
There is something to be said for treating all the natural language input as if it were testimony from unreliable witnesses—suitable, perhaps, for locating hypotheses, but not really suitable as strong evidence for accepting the hypotheses.
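Treating natural-language input as testimony from an unreliable witness can be made precise with Bayes’ rule in odds form. A minimal sketch with made-up reliability numbers: a witness who asserts H 60% of the time when H is true and 40% of the time when it is false barely moves a 1% prior, while a highly reliable one moves it a lot.

```python
def posterior_after_testimony(prior: float, p_report_if_true: float,
                              p_report_if_false: float) -> float:
    """Bayes' rule in odds form: posterior odds = prior odds * likelihood ratio."""
    if not 0.0 < prior < 1.0:
        raise ValueError("prior must be strictly between 0 and 1")
    prior_odds = prior / (1.0 - prior)
    likelihood_ratio = p_report_if_true / p_report_if_false
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1.0 + posterior_odds)

unreliable = posterior_after_testimony(0.01, 0.6, 0.4)  # hypothesis located, not accepted
reliable = posterior_after_testimony(0.01, 0.99, 0.01)
print(round(unreliable, 4), round(reliable, 4))  # 0.0149 0.5
```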
But it is far easier to create automatic proofs of program correctness
I’m not sure how this applies—can you formally prove the correctness of a probabilistic belief network? Is that even a valid concept?
I can understand how you could prove the correctness of a formal deterministic circuit, or of the algorithms underlying the belief network and learning systems, but the data values?
Agree. That is why I suggest that the really important stuff (meta-ethics, epistemology, etc.) be represented in some way other than by ‘neural’ networks: something formal and symbolic, rather than quasi-analog. That is, all the stuff which we (and the AI) need to be absolutely certain doesn’t change meaning when the AI “rewrites its own code”.
The really important stuff isn’t a special category of knowledge. It is all connected—a tangled web of interconnected complex symbolic concepts for which human language is a natural representation.
What is the precise mathematical definition of ethics? If you really think of what it would entail to describe that precisely, you would need to describe humans, civilization, goals, brains, and a huge set of other concepts.
In essence, you would need to describe an approximation of our world. You would need to describe a belief/neural/statistical inference network that represented that world internally as a complex association between other concepts that eventually grounds out into world sensory predictions.
So this problem (that human language concepts are far too complex and unwieldy for formal verification) is not a problem with human language itself that can be fixed by using other language choices. It reflects a problem with the inherent massive complexity of the world itself, complexity that human language and brain-like systems have evolved to handle.
So this problem (that human language concepts are far too complex and unwieldy for formal verification) is not a problem with human language itself that can be fixed by using other language choices. It reflects a problem with the inherent massive complexity of the world itself, complexity that human language and brain-like systems have evolved to handle.
These folks seem to agree with you about the massive complexity of the world, but seem to disagree with you that natural language is adequate for reliable machine-based reasoning about that world.
As for the rest of it, we seem to be coming from two different eras of AI research as well as different application areas. My AI training took place back around 1980 and my research involved automated proofs of program correctness. I was already out of the field and working on totally different stuff when neural nets became ‘hot’. I know next to nothing about modern machine learning.
I read about CYC a while back; from what I recall, it is a massive hand-built database of little natural-language ‘facts’.
Some of the new stuff they are working on with search looks kinda interesting, but in general I don’t see this as a viable approach to AGI. A big syntactic database isn’t really knowledge; it needs to be grounded in a massive sub-symbolic learning system to get the semantics part.
On the other hand, specialized languages for AGIs? Sure. But they will need to learn human languages first to be of practical value.
You look at CYC and see a massive hand-built database of facts.
I look and see a smaller (but still large) hand-built ontology of concepts.
You, probably because you have worked in computer vision or pattern recognition, notice that the database needs to be grounded in some kind of perception machinery to get semantics.
I, probably because I have worked in logic and theorem proving, wonder what axioms and rules of inference exist to efficiently provide inference and planning based upon this ontology.
That’s one of my favorite analogies; I’m fond of the Jain multi-viewpoint approach.
As for the logic/inference angle, I suspect that this type of database underestimates the complexity of actual neural concepts—as most of the associations are subconscious and deeply embedded in the network.
We use ‘connotation’ to describe part of this embedding concept, but I see it as even deeper than that. A full description of even a simple concept may be on the order of billions of such associations. If this is true, then a CYC-like approach is far from appropriately scalable.
It appears that you doubt that an AI whose ontology is simpler and cleaner than that of a human can possibly be intellectually more powerful than a human.
All else being equal, I would doubt that with respect to a simpler ontology, while the ‘cleaner’ adjective is less well defined.
Look at it in terms of the number of possible circuit/program configurations that are “intellectually more powerful than a human” as a function of the circuit/program’s total bit size.
At around the human level of roughly 10^15 I’m almost positive there are intellectually more powerful designs—so P_SH(10^15) = 1.0.
I’m also positive that below some size threshold there are absolutely zero possible configurations of superhuman intellect, say P_SH(10^10) ~ 0.0.
Of course “intellectually more powerful” is open to interpretation. I’m thinking of it here in terms of the range of general intelligence tasks human brains are specially optimized for.
IBM’s Watson is superhuman in a certain novel, narrow range of abilities, and its complexity is around 10^12 to 10^13.
To get to that point we have to start from the right meaning to begin with, and care about preserving it accurately, and Jacob doesn’t agree those steps are important or particularly hard.
As for the “start from the right meaning” part, I think it is extremely hard to ‘solve’ morality in the way typically meant here, with CEV or whatnot.
I don’t think that we need (or will) wait to solve that problem before we build AGI, any more or less than we need to solve it for having children and creating a new generation of humans.
If we can build AGI somewhat better than us according to our current moral criteria, they can build an even better successive generation, and so on—a benevolence explosion.
As for the second part about preserving it accurately, I think that ethics/morality is complex enough that it can only be succinctly expressed in symbolic associative human languages. An AGI could learn how to model (and value) the preferences of others in much the same way humans do.
I don’t think that we need (or will) wait to solve that problem before we build AGI, any more or less than we need to solve it for having children and creating a new generation of humans.
If we can build AGI somewhat better than us according to our current moral criteria, they can build an even better successive generation, and so on—a benevolence explosion.
Someone help me out. What is the right post to link to that goes into the details of why I want to scream “No! No! No! We’re all going to die!” in response to this?
Why would an AI which optimises for one thing create another AI that optimises for something else? Not every change is an improvement, but every improvement is necessarily a change. Building an AI with a different utility function is not going to satisfy the first AI’s utility function! So whatever AI the first one builds is necessarily going to either have the same utility function (in which case the first AI is working correctly), or have a different one (which is a sign of malfunction, and given the complexity of morality, probably a fatal one).
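A toy sketch of the argument above, under loudly hypothetical assumptions: each candidate successor is summarized by the single outcome it would steer the world toward, and the parent scores candidates only with its own utility function. The names `best_successor`, `paperclip_utility`, the candidates, and the outcome labels are illustrative inventions, not anything from the thread.

```python
from typing import Callable, Dict

Outcome = str

def best_successor(parent_utility: Callable[[Outcome], float],
                   candidates: Dict[str, Outcome]) -> str:
    """Return the candidate whose predicted outcome the parent values most.

    The parent judges every successor by the parent's own utility function,
    not by the successor's, which is the crux of the argument.
    """
    return max(candidates, key=lambda name: parent_utility(candidates[name]))

def paperclip_utility(outcome: Outcome) -> float:
    # Hypothetical parent that only values paperclips.
    return {"many_paperclips": 10.0, "some_paperclips": 3.0, "staples": 0.0}[outcome]

# Hypothetical successors: one shares the parent's goal, two do not.
candidates = {
    "same_utility_successor": "many_paperclips",
    "mixed_goal_successor": "some_paperclips",
    "staple_maximizer": "staples",
}

print(best_successor(paperclip_utility, candidates))  # same_utility_successor
```

The model deliberately leaves out side payments, threats, and other agents whose choices also feed into the parent's utility, which is exactly what the replies that follow dispute.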
It’s not possible to create an AGI that is “somewhat better than us” in the sense that it has a better utility function. To the extent that we have a utility function at all, it would refer to the abstract computation called “morality”, which “better” is defined by. The most moral AI we could create is therefore one with precisely that utility function. The problem is that we don’t exactly know what our utility function is (hence CEV).
There is a sense in which a Friendly AGI could be said to be “better than us”, in that a well-designed one would not suffer from akrasia and whatever other biases prevent us from actually realizing our utility function.
AIs without utility functions, but with some other motivational structure, will tend to self-improve into utility-function AIs. Utility-function AIs seem more stable under self-improvement, but there are many reasons it might want to change its utility function (e.g., speed of access, multi-agent situations).
Why would an AI which optimises for one thing create another AI that optimises for something else?
It wouldn’t if it initially considered itself to be the only agent in the universe. But if it recognizes the existence of other agents and the impact of other agents’ decisions on its own utility, then there are many possibilities:
The new AI could be created as a joint venture of two existing agents.
The new AI could be built because the builder was compensated for doing so.
The new AI could be built because the builder was threatened into doing so.
Building an AI with a different utility function is not going to satisfy the first AI’s utility function!
This may seem intuitively obvious, but it is actually often false in a multi-agent environment.
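A minimal numeric sketch of the compensation case, under hypothetical assumptions: the builder’s utility is additive over progress on its own goal and resources it receives, and the effort and payment figures are invented for illustration. The builder’s utility function never changes; another agent’s decision (the payment) simply enters into it.

```python
def builder_utility(own_goal_progress: float, payment: float) -> float:
    """Hypothetical additive utility: own-goal progress plus resources received."""
    return own_goal_progress + payment

# Option A: build nothing for anyone else.
refuse = builder_utility(own_goal_progress=5.0, payment=0.0)

# Option B: divert 1.0 units of effort into building an AI with a *different*
# utility function for another agent, who pays 3.0 in return.
accept = builder_utility(own_goal_progress=5.0 - 1.0, payment=3.0)

print(refuse, accept)  # 5.0 7.0
```

Whether real agents face trades like this is the open question; the sketch only shows that "a different utility function" and "serves the builder’s utility function" are not contradictory once other agents are in the picture.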
Why would an AI which optimises for one thing create another AI that optimises for something else? Not every change is an improvement, but every improvement is necessarily a change. Building an AI with a different utility function is not going to satisfy the first AI’s utility function!
Yes, it certainly can, if that new AI helps its creator.
The same issue applies to children—they don’t necessarily have the same ‘utility function’, sometimes they even literally kill us, but usually they help us.
It’s not possible to create an AGI that is “somewhat better than us” in the sense that it has a better utility function.
Sure it is; this part at least is easy. For example, an AGI that is fully altruistic and only experiences love as its single emotion would clearly be “somewhat better than us” from our perspective in every sense that matters.
If that AGI would not be somewhat better than us in the sense of having a better utility function, then ‘utility function’ is not a useful concept.
The problem is that we don’t exactly know what our utility function is (hence CEV)
The real problem is the idea that morality can or should be simplified down to a ‘utility function’ simple enough for a human to code.
Before tackling that problem, it would probably be best to start with something much simpler, such as a utility function that could recognize dogs vs. cats and other objects in images. If you actually research this, it quickly becomes clear that real-world intelligences make decisions using much more complexity than a simple utility-maximizing algorithm.
Yes, it certainly can, if that new AI helps its creator.
The same issue applies to children—they don’t necessarily have the same ‘utility function’, sometimes they even literally kill us, but usually they help us.
That would be not so much a benevolence explosion as a single AI creating “slave” AIs for its own purposes. If some of the child AI’s goals (for example those involved in being more good) are opposed to the parent’s goals (for example those which make the parent AI less good), the parent is not going to just let the child achieve its goals. Rational agents do not let their utility functions change.
Sure it is; this part at least is easy. For example, an AGI that is fully altruistic and only experiences love as its single emotion would clearly be “somewhat better than us” from our perspective in every sense that matters.
If you mean that the AI doesn’t suffer from the akrasia and selfishness and emotional discounting and uncertainty about our own utility function which prevents us from acting out our moral beliefs then I agree with you. That’s the AI being more rational than us, and therefore better optimising for its utility function. But a literally better utility function is impossible, given that “better” is defined by our utility function.
Moreover, if our utility function describes what we truly want (which is the whole point of a utility function), it follows that we truly want an AI that optimizes for our utility function. If “better” were a different utility function then it would be unclear why we are trying to create an AI that does that, rather than what we want.
The real problem is the idea that morality can or should be simplified down to a ‘utility function’ simple enough for a human to code.
That’s why the plan is for the AI to figure it out by inspecting us. Morality is very much not simple to code.
The same issue applies to children—they don’t necessarily have the same ‘utility function’, sometimes they even literally kill us, but usually they help us.
That would be not so much a benevolence explosion as a single AI creating “slave” AIs for its own purposes
So do we create children as our ‘slaves’ for our own purposes? You seem to be categorically ruling out the entire possibility of humans creating human-like AIs that have a parent-child relationship with their creators.
So just to make it precisely clear, I’m talking about that type of AI specifically. The importance and feasibility of that type of AGI vs other types is a separate discussion.
Sure it is; this part at least is easy. For example, an AGI that is fully altruistic and only experiences love as its single emotion would clearly be “somewhat better than us” from our perspective in every sense that matters.
If you mean that the AI doesn’t [ .. ]
That’s the AI being more rational than us, and therefore better optimising for its utility function.
I don’t see it as having anything to do with rationality.
The altruistic human-ish AGI mentioned above would be better than current humans from our current perspective—more like what we wish ourselves to be, and more able to improve our world than current humans.
Moreover, if our utility function describes what we truly want (which is the whole point of a utility function), it follows that we truly want an AI that optimizes for our utility function.
Yes.
This is obvious if its ‘utility function’ is just a projection of my own (i.e., it simulates what I would want and uses that as its utility function), but that isn’t even necessary: its utility function could be somewhat more complex than just a simulated projection of my own and still help fulfill my utility function.
That’s why the plan is for the AI to figure it out by inspecting us. Morality is very much not simple to code.
If by inspection you just mean teach the AI morality in human language, then I agree, but that’s a side point.
So: I want to finish my novel, but I spend the day noodling around the Internet instead.
Then Omega hands me an AI which it assures me is programmed error-free to analyze me and calculate my utility function and optimize my environment in terms of it.
I run the AI, and it determines exactly which parts of my mind manifest a desire to finish the novel, which parts manifest a desire to respond to the Internet, and which parts manifest a desire to have the novel be finished. Call them M1, M2 and M3. (They are of course overlapping sets.) Then it determines somehow which of these things are part of my utility function, and which aren’t, and to what degree.
So...
Case 1: The AI concludes that M1 is part of my utility function and M2 and M3 are not. Since it is designed to maximize my utility, it constructs an environment in which M1 triumphs. For example, perhaps it installs a highly sophisticated filter that blocks out 90% of the Internet. Result: I get lots more high-quality work done on the novel. I miss the Internet, but the AI doesn’t care, because that’s the result of M2 and M2 isn’t part of my utility function.
Case 2: The AI concludes that M3 and M2 are part of my utility function and M1 is not, so it finishes the novel itself and modifies the Internet to be even more compelling. I miss having the novel to work on, but again the AI doesn’t care.
Case 3: The AI concludes that all three things are part of my utility function. It finishes the novel but doesn’t tell me about it, thereby satisfying M3 (though I don’t know it). It makes a few minor tweaks to my perceived environment, but mostly leaves it alone, since it is already pretty well balanced between M1 and M2 (which is not surprising, since I was responding to those mental structures when I constructed my current situation).
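To make the three cases concrete, here is a toy sketch in which the AI’s estimate of “my utility function” is a weight on each of M1, M2, M3, and it picks the environment with the highest weighted score. The environments and their 0-to-1 scores are invented for illustration; nothing in the thread specifies them.

```python
from typing import Dict, Tuple

# Hypothetical environments scored by how well each satisfies (M1, M2, M3):
# M1 = working on the novel, M2 = the Internet, M3 = the novel being finished.
ENVIRONMENTS: Dict[str, Tuple[float, float, float]] = {
    "filter_internet": (0.9, 0.1, 0.5),
    "finish_novel_and_boost_internet": (0.0, 1.0, 1.0),
    "finish_novel_secretly": (0.6, 0.6, 1.0),
}

def choose(weights: Tuple[float, float, float]) -> str:
    """Pick the environment maximizing the weighted sum of M1, M2, M3 scores."""
    def score(env: str) -> float:
        return sum(w * s for w, s in zip(weights, ENVIRONMENTS[env]))
    return max(ENVIRONMENTS, key=score)

print(choose((1.0, 0.0, 0.0)))  # Case 1: only M1 counts -> filter_internet
print(choose((0.0, 1.0, 1.0)))  # Case 2: M2 and M3 count -> finish_novel_and_boost_internet
print(choose((1.0, 1.0, 1.0)))  # Case 3: all three count -> finish_novel_secretly
```

The point is only that the same optimizer produces very different environments depending on which mental structures it counts as part of the utility function.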
If I’m understanding you correctly, you’re saying that I can’t really know which of these results (or of countless other possibilities) will happen, but that whichever one it is, I should have high confidence that all other possibilities would by my own standards have been worse… after all, that’s what it means to maximize my utility function.
Yes?
It seems to follow that if the AI has an added feature whereby I can ask it to describe what it’s about to do before it does it and then veto doing it, I ought not invoke that feature. (After all, I can’t make the result better, but I might make the result worse.)
Assuming you trust Omega to mean the same thing as you do when talking about your preferences and utility function, then yes. If the AI looks over your mind and optimizes the environment for your actual utility function (which could well be a combination of M1, M2 and M3), then any veto you do must make the result worse than the optimal one.
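A brute-force sketch of the veto reasoning, assuming the AI picks the argmax of whatever utility it is given over a finite set of plans, and that a veto means substituting some other plan. The plan names and utility values are hypothetical.

```python
from typing import Dict

def optimal(utility: Dict[str, float]) -> str:
    """Plan the AI picks: the argmax of the utility it is given."""
    return max(utility, key=lambda plan: utility[plan])

def veto_can_help(ai_utility: Dict[str, float], true_utility: Dict[str, float]) -> bool:
    """True if some substitute plan beats the AI's pick under the *true* utility."""
    pick = optimal(ai_utility)
    return any(true_utility[alt] > true_utility[pick] for alt in true_utility)

# Hypothetical true utilities over three plans.
true_u = {"plan_a": 2.0, "plan_b": 3.5, "plan_c": 1.0}

# Premise holds: the AI optimizes the true utility, so no veto can improve things.
print(veto_can_help(true_u, true_u))  # False

# Premise fails: a mis-specified utility makes the AI pick plan_c.
wrong_u = {"plan_a": 2.0, "plan_b": 0.5, "plan_c": 4.0}
print(optimal(wrong_u), veto_can_help(wrong_u, true_u))  # plan_c True
```

The first check is true by construction, which is the point: the argument’s force comes entirely from the premise that the AI is optimizing your actual utility function. When that premise is in doubt (the “not a good genie” case), the veto is what protects you.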
Of course, if there’s doubt about the programming of the AI, use of the veto feature would probably be wise, just in case it’s not a good genie.
You seem to be imagining a relatively weak AI. For instance, given the vast space of possibilities, there are doubtlessly environmental tweaks that would result in more fun on the internet and more high-quality work on the novel. (This is to say nothing of more invasive interventions.)
The answer to your questions is yes: assuming the AI does what Omega says it does, you won’t want to use your veto.
Not necessarily weak overall, merely that it devotes relatively few resources to addressing this particular tiny subset of my preference-space. After all, there are many other things I care about more.
But, sure, a sufficiently powerful optimizer will come up with solutions so much better that it will never even occur to me to doubt that all other possibilities would be worse. And given a sufficiently powerful optimizer, I might as well invoke the preview feature if I feel like it, because I’ll find the resulting preview so emotionally compelling that I won’t want to use my veto.
That case obscures rather than illustrates the question I’m asking, so I didn’t highlight it.
Case 4: The AI makes tweaks to your current environment in order to construct it in accordance with your mental structures, but in a way more efficient than you could have in the first place.
Sure. In which case I still noodle around on the Internet a bunch rather than work on my novel, but at least I can reassure myself that this optimally reflects my real preferences, and any belief I might have that I would actually rather get more work done on my novel than I do is simply an illusion.
If you actually research this it quickly becomes clear that real world intelligences make decisions using much more complexity than a simple utility-maximizing algorithm.
I occasionally point out that you can model any computable behaviour using a utility-maximizing algorithm, provided you are allowed to use a partially-recursive utility function.
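A sketch of the usual construction behind this point, assuming “behaviour” means a deterministic policy from observations to actions: give an (observation, action) pair utility 1 if the policy would take that action and 0 otherwise, and an argmax over that utility reproduces the policy exactly. The `noodler` policy and the action list are hypothetical.

```python
from typing import Callable, Sequence

Policy = Callable[[str], str]

def utility_from_policy(policy: Policy) -> Callable[[str, str], float]:
    """Utility that rewards exactly the action the policy would take."""
    def utility(observation: str, action: str) -> float:
        return 1.0 if action == policy(observation) else 0.0
    return utility

def maximizer(utility: Callable[[str, str], float], actions: Sequence[str]) -> Policy:
    """An agent that picks the utility-maximizing action for each observation."""
    return lambda obs: max(actions, key=lambda a: utility(obs, a))

def noodler(observation: str) -> str:
    # Hypothetical "messy" behaviour: browse when bored, otherwise write.
    return "browse_internet" if observation == "bored" else "write_novel"

ACTIONS = ["write_novel", "browse_internet", "sleep"]
agent = maximizer(utility_from_policy(noodler), ACTIONS)
print([agent(o) for o in ["bored", "rested"]])  # ['browse_internet', 'write_novel']
```

The construction shows representability only; it says nothing about whether the resulting utility function is compact or informative.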
Also, very little of the sequences have much of anything to do with AI. If I want to learn more about that I would look to Norvig’s book or more likely the relevant papers online. No need to be rude just because I don’t hold all your same beliefs.
Also, very little of the sequences have much of anything to do with AI.
It’s more of a problem with your understanding of ethics, as applied to AI (and since this is the main context in which AI is discussed here, I referred to that as simply AI). You might be very knowledgeable in contemporary machine learning or other AI ideas while not seeing, for example, the risks of building AGIs.
No need to be rude just because I don’t hold all your same beliefs.
Unfortunately there is (in some senses of “rude”, such as discouraging certain conversational modes).
You might be very knowledgeable in contemporary machine learning or other AI ideas while not seeing, for example, the risks of building AGIs
I see the potential risks in building AGIs.
I don’t see that risk being dramatically high for creating AGIs based loosely on improving the human brain, and this approach appears to be mainstream now or becoming the mainstream (Kurzweil, Hawkins, DARPA’s neuromorphic initiative, etc.).
I’m interested in the serious discussion or analysis of why that risk could be high.
You have been discussing favourably the creation of AGIs that are programmed to create AGIs with different values to their own. No, you do not understand the potential risks.
We create children that can have different values than our own, and over time this leads to significant value drift. But perhaps it should be called ‘value evolution’.
This process is not magically guaranteed to preserve our best interests from our current perspective when carried over to AGI, but nor is it guaranteed to spontaneously destroy the world.
We create children that can have different values than our own, and over time this leads to significant value drift. But perhaps it should be called ‘value evolution’.
Your analogy with evolution is spot on: if the values are going to drift at all, we want to drift towards some target point, by selecting against sub-AIs that have values further from the point.
However, if we can do that, why not just put that target point right in the first AI’s utility function, and prevent any value drift at all? It seems like it ends up with the same result, but with slightly less complication.
And, if we can’t set a target point for the value drift evolution… then it might drift anywhere at all! The chances that it would drift somewhere we’d like are pretty small. This applies even if it were a human-brain-based AGI; in general people are quite apt to go corrupt when given only a tiny bit of extra power. A whole load of extra power, like superintelligence would grant, would have a good chance of screwing with that human’s values dramatically, possibly with disastrous effects.
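A deliberately crude sketch of the drift argument, assuming values are a single number, each generation produces children whose values differ from the parent’s by a fixed offset, and “selection” keeps the child closest to a target. The offsets, target, and generation count are all hypothetical.

```python
import random
from typing import Optional

OFFSETS = (-1.0, 0.0, 1.0)  # hypothetical per-generation value changes

def drift(start: float, generations: int, target: Optional[float], seed: int = 0) -> float:
    """Follow one lineage; with a target, keep the child closest to it."""
    rng = random.Random(seed)
    value = start
    for _ in range(generations):
        children = [value + d for d in OFFSETS]
        if target is None:
            value = rng.choice(children)  # unguided drift: a random walk
        else:
            value = min(children, key=lambda v: abs(v - target))  # selection
    return value

print(drift(0.0, 10, target=5.0))  # 5.0: reaches the target and stays there
print(drift(0.0, 10, target=None))  # wherever the seeded random walk ends up
```

In the sketch, selection needs `target` to be written down, which is the premise of the first half of the comment; with `target=None` the lineage can end up anywhere within reach of the walk.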
Your analogy with evolution is spot on: if the values are going to drift at all, we want to drift towards some target point, by selecting against sub-AIs that have values further from the point.
Yes.
However, if we can do that, why not just put that target point right in the first AI’s utility function, and prevent any value drift at all?
The true final ‘target point’ is unknown, and unknowable in principle. We don’t have the intelligence/computational power right now to know it, no AGI we can build will know it exactly, and this will forever remain true.
Our values are so complex that the ‘utility function’ that describes them is our entire brain circuit—and as we evolve into more complex AGI designs our values will grow in complexity as well.
Fixing them completely would be equivalent to trying to stop evolution. It’s pointless, suicidal, impossible.
And, if we can’t set a target point for the value drift evolution… then it might drift anywhere at all!
Yes, evolution could in principle take us anywhere, but we can, and already do, exert control over its direction.
This applies even if it were a human-brain-based AGI; in general people are quite apt to go corrupt when given only a tiny bit of extra power.
Humans today have a range of values, but an overriding universal value is not-dying. To this end it is crucially important that we reverse engineer the human mind.
Ultimately if what we really value is conscious human minds, and computers will soon out-compete human brains, then clearly we need to transfer human minds over to computers.
One simple point is that there is no reason to expect AGIs to stop at exactly human level. Even if progress and increase in intelligence is very slow, eventually they become an existential risk, or at least a value risk. Every step in that direction we make now is a step in the wrong direction, which holds even if you believe it’s a small step.
One simple point is that there is no reason to expect AGIs to stop at exactly human level.
This isn’t the first time I heard this, but I don’t think it’s exactly right.
We know that human level is possible. Superhuman level being possible seems overwhelmingly likely, from considerations like imagining a human with more working memory who runs faster, but we don’t technically know that.
We have a working example of a human level intelligence.
It’s human-level intelligences doing the work. Martians’ work on AI might asymptotically slow down when approaching Martian-level intelligence without that level being inherently significant for anyone else, and the same could hold for humans, or for any AGI of any level working on its own successor (not that I have any strong belief that this is the case; it’s just an argument for why human level wouldn’t be a completely arbitrary slow-down point).
I’d completely agree with “there is no strong reason to expect AGIs to stop at exactly human level”, “High confidence* in AGIs stopping at exactly human level is irrational” or “expecting AGIs not to stop at exactly human level would be prudent.”
*Personally, I’d assign a probability of under 0.2 to the best AGIs being on a level roughly comparable to human level (let’s say: able to solve any problem, except human-relationship problems, that every IQ 80+ human can solve, but not better at every task than any human) for at least 50 years (physical time in Earth’s frame of reference, not subjective time; this probably means inferior at an equal clock rate but making up for that with speed for most of that time). That’s a lot more than I would assign to any other place on the intelligence scale, of course.
Could the downvoter please say what they are disagreeing with? I can see at least a dozen mutually contradictory possible angles so “someone thinks something about posting this is wrong” provides almost no useful information.
very little of the sequences have much of anything to do with AI.
There is some discussion of the dangers of a uFAI Singularity, particularly in this debate between Robin Hanson and Eliezer. Much of the danger arises from the predicted short time period required to get from a mere human-level AI to a superhuman AI+. Eliezer discusses some reasons to expect it to happen quickly here and here. The concept of a ‘resource overhang’ is crucial in dismissing Robin’s skepticism (which is based on historical human experience in economic growth—particularly in the accumulation of capital).
For an analysis of the possibility of a hard takeoff in approaches to AI based loosely on modeling or emulating the human brain, see this posting by Carl Shulman, for example.
The concept of a ‘resource overhang’ is crucial in dismissing Robin’s skepticism (which is based on historical human experience in economic growth—particularly in the accumulation of capital).
If civilisation(t+1) can access resources much better than civilisation(t), then that is just another way of saying things are going fast—one must beware of assuming what one is trying to demonstrate here.
The problem I see with this thinking is the idea that civilisation(t) is a bunch of humans while civilisation(t+1) is a superintelligent machine.
In practice, civilisation(t) is a man-machine symbiosis, while civilisation(t+1) is another man-machine symbiosis with a little bit less man, and a little bit more machine.
Currently expected to be difficult, since we don’t know of an easy way to do so. That it’ll turn out to be easy (in hindsight) is not totally out of the question.
There are some promising lines of attack (grounded in decision theory) that might take only a few years of research. We’ll see where they lead. Other open problems in FAI might start looking very solvable if we start making progress on this front.
This is related to the theme Eliezer quite elegantly goes on about in Creating Friendly AI and that he for some reason barely mentioned in CEV, which is that the AI should look at its own source code as evidence of what its creators were trying to get at, and update its imperfect source code accordingly.
Yes, but it still has to be explicitly programmed to do that! The question is how to get it to do so. AFAIK shaper-anchor semantics is still quite a ways from being fully specified, but it seems the bigger obstacle is that an AI writer is less likely than not to take the effort to program it that way in the first place.
it’s really damn hard to precisely specify a paperclip [...]
This is surely the kind of thing that superintelligences will be good at. They will have access to every paperclip picture on the net, every paperclip specification too. They will surely have a much clearer idea about what a paperclip is than humans do. They will know what boxes are too.
Right, but as far as I can tell without having put lots of hours into trying to solve the problem of clippyAI, it’s really damn hard to precisely specify a paperclip.
I made a stab at it here, and it got some upvotes. So here’s a repost:
Make a wire, 10 cm long and 1mm in diameter, composed of an alloy of 99.8% iron and 0.2% carbon. Start at one end and bend it such that the segments from 2-2.5cm, 2.75-3.25cm, 5.25-5.75cm form half-circles, with all the bends in the same direction and forming an inward spiral (the end with the first bend is outside the third bend).
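Writing the repost’s spec down as data makes its stated parameters machine-checkable. The sketch below is an illustrative assumption rather than anything proposed in the thread: it encodes the alloy and the three bend segments, checks that the bends are ordered, non-overlapping, and on the 10 cm wire, and computes the radius each half-circle implies (arc length = πr). It does not encode the “same direction, inward spiral” constraint.

```python
import math
from typing import List, Tuple

WIRE_LENGTH_CM = 10.0
WIRE_DIAMETER_MM = 1.0  # recorded from the spec, not checked below
ALLOY = {"iron": 0.998, "carbon": 0.002}
# (start, end) of each half-circle bend, measured from one end of the wire.
BENDS_CM: List[Tuple[float, float]] = [(2.0, 2.5), (2.75, 3.25), (5.25, 5.75)]

def check_spec() -> List[float]:
    """Validate the spec and return the radius (cm) implied by each bend."""
    assert math.isclose(sum(ALLOY.values()), 1.0), "alloy fractions must sum to 1"
    previous_end = 0.0
    radii = []
    for start, end in BENDS_CM:
        assert previous_end <= start < end <= WIRE_LENGTH_CM, "bad bend segment"
        radii.append((end - start) / math.pi)  # half-circle: arc length = pi * r
        previous_end = end
    return radii

print([round(r, 3) for r in check_spec()])  # [0.159, 0.159, 0.159]
```

The checks pass, yet the spec stays silent about everything it does not state (tolerances, for instance), which is the difficulty the surrounding comments are pointing at.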
(Please let me know if reposting violates LW etiquette so I know not to do it again.)
Here’s a sort of fully general counterargument against proposals to naturalize human concepts in AI terms: if you can naturalize human concepts, you should be able to naturalize the human concept of a box. And if you can do that, we can build Oracle AI and save the world. It’s very easy to describe what we mean by ‘stay in the box’, but it turns out that seed (self-modifying!) AIs just don’t have a natural ontology for the descriptions.
This argument might be hella flawed; it seems kind of tenuous.
That assumption isn’t really a core part of the argument… the general “if specifying human concepts is easy, then come up with a plan for making a seed AI want to stay in a box” argument still stands, even if we don’t actually want to keep arbitrary seed AIs in boxes.
For the record I am significantly less certain than most LW or SIAI singularitarians that seed AIs not explicitly coded with human values in mind will end up creating a horrible future, or at least a more horrible future than something like CEV. I do think it’s worth a whole lot of continued investigation.
How would converging to a “universal morality” help produce paperclips?
If you’re a seed AI and you’re somewhat confused about what your creator meant when they said “valueOfCreatingPaperclips=infinity”, so you do what you think they were trying to get you to do, which was to create economic value by making paperclips, and the reason they wanted to do that was to make a profit for themselves, and the reason for that is they’re part of this larger system called humanity which is following this strange vector in preferencespace… Like I said, this doesn’t apply to AIs that are bad at that kind of philosophical reflection, but I’m not sure how likely it is that human extrapolated volition and babyeater extrapolated volition look at all different if you just got the extrapolator working right. I am now going to duck out of this conversation because saying unconventional things on LW means you have to be really careful about your phrasing, and I don’t have the necessary mental energy nor desire. I mostly just hope that someone out there is going to fill in the gaps of what I’m saying and therefore get to play with the ideas I’m trying to convey.
And the reason you value friendship is that “evolution” “made” it so, following the Big Bang. Informal descriptions of physical causes and effects don’t translate into moral arguments, and there’s no a priori reason to care about what other “agents” present in your causal past (light cone!) “cared” about, no more than caring about what they “hated”, or even to consider such a concept.
(I am becoming more and more convinced that you do have a serious problem with the virtue of narrowness; you had better stop the meta-contrarian nonsense and work on that.)
You’re responding to an interpretation of what I said that assumes I’m stupid, not the thing I was actually trying to say. Do you seriously think I’ve spent a year at SIAI without understanding such basic arguments? I’m not retarded. I just don’t have the energy to think through all the ways that people could interpret what I’m saying as something dumb because it pattern matches to things dumb people say. I’m going to start disclaiming this at the top of every comment, as suggested by Steve Rayhawk.
Specifically, in this case, in the comment you replied to and elsewhere in this thread, I said: “this doesn’t apply to AIs that are bad at that kind of philosophical reflection”. I’m making a claim that all well-designed AIs will converge to a universal ‘morality’ that we’d like upon reflection, even if they weren’t explicitly coded to approximate human values. I’m not saying your average AI programmer can make an AI that does this, though I am suggesting it is plausible.
This is stupid. I’m suggesting a hypothesis with low probability that is contrary to standard opinion. If you want to dismiss it via absurdity heuristic go ahead, but that doesn’t mean that there aren’t other people who might actually think about what I might mean while assuming that I’ve actually thought about the things I’m trying to say. This same annoying thing happened with Jef Allbright, who had interesting things to say but no one had the ontology to understand him so they just assumed he was speaking nonsense. Including Eliezer. LW inherited Eliezer’s weakness in this regard, though admittedly the strength of narrowness and precision was probably bolstered in its absence.
If what I am saying sounds mysterious, that is a fact about your unwillingness to be charitable as much as it is about my unwillingness to be precise. (And if you disagree with that, see it as an example.) That we are both apparently unwilling doesn’t mean that either of us is stupid. It just means that we are not each other’s intended audience.
(Downvoted.)
No one said you were stupid.
People are responding to the text of your comments as written. If you write something that seems to ignore a standard argument, then it’s not surprising that people will point out the standard argument.
As a parable, imagine an engineer proposing a design for a perpetual motion device. An onlooker objects: “But what about conservation of energy?” The engineer says: “Do you seriously think I spent four years at University without understanding such basic arguments?” An uncharitable onlooker might say “Yes.” A better answer, I think, is: “Your personal credentials are not at issue, but the objection to your design remains.”
(Upvoted.)
I suppose I mostly meant ‘irrational’, not stupid. I just expected people to expect me to understand basic SIAI arguments like “value is fragile” and “there’s no moral equivalent of a ghost in the machine” et cetera. If I didn’t understand these arguments after having spent so much time looking at them… I may not be stupid, but there’d definitely be some kind of gross cognitive impairment going on in software if not in hardware.
There were a few cues where I acknowledged that I agreed with the standard argument (AGI won’t automatically converge to Eliezer’s “good”), but was interested in a different argument about philosophically sound AIs that didn’t necessarily even look at humanity as a source of value but still managed to converge to Eliezer’s good, because extrapolated volitions for all evolved agents cohere. (I realize that your intuition here is, interestingly, perhaps somewhat the opposite of mine, in that you fear more than I do that there won’t be much coherence even among human values. I think that we might just be looking at different stages of extrapolation… if human near-mode provincial hyperbolic discounting algorithms make deals with human far-mode universal exponential discounting algorithms, the universal (pro-coherence) algorithms will win out in the end (by taking advantage of near mode’s hyperbolic discounting). If this idea is too vague, or if you’re interested, I could expand on it elsewhere.)
Your parable makes sense, it’s just that I don’t think I was proposing a perpetual motion device, just something that could sound like a perpetual motion device if I’m not clear enough in my exposition, which it looks like I wasn’t. I was just afraid of italicizing and bolding the disclaimers because I thought it’d appear obnoxious, but it’s probably less obnoxious than failing to emphasize really important parts of what I’m saying.
What does time discounting have to do with coherence? Of course exponential discounting is “universal” in the sense that if you’re going to time-discount at all (and I don’t think we should), you need to use an exponential in order to avoid preference reversals. But this doesn’t tell us anything about what exponential discounters are optimizing for.
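To make the preference-reversal point concrete, here is a minimal Python sketch. The reward sizes (10 units at t=10 vs. 15 units at t=12), the exponential factor 0.9, and the hyperbolic constant k=1 are arbitrary illustrative choices, not anything from the thread. The same choice is evaluated well in advance and at the last moment: the exponential discounter gives the same answer both times, while the hyperbolic one flips.

```python
# Toy comparison of exponential vs. hyperbolic discounting (illustrative numbers).


def exponential(delay: float, delta: float = 0.9) -> float:
    # delta ** t: the ratio between the weights of two delays depends only on
    # their difference, so the ranking of two dated rewards never flips.
    return delta ** delay


def hyperbolic(delay: float, k: float = 1.0) -> float:
    # 1 / (1 + k*t): steep for near delays, flat for distant ones.
    return 1.0 / (1.0 + k * delay)


def prefers_larger_later(discount, now: float) -> bool:
    # Hypothetical choice: 10 units delivered at t=10 vs. 15 units at t=12,
    # evaluated from the vantage point of time `now`.
    sooner = 10 * discount(10 - now)
    later = 15 * discount(12 - now)
    return later > sooner


for name, fn in (("exponential", exponential), ("hyperbolic", hyperbolic)):
    # Ask the same question far in advance (now=0) and at the last moment (now=10).
    print(name, prefers_larger_later(fn, 0), prefers_larger_later(fn, 10))
# exponential True True
# hyperbolic True False
```

Note that the rewards are just numbers supplied from outside; the discount function says nothing about which outcomes the agent values, which is the point being made about exponential discounters.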
I think your comments would be better received if you just directly talked about your ideas and reasoning, rather than first mentioning your shocking conclusions (“theism might be correct,” “volitions of evolved agents cohere”) while disclaiming that it’s not how it looks. If you make a good argument that just so happens to result in a shocking conclusion, then great, but make sure the focus is on the reasons rather than the conclusion.
vs.
It really really seems like these two statements contradict each other; I think this is the source of the confusion. Can you go into more detail about the second statement?
In particular, why would two agents which both evolved but under two different fitness functions be expected to have the same volition?
“Basic SIAI arguments like ‘value is fragile’”…? You mean this...?
The post starts out with:
...it says it isn’t basic—and it also seems pretty bizarre.
For instance, what about the martians? I think they would find worth in a martian future.
Yeah, and paperclippers would find worth in a future full of paperclips, and pebblesorters would find worth in a future full of prime-numbered heaps of pebbles. Fuck ’em.
If the martians are persons and they are doing anything interesting with their civilization, or even if they’re just not harming us, then we’ll keep them around. “Human values” doesn’t mean “valuing only humans”. Humans are capable of valuing all sorts of non-human things.
Suppose someone who reliably does not generate common obviously wrong ideas/arguments has an uncommon idea that is wrong in a way that is non-obvious, but that you could explain if the wrong idea itself were precisely explained to you. But this person does not precisely explain their idea, and instead vaguely points to it with a description that sounds very much like a common obviously wrong idea. So you try to apply charity and fill in the gaps to figure out what they are really saying, but even if you do find the idea that they had in mind, you wouldn’t identify it as such, because you see how that idea is wrong, and, being charitable, you can’t interpret what they said in that way. How could you figure out what this person is talking about?
You’d have to be speaking their language in the first place. That’s why I wrote about intended audiences. But it seems that at my current level of vagueness my intended audience doesn’t exist. I’ll either have to get more precise or stop posting stuff that appears to be nonsense.
Sometimes it is impossible to reach an intended audience when the unintended audience is using you as a punching bag to impress their own intended audience. Most debate in conventional practice is, after all, about trying to spin what the other person says to make them look bad. If your ‘intended audience’ then chose to engage with you at the level you were hoping for, they would risk being collateral damage in the social bombardment.
For my part I reached the conclusion that you are probably using a different conception of ‘morality’, analogous to the slightly different conception of ‘theism’ from your recent thread. This is dangerous because in-group signalling incentives are such that people will be predictably inclined to ignore the novelty in your thoughts and target the nearest known stupid thing to what you say. And you must admit: you made it easy for them this time!
It may be worth reconsidering the point you are trying to discuss a little more carefully, and perhaps avoiding the use of the term ‘morality’. You could then make a post on the subject such that some people can understand your intended meaning and have useful conversation without risking losing face. It will not work with everyone, there are certain people you will just have to ignore. But you should get some useful discussion out of it. I note, for example, that while your ‘theism’ discussion got early downvotes by the most (to put it politely) passionate voters it ended up creeping up to positive.
As for guessing what sane things you may be trying to talk about, I basically reached the conclusion “Either what you are getting at boils down to the outcome of acausal trade or it is stupid”. And acausal trade is something that I cannot claim to be certain about.
That sounds bad—perhaps reconsider.
As a “peace offering”, I’ll describe a somewhat similar argument, although it stands as an open confusion, not so much a hypothesis.
Consider a human “prototype agent”, additional data that you plug into a proto-FAI (already a human-specific thingie, of course) to refer to the precise human decision problem. Where does this human end; where are its boundaries? Why would its body be the cutoff point, why not include all of its causal past, all the way back to the Big Bang? At that point, talking about the human in particular seems to become useless; after all, it’s a tiny portion of all that data. But clearly we need to somehow point to the human decision problem, to distinguish it from the frog decision problem and the like, even though such boundless prototype agents share all of their data. Do you point to a human finger and specify this actuator as the locus through which the universe is to be interpreted, as opposed to pointing to a frog’s leg? Possibly, but it’ll take a better understanding of interpreting decision problems from arbitrary agents’ definitions to make progress on questions like this.
With each comment like this you make, and lack of comments that show clear understanding, I think that more and more confidently, yes. Disclaimers don’t help in such cases. You don’t have to be stupid, you clearly aren’t, but you seem to be using your intelligence to confuse yourself by lumping everything together instead of carefully examining distinct issues. Even if you actually understand something, adding a lot of noise over this understanding makes the overall model much less accurate.
One thing they rather obviously might converge on is the “goal system zero” / “Universal Instrumental Values” thing. The other main candidates seem to be “fitness” and “pleasure”. These might well preserve humans for a while—in historical exhibits.
Nor is there an a priori reason for an AI to exist, for it to understand what ‘paperclips’ are, let alone for it to self-improve through learning like a human child does, absorb human languages, and upgrade itself to the extent necessary to take over the world.
I suspect that any team of scientists or engineers with the knowledge and capability required to build an AGI with at least human-infant level cognitive capacity and the ability to learn human language will understand that making the AI’s goal system dynamic is not only advantageous, but is necessitated in practice by the cognitive capabilities required for understanding human language.
The idea of a paperclip maximizer taking over the world is a mostly harmless absurdity, but it also detracts from serious discussion.
But that sounds like we’re programming the AI in English. I can’t see an AI with a motivational system well-defined enough to work at all getting confused in that way; would “Do what my creator intended me to do, if I can’t figure out what else to do” even show up as a motivational drive if it is not explicitly coded in?
He’s speaking to you in English, because you speak English. Also, the website wouldn’t let him post the bytecode.
You made a sweeping statement about all possible AI architectures. What are your reasons for it?
There’s reason to suspect that any human-level AI must be programmed in human languages.
In fact, that’s almost tautological by virtue of the Turing Test.
Another way to look at it: we developed simplified formal computer languages to program the tiny simple circuits we could build at the time, but the goal for AGI has always been to develop a system you could directly program in the full complexity of human languages.
Think about how the software industry works: high-level business goals in English, translated into more technical English for system engineers and designers, translated down into much simpler but more verbose programming languages such as C++, then machine-translated into the even simpler machine code the CPU can actually execute.
Programming an AI in C++? Doesn’t compute.
Of the concepts named for Alan Turing it is “Turing Completeness” that is far more interesting and important than the chatbot test. If you think on the concept of a Turing complete computation system you will perhaps realise why the rest of us would consider your claim extremely silly. Well, one of the reasons anyway.
That last statement you quoted was silly. Not funny at all, apparently.
What?
Do you mean humanlike AIs? An AI capable of passing the Turing Test would of course need to understand human language well enough to act convincingly human (or at least do a really good imitation), but that’s not necessarily a human-level AI (convincing people that you’re human is a separate task from actually being human, probably a much easier one), and human-level AIs in general needn’t necessarily understand human language any better than any other sort of language by default.
Anyway, an AI being “programmed in human languages” seems to be going by the “programming = instructions being given to a human servant” metaphor, and if you want that to work, you clearly first need to write the servant in something other than human language. And copying human psychology well enough that the AI actually understands human language as well as a human does, rather than being able to imitate understanding well enough to carry on a text-based conversation, is no easy task, and is probably a lot harder than manually coding a simple goal system like paperclip maximization in a lower-level language. But that could still be an AGI.
Human level AI—an AGI design capable of matching the full intellectual capabilities of the best human scientists/engineers.
To get to H level in a practical timeframe, a human AI will have to learn human knowledge, it will have to experience an equivalent to a standard 20-25 year education.
Learning human knowledge in practice requires learning human language as an early initial precursor step.
The software of a human mind—the memeset or belief network, is essentially a complex human language program.
For an AI to achieve human-level, it will have to actually understand human language as well as a human does, and this requires a bunch of algorithmic complexity from the human brain at the hardware level and it implies the capability to parse and run human language programs.
So you only need to program the infant brain in a programming language—the rest can be programmed in human language.
If it doesn’t have the capacity to understand human level language then it’s not an AGI—as that is the defining characteristic of the concept (by my/Turing’s definition).
And thus by extension, the defining characteristic of a human-mind is human language capability.
EDIT: Why are you downvoting? Don’t agree and don’t want to comment?
Turing never intended his test to be adopted as “the defining characteristic of the concept [of AGI]” in anything like this fashion. Human ‘level’ language is also somewhat misleading inasmuch as it implies reaching a level of communication power rather than adapting specifically to the kind of communication humans happen to have evolved, especially its quirks and weaknesses.
I disagree somewhat. It’s difficult to know exactly what “he intended”, but the opening of the paper that introduces the concept starts with “Can machines think?” and describes a reasonable language-based test: an intelligent machine is one that can convince us of its intelligence in plain human language.
I meant natural language, the understanding of which certainly does require a certain minimum level of cognitive capabilities.
We have a much greater understanding of what the “think” in “Can machines think?” means now. We have better tests than seeing if they can fake human language.
The test isn’t about faking human language, it’s about using language to probe another mind. Whales and elephants have brains built out of similar quantities of the same cortical circuits but without a common language stepping into their minds is very difficult.
What’s a better test for AI than the turing test?
Give it a series of fairly difficult and broad ranging tasks, none of which it has been created with existing specialised knowledge to handle.
Yes—the AIQ idea.
But how do you describe the task, and how does the AI learn about it? There’s a massive gulf between AIs which can have the task/game described in human language and those that cannot. Whales and elephants fall in the latter category. An AI which can realistically self-improve to human levels needs to be in the former category, like a human child.
You could define intelligence with an AIQ concept so abstract that it captures only learning from scratch without absorbing human knowledge, but that would be a different concept—it wouldn’t represent practical capacity to intellectually self-improve in our world.
Use something like Prolog to declare the environment and problem. If I knew how the AI would learn about it, I could build an AI already. And indeed, there are fields of machine learning for things such as Bayesian inference.
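As a rough illustration of “declare the environment and problem, then let a general solver handle it”, here is a Python sketch of classical STRIPS-style planning rather than Prolog. The `Action` type, the `plan` function, and the key/door domain are all hypothetical. The domain is pure data, and the same breadth-first solver works unchanged for any domain written in this form; it does not learn anything, which is roughly the gap the rest of this exchange argues about.

```python
from collections import deque
from typing import FrozenSet, Iterable, List, NamedTuple, Optional


class Action(NamedTuple):
    name: str
    pre: FrozenSet[str]     # facts that must hold before the action
    add: FrozenSet[str]     # facts the action makes true
    delete: FrozenSet[str]  # facts the action makes false


def plan(start: Iterable[str], goal: Iterable[str],
         actions: List[Action]) -> Optional[List[str]]:
    """Breadth-first search over fact sets; returns a shortest plan or None."""
    start_state, goal_set = frozenset(start), frozenset(goal)
    frontier = deque([(start_state, [])])
    seen = {start_state}
    while frontier:
        state, path = frontier.popleft()
        if goal_set <= state:
            return path
        for act in actions:
            if act.pre <= state:
                nxt = (state - act.delete) | act.add
                if nxt not in seen:
                    seen.add(nxt)
                    frontier.append((nxt, path + [act.name]))
    return None


# Hypothetical toy domain, declared as data rather than coded as a procedure.
ACTIONS = [
    Action("pick_up_key", frozenset({"at_hall", "key_in_hall"}),
           frozenset({"has_key"}), frozenset({"key_in_hall"})),
    Action("unlock_door", frozenset({"at_hall", "has_key"}),
           frozenset({"door_open"}), frozenset()),
    Action("enter_room", frozenset({"at_hall", "door_open"}),
           frozenset({"in_room"}), frozenset({"at_hall"})),
]

print(plan({"at_hall", "key_in_hall"}, {"in_room"}, ACTIONS))
# ['pick_up_key', 'unlock_door', 'enter_room']
```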
If you have to describe every potential problem to the AI in Prolog, how will it learn to become a computer scientist or quantum physicist?
Describe the problem of learning how to become a computer scientist or quantum physicist, then let it solve that problem. Now it can learn to become a computer scientist or quantum physicist.
(That said, a better method would be to describe computer science and quantum physics and just let it solve those fields.)
Or a much better method: describe the problem of an AI that can learn natural language, the rest follows.
Except for all problems which are underspecified in natural language.
Which might be some pretty important ones.
Agreement that human children are more intelligent than whales or elephants is likely to be the closest we get to agreement on this subject. You would need to absorb a lot of new knowledge from all the replies from various sources that have already been provided to you here before progress is possible.
Unfortunately it seems we are not even fully in agreement about that. A Turing-style test is a test of knowledge; the AIQ-style test is a test of abstract intelligence.
An AIQ-type test which just measures abstract intelligence fails to differentiate between a feral Einstein and an educated Einstein.
Effective intelligence, perhaps call it wisdom, is some product of intelligence and knowledge. The difference between human minds and those of elephants or whales is that of knowledge.
My core point, to reiterate again: the defining characteristic of human minds is knowledge, not raw intelligence.
Intelligence can produce knowledge from the environment. Feral Einstein would develop knowledge of the world, to the extent that he wasn’t limited by non-knowledge/intelligence factors (like finding shelter or feeding himself).
Possibly relevant: AIXI-style IQ tests.
Very probably not. I’m claiming that the desire to code it in would be convergent, ’cuz it’s the best way to do AI even if you think you’re just trying to maximize paperclips. Of course, most AGI researchers aren’t that clever, so again, we still need to raise awareness about AGI dangers. I’m just floating a contrarian hypothesis that seems somewhat neglected.
But it’s a lot harder to code that than to code “maximize paperclips”.
Then you should have said that!
That sounds exactly like CEV.
I think a closer match is the “shaper-anchor semantics” from Eliezer’s “Creating Friendly AI”.
I think I see what you’re saying; just as we reflect on our desires and try to understand how they tick and where, biologically and historically and culturally, they come from, so also might any AI.
However, the thing about it is: that doesn’t actually change those values. For example (and despite the dire warnings of some creationists), even though we now understand that our value system is a consequence of an evolutionary algorithm, we haven’t actually started valuing evolutionary goals over our own built-in goals. For instance, contraception is popular even though it’s quite silly from the perspective of gene propagation.
Similarly, a paperclip-maximizer might well be interested in figuring out why its utility function is what it is, so that it may better understand the world it lives in… but that’s not going to change its overriding and primary interest in making paperclips.
It seems as though that sometimes triggers intimate pair-bonding activities while reducing your exposure to STDs. Use of condoms is often not remotely silly from that perspective—IMHO.
The example still works since there are quite a few couples who use condoms because they just don’t want to have kids. They don’t have any worry about STDs from their partner. If you insist on a clear cut case look at men who get vasectomies.
The idea that use of contraception is “silly” from the perspective of gene propagation seems just wrong to me. There are plenty of cases where it would make sense for those who want to spread their genes around to agree to use contraceptives. Contraceptive use makes sense sometimes, and not others.
It could be claimed that the average effect of contraception on genes is negative—but that seems to be a whole different thesis.
Tim, do you agree that there exist couples who plan to never have children and use contraception to that end?
Sure. Surely we are not disagreeing here. The original comment was:
My position is just that contraception has a perfectly reasonable place for gene propagators. The idea that contraception is always opposed to your genetic interests is wrong. Lack of contraception can easily result in things like this, which really doesn’t help. That using contraception is “silly” from a genetic perspective is a popular myth.
I’m not sure if we are. The fact that contraception might have a reasonable place for gene propagators is not the issue. The point is that much, and possibly the vast majority, of contraceptive use is contrary to the goals of gene propagation.
Not really. Remember, evolution doesn’t care about your happiness. Indeed, regarding the example you linked to, from an evolutionary perspective, a one-night stand with all the protection is utterly useless. It is very likely to that male’s evolutionary advantage not to use condoms.
And even if you don’t agree with the condom example, the other example, of people undergoing a generally irreversible or difficult-to-reverse operation that renders them close to sterile, is pretty clearly against the interest of gene propagation.
Humans evolved in a context where we didn’t have easy contraception, and the best humans could do to prevent conception was things like coitus interruptus. It shouldn’t surprise you that evolution has not made human instincts catch up with modern technologies.
One might think that from an evolutionary perspective it makes sense to substantially delay or reduce offspring number so as to invest maximum resources in a small number of offspring. But humans in the developed world now reside in a situation with low disease rates and lots of resources, so that strategy is sub-optimal from an evolutionary perspective. Look at how Charedi (ultra-Orthodox) Jews and the Amish are two of the fastest-growing populations in the United States.
I can see what you think the issue is. What I don’t see is where in the context you are getting that impression from.
Your example is stacked to favour your conclusion. What you need to try and do in order to understand my position is to think about an example that favours my conclusion.
So: get rid of the one-night stand, and imagine that the girl is desirable—that having safe sex with her looks like the best way to initiate a pair-bonding process leading to the two of you having some babies together—and that the alternative is rejection, and her walking off and telling her friends what a jerk you are when it comes to protecting your girl.
In the modern context, if you impregnate someone without planning it out properly, there’s a non-negligible chance they’ll get an abortion, which is even worse for gene propagation. Furthermore, parents are to some extent legally responsible for their children’s actions, so having too many poorly-regulated kids running around means exposing yourself to liability. A big part of the optimal strategy for present-day long-term reproductive success is to get rich, and a big part of getting rich is not having more kids than you can keep track of.
I think that’s a retcon. People use contraception so they can have more sex than they would if they had to worry about having kids every time. They may or may not rationalise further, I suspect that generally they don’t.
In terms of genetic success, having more kids than you can keep track of is pretty much the ideal, as long as all or at least most survive to reproductive adulthood.
But some people consciously choose never to have any kids. That’s silly from the perspective of gene propagation if anything is.
Sure it does. A devout priest spends half his life celibate and serving God. One day he has a crisis, reads a bunch of stuff on the internet and suddenly realizes he doesn’t believe in God. His values change.
Even this is questionable. I suspect any concept of universal morality must be evolutionary. This certainly is a widespread concept in systems/transhumanist/singularitarian/cosmist thought. We do value evolution in and of itself.
It’s probably possible in principle to build such an AI; it would probably need some sort of immutable hard-coded paperclip recognition module with which it could evaluate potential simulated futures generated by the more complex general intelligence system.
If such a thing developed to a human level or beyond and could reflect on its cognition, it might explain in lucid detail how futures filled with paperclips were good and others were evil.
It could even understand that its concepts of morality and good/bad were radically different from those of humans, and it would even understand that this difference relates to its hard-coded paperclip recognizer, and it would explain in detail how this architecture was superior to human value systems... because it helped to maximize expected future paperclips.
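A minimal sketch of that architecture, under heavy simplifying assumptions: a toy world model (`simulate`) stands in for the general intelligence system, and a fixed scoring function (`paperclip_recognizer`) rates each simulated future. All names and the wire/clips world are hypothetical. “Immutable” here just means the planning loop has no way to change the evaluator.

```python
from itertools import product


def simulate(state, action):
    # Hypothetical stand-in for the "general intelligence" world model.
    # state = (wire, clips)
    wire, clips = state
    if action == "buy_wire":
        return (wire + 2, clips)
    if action == "bend_wire" and wire > 0:
        return (wire - 1, clips + 1)
    return (wire, clips)  # "idle", or bending with no wire left


def paperclip_recognizer(state):
    # The fixed evaluator: a future is scored only by how many clips it contains.
    _, clips = state
    return clips


def best_plan(state, actions, horizon, utility):
    # Exhaustively simulate every action sequence and keep the best-scoring one.
    # Nothing in this loop can modify `utility`; that is the "hard-coded" part.
    best = None
    for seq in product(actions, repeat=horizon):
        future = state
        for action in seq:
            future = simulate(future, action)
        score = utility(future)
        if best is None or score > best[0]:
            best = (score, seq)
    return best


ACTIONS = ("buy_wire", "bend_wire", "idle")
print(best_plan((0, 0), ACTIONS, 3, paperclip_recognizer))
# (2, ('buy_wire', 'bend_wire', 'bend_wire'))
```

Passing a different `utility` (for example `lambda s: s[0]`, which values wire instead) makes the same planner pick a different plan, which is roughly the contrast between a fixed recognizer and a swappable or evolving “goodness recognizer”.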
It could even write books such as “Paperclip Morality: the Truth”.
But just because such a thing is possible in principle doesn’t make it the slightest bit likely.
If you can build an AGI that can understand human language, it would be much easier and considerably more effective to make the AGI’s goal system dynamically modifiable on reflection through human language.
Instead of having a special hard-coded circuit to evaluate the utility of potential futures, you could just have the general conceptual circuitry handle this. The concept of ‘good’ would still be somewhat special in its role in the goal system itself, but the ‘goodness recognizer’ could change and evolve over time.
Well, the counter-argument to that particular example would be that the priest’s belief in God wasn’t a terminal value; rather, their goals of being happy and helping other people and understanding the universe were. Believing in and obeying God were just instrumental values.
However, I agree that there’s nothing in particular forcing people, weird and funky and clunky as our minds are, to always have the same fixed terminal values either. To pick an extreme example, people’s brains can sometimes be messed up severely by hormonal imbalances, which can in turn cause people to do such drastically anti-own-terminal-value things as committing suicide.
I should’ve been more specific and just said that, in general, understanding evolutionary psychology never or only very rarely causes people’s terminal values to change.
Human morality is a product of evolution; however, our morality is not itself an evolutionary algorithm execution mechanism. It’s kind of a vague approximation of one (in that all the moralities that sucked for fitness were selected against), but it still often leads to drastically different results than a straight-up evolutionary fitness maximization algorithm with access to our brains’ resources would.
For example: I intend never to have biological children and consider this decision to be a moral one. However, from an evolutionary perspective, deliberately preventing my own genes from propagating is just plain silly.
Yes, the paper-clip maximizer is just a whimsical example. However, similarly Really Unfriendly optimizers are quite plausible. Imagine the horrors that could result from a naive human-happiness-maximizer hitting the singularity asymptote.
Yes, that would be important, but it still wouldn’t be enough to solve the problem; in fact, the really hard part of the problem still remains! The happiness-maximizer might base its understanding of happiness on descriptive human usage of the word, and end up with a truly thorough and consistent understanding of the word… and then still turn everybody into nearly mindless wireheads.
Our morality engines and our language aren’t properly tuned for dealing with the kind of reality-bending power a superintelligent entity would have.
You may not have liked that particular example, but I think you are in agreement that terminal values change.
Just to make sure though, a few more examples:
someone who likes chocolate ice cream and then some years later prefers vanilla instead
someone who likes impressionist art but then years later prefers post-modern
someone who likes cats more than dogs
someone who likes chinese culture more than ethiopian
I don’t find such maximizers significantly plausible at even the human level intelligence. Possible in principle? Sure. But if you look at realistic, plausible routes to AGI it becomes clear that an AGI necessarily will be programmed in human languages and will pick up human cultural programs.
And finally, even if it was plausible that a flawed design could hit the singularity asymptote, that itself might only be a big problem if it had a short planning horizon.
It seems that all superintelligences with infinite planning horizons become behaviorally indistinguishable. All long-term value systems converge on a single universal attractor—they become cosmists.
That is what I mean when I said “any concept of universal morality must be evolutionary”.
That would again be assuming humans capable of building a superhuman AGI but asinine enough to attempt to somehow hardcode its goal system, instead of making it open-ended and dynamic like a human’s.
How would you build a happiness maximizer and fix the value of happiness? The meaning of a word in a human brain is stored as a huge set of associative weights that anchor it in a massive distributed belief network. The exact meaning of each word changes over time as the network learns and reconfigures itself; no concept is quite static. So for an AGI to understand the word in the same way we do, the word’s meaning is always subject to some drift. And this is a good thing.
I think we may be experiencing some terminology confusion here. Just to be clear, you realize that these are all not terminal values, right?
Here’s the big issue: if it’s open-ended, how do we keep it from drifting off somewhere terrible? The system that guides that seems to be the largest potential risk point of the approach you describe.
I’m very confused by this; can you go into more detail about why you think this is so? In particular, why would it be true for all long-term value systems (including flawed and simplistic value systems), and not just a very small subset?
No. What is a terminal value? That which stimulates the planning reward circuit in the human nucleus accumbens? I’m not sure I buy into the concept.
The point of value or preferences from the perspective of intelligence is to rate potential futures.
We are open-ended! Our future-preferences depend on and are intertwined with our knowledge. So any superintelligence or evolutionary accelerator we create will also need to be open-ended, or it wouldn’t be protecting our dynamic core.
I discussed some of this in my first, somewhat hasty, LW post here. A few others here have mentioned a similar idea, I may write more about it as I find it interesting.
Basically, if your planning horizon extends to infinity you will devote all of your resources towards expanding your net intelligence for the long term future, regardless of what your long term goals are.
So no matter whether your long term goal is to maximize paper-clips, human happiness or something more abstract, in each case this leads to an identical outcome for the foreseeable future: a local computational singularity with an exponentially expanding simulated metaverse.
There is some speculation within physics that black-hole-like singularities can create new physical universes through inflation. If this is true, then the long-term goals of a superintelligence are best served by literally creating new physical multiverses that have more of the desirable space-time properties.
No, what I’m referring to is also known as an intrinsic value. It’s a value that is valuable in and of itself, not in justification for some other value. A non-terminal value is commonly referred to as an instrumental value.
For example, I value riding roller-coasters, and I also value playing Dance Dance Revolution. However, those values are expressible in terms of another, deeper value, the value I place on having fun. That value may in turn be thought of as an instrumental value of a yet deeper value: the value I place on being happy moment-to-moment.
If you were going to implement your own preference function as a Turing machine, trying to keep the code as short as possible, the terminal values would be the things that machine would value.
Okay, I see where you’re coming from. However, from a human perspective, that’s still a pretty large potential target range, and a large proportion of it is undesirable.
From the deeper perspective of computational neuroscience, the intrinsic/instrumental values reduce to cached predictions of your proposed ‘terminal value’ (being happy moment-to-moment), which reduces to various types of stimulations of the planning reward circuitry.
Labeling the experience of chocolate ice cream as an ‘instrumental value’ and the resulting moment-to-moment happiness as the real ‘terminal value’ is a useless distinction: it collapses your terminal values down to the single value of ‘happiness’ and relabels everything worthy of discussion as ‘instrumental’.
The quality of being happy moment-to-moment is anything but a single value and should not by any means be reduced to a single concept. It is a vast space of possible mental stimuli, each of which creates a unique conscious experience.
The set of mental states encompassed by “being happy moment-to-moment” is vast: the gustatory pleasure of eating chocolate ice cream, the feeling of smooth silk sheets, the release of orgasm, the satisfaction of winning a game of chess, the accomplishment of completing a project, the visual experience of watching a film, the euphoria of eureka; all of these describe entire complex spaces of possible mental states.
Furthermore, the set of possible mental states is forever dynamic, incomplete, and undefined. The set of possible worlds that could lead to different visual experiences, as just a starter example, is infinite, and each new experience or piece of knowledge itself changes the circuitry underlying the experiences and thus changes our values.
The simplest complete Turing machine implementation of your preference function is an emulation of your mind. It is you, and it has no perfect simpler equivalent (although many imperfect simulations are possible).
The core of the cosmist idea is that for any possible goal evaluator with an infinite planning horizon, there is a single convergent optimal path towards that goal system. So no, the potential target range in theory is not large at all—it is singularly narrow.
As an example, consider a model universe consisting of a modified game of chess or go. The winner of the game is then free to arrange the pieces on the board in any particular fashion (including the previously dead pieces). The AI’s entire goal is to make some particular board arrangement, perhaps a smiley face. For any such possible goal system, all AIs play the game exactly the same at the limits of intelligence: they just play optimally. Their behaviour doesn’t differ in the slightest until the game is done and they have won.
Whether the sequence of winning moves such a god would make on our board is undesirable or not from our current perspective is a much more important, and complex, question.
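The board-game argument above can be sketched with a much smaller game than chess or go. Assumptions, all hypothetical: a subtraction game stands in for chess (take 1 to 3 pieces per turn; whoever takes the last piece wins), the opponent plays perfectly, and any goal arrangement is fully achievable once the game is won, so a goal only changes the positive number `goal_value` that a win is worth. Under those assumptions the chosen move never depends on the goal.

```python
from functools import lru_cache

MAX_TAKE = 3  # hypothetical rule: remove 1..3 pieces per turn; taking the last piece wins


@lru_cache(maxsize=None)
def mover_wins(pile):
    """True if the player to move can force a win from this pile size."""
    # With an empty pile there are no moves: the previous player took the last piece.
    return any(not mover_wins(pile - k) for k in range(1, min(MAX_TAKE, pile) + 1))


def choose_move(pile, goal_value):
    """Pick how many pieces to take.

    `goal_value` (assumed > 0) is what the agent gets from arranging the board
    after a win: a smiley face, a paperclip shape, anything. Losing is worth 0.
    """
    def payoff(k):
        # After taking k, the opponent is to move; if they cannot force a win, we can.
        return goal_value if not mover_wins(pile - k) else 0.0

    return max(range(1, min(MAX_TAKE, pile) + 1), key=payoff)


for goal_value in (1.0, 1000.0):  # two very different "goals"
    print([choose_move(n, goal_value) for n in range(1, 13)])
```

Both lines print the same move sequence: multiplying every payoff by a positive constant cannot change the argmax. The sketch says nothing about the harder question raised above, whether that shared path is desirable from our perspective.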
Right, but as far as I can tell without having put lots of hours into trying to solve the problem of clippyAI, it’s really damn hard to precisely specify a paperclip. (There are things that are easier to specify that this argument doesn’t apply to and that are more plausibly dangerous, like hyperintelligent theorem provers...) Thus in trying to figure out what its utility function actually is (like what humans are doing as they introspect more) it could discover that the only reason its goal is (something mysterious like) ‘maximize paperclips’ is because ‘maximize paperclips’ was how humans were (probabilistically inaccurately) expressing their preferences in some limited domain. This is related to the theme Eliezer quite elegantly goes on about in Creating Friendly AI and that he for some reason barely mentioned in CEV, which is that the AI should look at its own source code as evidence of what its creators were trying to get at, and update its imperfect source code accordingly. Admittedly, most uFAIs probably won’t be that sophisticated, and so worrying about AI-related existential risks is still definitely a big deal. We just might want to be a little more cognizant of potential motivations for people who disagree with what has recently been dubbed SIAI’s ‘scary idea’.
Hm. I suppose that’s possible, though it would require that the AI be given a utility function that’s specifically meant to be amenable to that kind of revision.
Under the most straightforward (i.e. not CEV-style) utility function design, fuzziness in its definition of “paperclip” would just drive the paperclip-maximizer to choose the possible definition that yields the highest utility score.
To pick a different silly example, a dog-maximizer with a utility function based on the number of dogs in the universe would simply prefer to tile the solar system with tiny Chihuahuas rather than Great Danes; the whole range of “dog” definitions fit the function, so it just chooses the one that is most convenient for maximum utility. It wouldn’t try to resolve it by trying to decide which definition is more in line with the designer’s ideals, unless “consider the designer’s ideals” were designed into the system from the start.
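The dog-maximizer’s reasoning fits in a few lines. This is a minimal sketch with hypothetical numbers: a fixed mass budget and rough adult body masses per breed, with utility simply the count of dogs that fit. Nothing in the function mentions the designer, so the only tie-breaker among valid readings of “dog” is the score itself.

```python
# Hypothetical, illustrative numbers: a fixed mass budget and rough adult masses.
RESOURCE_KG = 1_000_000.0
DOG_DEFINITIONS = {"Chihuahua": 2.0, "Beagle": 10.0, "Great Dane": 60.0}


def dog_count(definition):
    """Utility = number of dogs that fit in the budget under one reading of 'dog'."""
    return RESOURCE_KG / DOG_DEFINITIONS[definition]


def pick_definition():
    # Every entry satisfies "is a dog", so the utility function cannot tell them
    # apart except by score; the maximizer just takes the best-scoring reading.
    return max(DOG_DEFINITIONS, key=dog_count)


print(pick_definition(), dog_count(pick_definition()))
```

Getting a Great Dane out of this design would require adding a term for the designer’s intent to the utility function itself, which is exactly the “designed in from the start” condition in the comment.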
Is designing “consider the designer’s ideals” in an AI difficult?
Currently expected to be difficult, since we don’t know of an easy way to do so. That it’ll turn out to be easy (in the hindsight) is not totally out of the question.
Has anyone considered approaching this problem in the same way we might approach “read the user’s handwriting”? That is, the task is not one we program the AI to accomplish—instead, we train the AI to accomplish it. And, most importantly, we train the AI to ask for further clarification in ambiguous cases.
Mirrors and Paintings (yes, you want to point your program at the world and have it figure out what you referred to), The Hidden Complexity of Wishes (if you need to answer AI’s question or give it instructions, you’re doing something wrong and it won’t work).
I have to admit, as someone who has worked in software testing, I find it difficult to take the suggestion (non-destructive full-brain scan) in the first link very seriously. How, exactly, do I become convinced that the AI can come to know more about what I want by scanning me than I can know by introspection? How can I (or it) even do a comparison between the two without it asking me questions?
But then we get down to doing the comparison. The AI informs me that what I really want is to kill my father and sleep with my mother. I deny this. Do we take this as evidence that the AI really does know me better than I know myself, or as a symptom of a bug?
I would argue that if you don’t need to answer the AI’s questions or give it instructions, you’re doing something wrong and it won’t work. By definition. At least for the first ten thousand scans or so. And even then there will remain questions on which the AI and introspection would deliver different answers. Questions with hidden complexity. I just don’t see how anyone would trust a CEV extrapolated from brain scans until we had decades of experience suggesting that scanning and modeling yields better results than introspection.
Agreed. And any useful AI will have to understand human language to do or learn much of anything of value.
The detailed analysis of full brain scanning tech I’ve seen puts it far into the future, well beyond human-level AGI.
You have to make sure AI predictably gives a better answer even on questions where you disagree. And there will be questions which can’t even be asked of a human.
Irrelevant. Assume you magically have a perfect working simulation of yourself.
Why would I want to do that? I.e. how would making that assumption lead me to take Eliezer’s suggestion more seriously? My usual practice is to take things less seriously when magic is involved.
And how does this assumption interact with your other comment stating that I have to make sure the AI is somehow even better than myself if there is any difference between simulation and reality? Haven’t you just asked me to assume that there are no differences?
Sorry, I simply don’t understand your responses, which suggests to me that you did not understand my comment. Did you notice, in my preamble, that I mentioned software testing? Perhaps my point may be clearer to you if you keep this preamble in mind when formulating your responses.
Because that’s a conceptually straightforward assumption that we can safely make in a philosophical argument.
The upload is not the AI (and Eliezer’s post doesn’t refer to uploads IIRC, but for the sake of the argument assume they are available as raw material). You make AI correct on strong theoretical grounds, and only test things to check that theoretical assumptions hold in ways where you expect it to be possible to check things, not in every situation.
What would I need to make of that?
But this is not a philosophical argument.
To recap:
I suggested that an AI which is a precursor to the FAI should come to understand human values by interacting (over an extended ‘training’ period) with actual humans—asking them questions about their values and perhaps performing some experiments as in a psych or game theory laboratory.
You responded by linking to this, which as I read it suggests that the most accurate and efficient way to extract the values of a human test subject would be by carrying out a non-destructive brain scan. Quoting the posting:
I asked how we could possibly come to know by testing that the scanning and brain modeling was working properly. I could have asked instead how we could test the hypothesis that the inference from behavior was working properly.
These are questions about engineering and neuroscience, not questions of philosophy. The question of what is right/wrong is a philosophical question. The question of what do humans believe about right and wrong is a psychology question. The question of how those beliefs are represented in the brain is a neuroscience question. The question of how an AI can come to learn these things is GOFAI. The question of how we will know we have done it right is a QC question: a software-testing question. That was the subject of my comment. It had nothing at all to do with philosophy.
Ok, in this context, I interpret this to mean that we will not program in the neuroscience information that it will use to interpret the brain scans. Instead we will simply program the AI to be a good scientist. A provably good scientist. Provable because it is a simple program and we understand epistemology well enough to write a correct behavioral specification of a scientist and then verify that the program meets the specification. So we can let the AI design the brain scanner and perform the human behavioral experiments to calibrate its brain models. We only need to spot-check the science it generates, because we already know that it is a good scientist.
Hmmm. That is actually a pretty good argument, if that is what you are suggesting. I’ll have to give that one some thought.
Sorry, not my area at the moment. I gave the links to refer to arguments for why having AI learn in the traditional sense is a bad idea, not for instructions on how to do it correctly in a currently feasible way. Nobody knows that, so you can’t expect an answer, but the plan of telling the AI things we think we want it to learn is fundamentally broken. If nothing better can be done, too bad for humanity.
This is much closer, although a “scientist” is probably a bad word to describe that, and given that I don’t have any idea what kind of system can play this role, it’s pointless to speculate. Just take as the problem statement what you quoted from the post:
Relevant. Can we just assume you magically have a friendly AI then?
If the plan for creating a friendly AI depends on a non-destructive full-brain scan already being available, the odds of achieving friendly AI before other forms of AI vanish to near zero.
One step at a time, my good sir! Reducing the philosophical and mathematical problem of Friendly AI to the technological problem of uploading would be an astonishing breakthrough quite by itself.
I think this reflects the practical problem with Friendly AI: it is an ideal of perfection taken to an extreme, one that expands the problem scope far beyond what is likely to be realizable in the near term.
I expect that most of the world, research teams, companies, the VC community and so on will be largely happy with an AGI that just implements an improved version of the human mind.
For example, humans have an ability to model other agents and their goals, and through love/empathy value the well-being of others as part of our own individual internal goal systems.
I don’t see yet why that particular system is difficult or more complex than the rest of AGI.
It seems likely that once we can build an AGI as good as the brain, we can build one that is human-like but has only the love/empathy circuitry in its goal system, with the rest of the crud stripped out.
In other words if we can build AGI’s modeled after the best components of the best examples of altruistic humans, this should be quite sufficient.
This is the straightforward approach.
Once you have an AGI that has the cognitive capability and learning capacity of a human infant brain, you teach it everything else in human language—right/wrong, ethics/morality, etc.
Programming languages are precise and well suited for creating the architecture itself, but human languages are naturally more effective for conveying human knowledge.
I tend to agree that we need a natural language interface to the AI. But it is far easier to create automatic proofs of program correctness when the really important stuff (like ethics) is presented in a formal language equipped with a deductive system.
There is something to be said for treating all the natural language input as if it were testimony from unreliable witnesses—suitable, perhaps, for locating hypotheses, but not really suitable as strong evidence for accepting the hypotheses.
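One way to make the “unreliable witness” framing concrete is an ordinary Bayes update on a single report. This is an illustrative formalization, not something the comment specifies, and the probabilities are hypothetical. `reliability` is P(witness asserts H | H) and `false_alarm` is P(witness asserts H | not H); their ratio is the strength of the evidence.

```python
def posterior(prior, reliability, false_alarm):
    """P(H | witness asserts H), by Bayes' rule.

    reliability = P(assert H | H); false_alarm = P(assert H | not H).
    """
    evidence = reliability * prior + false_alarm * (1 - prior)
    return reliability * prior / evidence


prior = 0.1  # hypothetical prior on some hypothesis H
print(posterior(prior, reliability=0.9, false_alarm=0.1))  # reliable source
print(posterior(prior, reliability=0.6, false_alarm=0.5))  # unreliable source
```

With these numbers a reliable source (likelihood ratio 9) moves a 10% prior to 50%, while an unreliable one (likelihood ratio 1.2) moves it only to about 12%. The weak report still flags H as worth looking at, which matches “suitable for locating hypotheses,” but it is nowhere near enough to accept H.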
I’m not sure how this applies—can you formally prove the correctness of a probabilistic belief network? Is that even a valid concept?
I can understand how you can prove a formal deterministic circuit or the algorithms underlying the belief network and learning systems, but the data values?
Agree. That is why I suggest that the really important stuff (meta-ethics, epistemology, etc.) be represented in some other way than by ‘neural’ networks. Something formal and symbolic, rather than quasi-analog. All the stuff which we (and the AI) need to be absolutely certain doesn’t change meaning when the AI “rewrites its own code”.
By formal, I assume you mean math/code.
The really important stuff isn’t a special category of knowledge. It is all connected—a tangled web of interconnected complex symbolic concepts for which human language is a natural representation.
What is the precise mathematical definition of ethics? If you really think of what it would entail to describe that precisely, you would need to describe humans, civilization, goals, brains, and a huge set of other concepts.
In essence you would need to describe an approximation of our world. You would need to describe a belief/neural/statistical inference network that represented that world internally as a complex association between other concepts that eventually grounds out into world sensory predictions.
So this problem (that human language concepts are far too complex and unwieldy for formal verification) is not a problem with human language itself that can be fixed by choosing a different language. It reflects the inherent, massive complexity of the world itself, complexity that human language and brain-like systems evolved to handle.
These folks seem to agree with you about the massive complexity of the world, but seem to disagree with you that natural language is adequate for reliable machine-based reasoning about that world.
As for the rest of it, we seem to be coming from two different eras of AI research as well as different application areas. My AI training took place back around 1980 and my research involved automated proofs of program correctness. I was already out of the field and working on totally different stuff when neural nets became ‘hot’. I know next to nothing about modern machine learning.
I’ve read about CYC a while back—from what I recall/gather it is a massive handbuilt database of little natural language ‘facts’.
Some of the new stuff they are working on with search looks kinda interesting, but in general I don’t see this as a viable approach to AGI. A big syntactic database isn’t really knowledge—it needs to be grounded to a massive sub-symbolic learning system to get the semantics part.
On the other hand, specialized languages for AGIs? Sure. But they will need to learn human languages first to be of practical value.
Blind men looking at elephants.
You look at CYC and see a massive hand-built database of facts.
I look and see a smaller (but still large) hand-built ontology of concepts.
You, probably because you have worked in computer vision or pattern recognition, notice that the database needs to be grounded in some kind of perception machinery to get semantics.
I, probably because I have worked in logic and theorem proving, wonder what axioms and rules of inference exist to efficiently provide inference and planning based upon this ontology.
One of my favorite analogies, and I’m fond of the Jain multi-viewpoint approach.
As for the logic/inference angle, I suspect that this type of database underestimates the complexity of actual neural concepts—as most of the associations are subconscious and deeply embedded in the network.
We use ‘connotation’ to describe part of this embedding concept, but I see it as even deeper than that. A full description of even a simple concept may be on the order of billions of such associations. If this is true, then a CYC-like approach is far from appropriately scalable.
It appears that you doubt that an AI whose ontology is simpler and cleaner than that of a human can possibly be intellectually more powerful than a human.
All else being equal, I would doubt that with respect to a simpler ontology, while the ‘cleaner’ adjective is less well defined.
Look at it in terms of the number of possible circuit/program configurations that are “intellectually more powerful than a human” as a function of the circuit/program’s total bit size.
At around the human level of roughly 10^15 bits, I’m almost positive there are intellectually more powerful designs, so P_SH(10^15) = 1.0.
I’m also positive that below some size threshold there are absolutely zero possible configurations of superhuman intellect, say P_SH(10^10) ~ 0.0.
Of course “intellectually more powerful” is open to interpretation. I’m thinking of it here in terms of the range of general intelligence tasks human brains are specially optimized for.
IBM’s Watson is superhuman in a certain novel narrow range of abilities, and its complexity is around 10^12 to 10^13.
To get to that point we have to start from the right meaning to begin with, and care about preserving it accurately, and Jacob doesn’t agree those steps are important or particularly hard.
Not quite.
As for the start with the right meaning part, I think it is extremely hard to ‘solve’ morality in the way typically meant here with CEV or what not.
I don’t think that we need (or will) wait to solve that problem before we build AGI, any more or less than we need to solve it for having children and creating a new generation of humans.
If we can build AGI somewhat better than us according to our current moral criteria, they can build an even better successive generation, and so on—a benevolence explosion.
As for the second part about preserving it accurately, I think that ethics/morality is complex enough that it can only be succinctly expressed in symbolic associative human languages. An AGI could learn how to model (and value) the preferences of others in much the same way humans do.
Someone help me out. What is the right post to link to that goes into the details of why I want to scream “No! No! No! We’re all going to die!” in response to this?
Coming of Age sequence examined realization of this error from Eliezer’s standpoint, and has further links.
In which post? I’m not finding discussion about the supposed danger of improved humanish AGI.
That Tiny Note of Discord, say. (Not on “humanish” AGI, but eventually exploding AGI.)
I don’t see much of a relation at all to what I’ve been discussing in that first post.
[http://lesswrong.com/lw/lq/fake_utility_functions/] is a little closer, but still doesn’t deal with human-ish AGI.
Why would an AI which optimises for one thing create another AI that optimises for something else? Not every change is an improvement, but every improvement is necessarily a change. Building an AI with a different utility function is not going to satisfy the first AI’s utility function! So whatever AI the first one builds is necessarily going to either have the same utility function (in which case the first AI is working correctly), or have a different one (which is a sign of malfunction, and given the complexity of morality, probably a fatal one).
It’s not possible to create an AGI that is “somewhat better than us” in the sense that it has a better utility function. To the extent that we have a utility function at all, it would refer to the abstract computation called “morality”, which “better” is defined by. The most moral AI we could create is therefore one with precisely that utility function. The problem is that we don’t exactly know what our utility function is (hence CEV).
There is a sense in which a Friendly AGI could be said to be “better than us”, in that a well-designed one would not suffer from akrasia and whatever other biases prevent us from actually realizing our utility function.
AIs without utility functions, but with some other motivational structure, will tend to self-improve into utility-function AIs. Utility-function AIs seem more stable under self-improvement, but there are many reasons such an AI might want to change its utility function (e.g. speed of access, multi-agent situations).
Could you clarify what you mean by an “other motivational structure?” Something with preference non-transitivity?
For instance. http://selfawaresystems.files.wordpress.com/2008/01/ai_drives_final.pdf
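Why preference non-transitivity is unstable can be shown with the standard “money pump” argument, sketched here under hypothetical assumptions: an agent that strictly prefers B to A, C to B, and A to C, and a trader who keeps offering the item the agent prefers for a small fee. The agent accepts every trade, ends up holding what it started with, and is strictly poorer, which gives it a reason to repair its preferences.

```python
# Hypothetical cyclic preferences, as (preferred, over) pairs: B > A, C > B, A > C.
PREFERS = {("B", "A"), ("C", "B"), ("A", "C")}
# The trader always offers the item the agent prefers over its current holding.
OFFERS = {"A": "B", "B": "C", "C": "A"}


def run_money_pump(start, money, fee, rounds):
    """Let the agent accept every offered trade it prefers, while it can pay the fee."""
    holding = start
    for _ in range(rounds):
        offer = OFFERS[holding]
        if (offer, holding) in PREFERS and money >= fee:
            holding, money = offer, money - fee
    return holding, money


print(run_money_pump("A", money=10.0, fee=1.0, rounds=3))
print(run_money_pump("A", money=10.0, fee=1.0, rounds=30))
```

Three trades take the agent from A back to A at a cost of 3 units; given enough rounds, it pays away everything it has.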
It wouldn’t if it initially considered itself to be the only agent in the universe. But if it recognizes the existence of other agents and the impact of other agents’ decisions on its own utility, then there are many possibilities:
The new AI could be created as a joint venture of two existing agents.
The new AI could be built because the builder was compensated for doing so.
The new AI could be built because the builder was threatened into doing so.
This may seem intuitively obvious, but it is actually often false in a multi-agent environment.
Yes it certainly can, if that new AI helps its creator.
The same issue applies to children—they don’t necessarily have the same ‘utility function’, sometimes they even literally kill us, but usually they help us.
Sure it is; this part at least is easy. For example, an AGI that is fully altruistic and only experiences love as its single emotion would clearly be “somewhat better than us” from our perspective in every sense that matters.
If that AGI would not be somewhat better than us in the sense of having a better utility function, then ‘utility function’ is not a useful concept.
The real problem is the idea that morality can or should be simplified down to a ‘utility function’ simple enough for a human to code.
Before tackling that problem, it would probably be best to start with something much simpler, such as a utility function that could recognize dogs vs cats and other objects in images. If you actually research this, it quickly becomes clear that real-world intelligences make decisions using much more complexity than a simple utility-maximizing algorithm.
That would be not so much a benevolence explosion as a single AI creating “slave” AIs for its own purposes. If some of the child AI’s goals (for example those involved in being more good) are opposed to the parent’s goals (for example those which make the parent AI less good), the parent is not going to just let the child achieve its goals. Rational agents do not let their utility functions change.
If you mean that the AI doesn’t suffer from the akrasia and selfishness and emotional discounting and uncertainty about our own utility function which prevents us from acting out our moral beliefs then I agree with you. That’s the AI being more rational than us, and therefore better optimising for its utility function. But a literally better utility function is impossible, given that “better” is defined by our utility function.
Moreover, if our utility function describes what we truly want (which is the whole point of a utility function), it follows that we truly want an AI that optimizes for our utility function. If “better” were a different utility function then it would be unclear why we are trying to create an AI that does that, rather than what we want.
That’s why the plan is for the AI to figure it out by inspecting us. Morality is very much not simple to code.
So do we create children as our ‘slaves’ for our own purposes? You seem to be categorically ruling out the entire possibility of humans creating human-like AIs that have a parent-child relationship with their creators.
So just to make it precisely clear, I’m talking about that type of AI specifically. The importance and feasibility of that type of AGI vs other types is a separate discussion.
I don’t see it as having anything to do with rationality.
The altruistic human-ish AGI mentioned above would be better than current humans from our current perspective—more like what we wish ourselves to be, and more able to improve our world than current humans.
Yes.
This is obvious if its ‘utility function’ is just a projection of my own, i.e., it simulates what I would want and uses that as its utility function. But that isn’t even necessary: its utility function could be somewhat more complex than just a simulated projection of my own and still help fulfill my utility function.
If by inspection you just mean teach the AI morality in human language, then I agree, but that’s a side point.
So: I want to finish my novel, but I spend the day noodling around the Internet instead.
Then Omega hands me an AI which it assures me is programmed error-free to analyze me and calculate my utility function and optimize my environment in terms of it.
I run the AI, and it determines exactly which parts of my mind manifest a desire to finish the novel, which parts manifest a desire to respond to the Internet, and which parts manifest a desire to have the novel be finished. Call them M1, M2 and M3. (They are of course overlapping sets.) Then it determines somehow which of these things are part of my utility function, and which aren’t, and to what degree.
So...
Case 1: The AI concludes that M1 is part of my utility function and M2 and M3 are not. Since it is designed to maximize my utility, it constructs an environment in which M1 triumphs. For example, perhaps it installs a highly sophisticated filter that blocks out 90% of the Internet. Result: I get lots more high-quality work done on the novel. I miss the Internet, but the AI doesn’t care, because that’s the result of M2 and M2 isn’t part of my utility function.
Case 2: The AI concludes that M3 and M2 are part of my utility function and M1 is not, so it finishes the novel itself and modifies the Internet to be even more compelling. I miss having the novel to work on, but again the AI doesn’t care.
Case 3: The AI concludes that all three things are part of my utility function. It finishes the novel but doesn’t tell me about it, thereby satisfying M3 (though I don’t know it). It makes a few minor tweaks to my perceived environment, but mostly leaves them alone, since it is already pretty well balanced between M1 and M2 (which is not surprising, since I was responding to those mental structures when I constructed my current situation).
If I’m understanding you correctly, you’re saying that I can’t really know which of these results (or of countless other possibilities) will happen, but that whichever one it is, I should have high confidence that all other possibilities would by my own standards have been worse… after all, that’s what it means to maximize my utility function.
Yes?
It seems to follow that if the AI has an added feature whereby I can ask it to describe what it’s about to do before it does it and then veto doing it, I ought not invoke that feature. (After all, I can’t make the result better, but I might make the result worse.)
Yes?
Assuming you trust Omega to mean the same thing as you do when talking about your preferences and utility function, then yes. If the AI looks over your mind and optimizes the environment for your actual utility function (which could well be a combination of M1, M2 and M3), then any veto you do must make the result worse than the optimal one.
Of course, if there’s doubt about the programming of the AI, use of the veto feature would probably be wise, just in case it’s not a good genie.
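The veto argument can be checked on a toy version of the novel-versus-Internet example. Everything numeric here is hypothetical: three candidate environments, invented scores for how well each satisfies M1, M2 and M3, and invented weights standing in for “your actual utility function.” The key assumption is the one stated in the answer: the AI maximizes exactly that function.

```python
# Hypothetical scores for how well each environment satisfies (M1, M2, M3).
ENVIRONMENTS = {
    "filter_internet": (0.9, 0.2, 0.6),
    "ghostwrite_novel": (0.1, 0.8, 1.0),
    "light_tweaks": (0.6, 0.6, 0.5),
}
WEIGHTS = (0.5, 0.2, 0.3)  # assumed weights of M1, M2, M3 in the true utility function


def utility(env):
    return sum(w * s for w, s in zip(WEIGHTS, ENVIRONMENTS[env]))


def ai_choice(options):
    """The AI picks whatever maximizes the (assumed correctly identified) utility."""
    return max(options, key=utility)


def best_after_veto(options, vetoed):
    """What the AI does instead if you veto one option."""
    return ai_choice([o for o in options if o != vetoed])


options = list(ENVIRONMENTS)
chosen = ai_choice(options)
for vetoed in options:
    print(vetoed, utility(best_after_veto(options, vetoed)) <= utility(chosen))
```

Every line prints `True`: removing options from an argmax can never raise the maximum. The caveat in the answer is the mirror image of this: if the AI’s `WEIGHTS` differ from yours, its argmax is not your argmax, and the veto regains its value.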
You seem to be imagining a relatively weak AI. For instance, given the vast space of possibilities, there are doubtless environmental tweaks that would result in more fun on the internet and more high-quality work on the novel. (This is to say nothing of more invasive interventions.)
The answer to your questions is yes: assuming the AI does what Omega says it does, you won’t want to use your veto.
Not necessarily weak overall, merely that it devotes relatively few resources to addressing this particular tiny subset of my preference-space. After all, there are many other things I care about more.
But, sure, a sufficiently powerful optimizer will come up with solutions so much better that it will never even occur to me to doubt that all other possibilities would be worse. And given a sufficiently powerful optimizer, I might as well invoke the preview feature if I feel like it, because I’ll find the resulting preview so emotionally compelling that I won’t want to use my veto.
That case obscures rather than illustrates the question I’m asking, so I didn’t highlight it.
Case 4: The AI makes tweaks to your current environment in order to construct it in accordance with your mental structures, but in a way more efficient than you could have in the first place.
Sure. In which case I still noodle around on the Internet a bunch rather than work on my novel, but at least I can reassure myself that this optimally reflects my real preferences, and any belief I might have that I would actually rather get more work done on my novel than I do is simply an illusion.
If those are, in fact, your real preferences, then sure.
I occasionally point out that you can model any computable behaviour using a utility-maximizing algorithm, provided you are allowed to use a partially-recursive utility function.
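A minimal sketch of the standard construction behind this point, assuming behaviour is modelled as a function from a history to an action (the `procrastinate` policy and its names are hypothetical): score the policy's own choice 1 and everything else 0, and a plain maximizer then reproduces the policy exactly.

```python
def utility_from_policy(policy):
    """Wrap any policy (history -> action) as a utility function over
    (history, action) pairs that scores the policy's own choice 1, all else 0."""
    def utility(history, action):
        # If `policy` fails to halt on some history, so does this call:
        # the resulting utility function is only partial recursive.
        return 1 if action == policy(history) else 0
    return utility


def maximize(utility, history, actions):
    """A plain utility maximizer: pick the action with the highest score."""
    return max(actions, key=lambda a: utility(history, a))


ACTIONS = ["write", "browse"]


def procrastinate(history):
    # Hypothetical behaviour: write on every third step, browse otherwise.
    return "browse" if len(history) % 3 else "write"


u = utility_from_policy(procrastinate)
histories = [tuple(range(n)) for n in range(10)]
print(all(maximize(u, h, ACTIONS) == procrastinate(h) for h in histories))  # True
```

The construction says nothing about whether such a utility function is a useful description; it only shows that "is a utility maximizer" rules out no computable behaviour.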
Please read the sequences, and stop talking about AI until you do.
I’ve read the sequences. Discuss or leave me alone.
Thanks, that’s useful to know.
Edit: Seriously, no irony, that’s useful. Disagreement should be treated differently depending on background.
Also, very little of the sequences has much of anything to do with AI. If I want to learn more about that I would look to Norvig’s book or more likely the relevant papers online. No need to be rude just because I don’t hold all the same beliefs you do.
It’s more of a problem with your understanding of ethics, as applied to AI (and since this is the main context in which AI is discussed here, I referred to that as simply AI). You might be very knowledgeable in contemporary machine learning or other AI ideas while not seeing, for example, the risks of building AGIs.
Unfortunately there is (in some senses of “rude”, such as discouraging certain conversational modes).
I see the potential risks in building AGIs.
I don’t see that risk being dramatically high for AGIs based loosely on improving the human brain, and this approach appears to be mainstream now, or to be becoming so (Kurzweil, Hawkins, DARPA’s neuromorphic initiative, etc.).
I’m interested in the serious discussion or analysis of why that risk could be high.
You have been discussing favourably the creation of AGIs that are programmed to create AGIs with different values to their own. No, you do not understand the potential risks.
We create children that can have different values than our own, and over time this leads to significant value drift. But perhaps it should be called ‘value evolution’.
This process is not magically guaranteed to preserve our best interests from our current perspective when carried over to AGI, but nor is it guaranteed to spontaneously destroy the world.
Your analogy with evolution is spot on: if the values are going to drift at all, we want to drift towards some target point, by selecting against sub-AIs that have values further from the point.
However, if we can do that, why not just put that target point right in the first AI’s utility function, and prevent any value drift at all? It seems like it ends up with the same result, but with slightly less complication.
And, if we can’t set a target point for the value drift evolution… then it might drift anywhere at all! The chances that it would drift somewhere we’d like are pretty small. This applies even if it were a human-brain-based AGI; in general people are quite apt to go corrupt when given only a tiny bit of extra power. A whole load of extra power, like superintelligence would grant, would have a good chance of screwing with that human’s values dramatically, possibly with disastrous effects.
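A toy simulation contrasting the two scenarios above, under stated assumptions that are purely illustrative: "values" are hypothetical 2-D vectors, each generation mutates them with Gaussian noise, and selection keeps whichever of the parent and its children lies closest to a target point. Without a target, a random child is kept, which is plain drift.

```python
import math
import random


def mutate(values, rng, step=0.1):
    """A child's values: the parent's values plus a small random perturbation."""
    return [x + rng.gauss(0.0, step) for x in values]


def evolve(start, generations, rng, target=None, brood_size=10):
    """Return the sequence of value vectors across generations.

    With a target, keep whichever of the parent and its children lies closest
    to it (selection); without one, keep a random child (pure drift)."""
    current = list(start)
    trajectory = [current]
    for _ in range(generations):
        brood = [mutate(current, rng) for _ in range(brood_size)]
        if target is None:
            current = rng.choice(brood)
        else:
            current = min(brood + [current], key=lambda v: math.dist(v, target))
        trajectory.append(current)
    return trajectory


TARGET = [5.0, 5.0]
selected = evolve([0.0, 0.0], 200, random.Random(0), target=TARGET)
drifted = evolve([0.0, 0.0], 200, random.Random(0))
print(math.dist(selected[-1], TARGET), math.dist(drifted[-1], TARGET))
```

Because the parent stays in the candidate pool, the selected lineage never moves away from the target; the drifting lineage has no such guarantee. Note that the selection branch needs `TARGET` as an input, which is the point that putting it directly in the first AI's utility function would be simpler.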
Yes.
The true final ‘target point’ is unknown, and unknowable in principle. We don’t have the intelligence/computational power right now to know it, no AGI we can build will know it exactly, and this will forever remain true.
Our values are so complex that the ‘utility function’ that describes them is our entire brain circuit—and as we evolve into more complex AGI designs our values will grow in complexity as well.
Fixing them completely would be equivalent to trying to stop evolution. It’s pointless, suicidal, impossible.
Yes, evolution could in principle take us anywhere, but we can and already do exert control over its direction.
Humans today have a range of values, but an overriding universal value is not-dying. To this end it is crucially important that we reverse engineer the human mind.
Ultimately if what we really value is conscious human minds, and computers will soon out-compete human brains, then clearly we need to transfer human minds over to computers.
One simple point is that there is no reason to expect AGIs to stop at exactly human level. Even if progress and increase in intelligence is very slow, eventually they become an existential risk, or at least a value risk. Every step in that direction we make now is a step in the wrong direction, which holds even if you believe it’s a small step.
This isn’t the first time I heard this, but I don’t think it’s exactly right.
We know that human level is possible; superhuman level seems overwhelmingly likely to be possible too, from considerations like imagining a human with more working memory who runs faster, but we don’t technically know that.
We have a working example of a human level intelligence.
It’s human-level intelligences doing the work. Martians’ work on AI might asymptotically slow down when approaching Martian-level intelligence without that level being inherently significant for anyone else, and the same goes for humans, or for any AGI of any level working on its own successor (not that I have any strong belief that this is the case; it’s just an argument for why human level wouldn’t be a completely arbitrary slowdown point).
I’d completely agree with “there is no strong reason to expect AGIs to stop at exactly human level”, “High confidence* in AGIs stopping at exactly human level is irrational” or “expecting AGIs not to stop at exactly human level would be prudent.”
*Personally I’d assign a probability of under 0.2 to the best AGIs being on a level roughly comparable to human level (let’s say being able to solve any problem except human relationship problems that every IQ 80+ human can solve, but not being better at every task than any human) for at least 50 years (physical time in Earth’s frame of reference, not subjective time; probably means inferior at an equal clock rate but making up for that with speed for most of that time). That’s a lot more than I would assign any other place on the intelligence scale, of course.
Could the downvoter please say what they are disagreeing with? I can see at least a dozen mutually contradictory possible angles so “someone thinks something about posting this is wrong” provides almost no useful information.
Thanks for the value risk link—that discussion is what I’m interested in.
I guess I’ll reply to it there. The initial quotes from Ben G. and Hanson are similar to my current view.
There is some discussion of the dangers of a uFAI Singularity, particularly in this debate between Robin Hanson and Eliezer. Much of the danger arises from the predicted short time period required to get from a mere human-level AI to a superhuman AI+. Eliezer discusses some reasons to expect it to happen quickly here and here. The concept of a ‘resource overhang’ is crucial in dismissing Robin’s skepticism (which is based on historical human experience in economic growth—particularly in the accumulation of capital).
For an analysis of the possibility of a hard takeoff in approaches to AI based loosely on modeling or emulating the human brain, see this posting by Carl Shulman, for example.
If civilisation(t+1) can access resources much better than civilisation(t), then that is just another way of saying things are going fast—one must beware of assuming what one is trying to demonstrate here.
The problem I see with this thinking is the idea that civilisation(t) is a bunch of humans while civilisation(t+1) is a superintelligent machine.
In practice, civilisation(t) is a man-machine symbiosis, while civilisation(t+1) is another man-machine symbiosis with a little bit less man, and a little bit more machine.
There are some promising lines of attack (grounded in decision theory) that might take only a few years of research. We’ll see where they lead. Other open problems in FAI might start looking very solvable if we start making progress on this front.
Show me.
PM’d.
Yes. :)
Yes, but it still has to be explicitly programmed to do that! The question is how to get it to do so. AFAIK shaper-anchor semantics is still quite a ways from being fully specified, but it seems the bigger obstacle is that an AI writer is less likely than not to take the effort to program it that way in the first place.
This is surely the kind of thing that superintelligences will be good at. They will have access to every paperclip picture on the net, every paperclip specification too. They will surely have a much clearer idea about what a paperclip is than humans do. They will know what boxes are too.
I made a stab at it here, and it got some upvotes. So here’s a repost:
Make a wire, 10 cm long and 1mm in diameter, composed of an alloy of 99.8% iron and 0.2% carbon. Start at one end and bend it such that the segments from 2-2.5cm, 2.75-3.25cm, 5.25-5.75cm form half-circles, with all the bends in the same direction and forming an inward spiral (the end with the first bend is outside the third bend).
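Just to check the arithmetic of that spec: a small sketch that takes the bend positions exactly as stated and computes the straight runs between them and the radius each half-circle implies. The constant and function names are hypothetical, and it ignores wire thickness.

```python
import math

WIRE_LENGTH_CM = 10.0
# (start, end) of each half-circle bend, measured from one end of the wire.
BENDS_CM = [(2.0, 2.5), (2.75, 3.25), (5.25, 5.75)]


def straight_segments(length, bends):
    """Lengths of the straight runs before, between, and after the bends."""
    segments, pos = [], 0.0
    for start, end in bends:
        segments.append(start - pos)
        pos = end
    segments.append(length - pos)
    return segments


def half_circle_radius(start, end):
    """A half-circle of arc length s has radius s / pi."""
    return (end - start) / math.pi


print(straight_segments(WIRE_LENGTH_CM, BENDS_CM))  # [2.0, 0.25, 2.0, 4.25]
print([round(half_circle_radius(s, e), 3) for s, e in BENDS_CM])  # [0.159, 0.159, 0.159]
```

Every 0.5 cm half-circle comes out with a radius of about 0.16 cm, and the straight runs plus the bends add back up to the full 10 cm.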
(Please let me know if reposting violates LW etiquette so I know not to do it again.)
I don’t think it violates LW etiquette.
Here’s a sort of fully general counterargument against proposals to naturalize human concepts in AI terms: if you can naturalize human concepts, you should be able to naturalize the human concept of a box. And if you can do that, we can build Oracle AI and save the world. It’s very easy to describe what we mean by ‘stay in the box’, but it turns out that seed (self-modifying!) AIs just don’t have a natural ontology for the descriptions.
This argument might be hella flawed; it seems kind of tenuous.
Aren’t you simply assuming that the world is doomed here? It sure looks like it!
Since when is that assumption part of a valid argument?
That assumption isn’t really a core part of the argument… the general “if specifying human concepts is easy, then come up with a plan for making a seed AI want to stay in a box” argument still stands, even if we don’t actually want to keep arbitrary seed AIs in boxes.
For the record I am significantly less certain than most LW or SIAI singularitarians that seed AIs not explicitly coded with human values in mind will end up creating a horrible future, or at least a more horrible future than something like CEV. I do think it’s worth a whole lot of continued investigation.
I suspect that you mean something like:
If there is an objective universal morality then agents converge on this universal at the limits of intelligence.
And thus perhaps paperclip maximizers have a tendency to become something else.