Evidence for the orthogonality thesis
One of the most annoying arguments when discussing AI is the perennial “But if the AI is so smart, why won’t it figure out the right thing to do anyway?” It’s often the ultimate curiosity stopper.
Nick Bostrom has defined the “Orthogonality thesis” as the principle that motivation and intelligence are essentially unrelated: superintelligences can have nearly any type of motivation (at least, nearly any utility-function-based motivation). We’re trying to get some rigorous papers out so that when that question comes up, we can point people to standard, and published, arguments. Nick has had a paper accepted that points out the orthogonality thesis is compatible with a lot of philosophical positions that would seem to contradict it.
I’m hoping to complement this with a paper laying out the positive arguments in favour of the thesis. So I’m asking you for your strongest arguments for (or against) the orthogonality thesis. Think of trying to convince a conservative philosopher who’s caught a bad case of moral realism—what would you say to them?
Many thanks! Karma and acknowledgements will shower on the best suggestions, and many puppies will be happy.
I’ve had several conversations that went like this:
Victim: But surely a smart artificial intelligence will be able to tell right from wrong, if we humans can do that?
Me: Forget about the word “intelligence” for a moment. Imagine a machine that looks at all actions in turn, and mechanically chooses the action that leads to producing the greatest number of paperclips, in whichever way possible. With enough computing power and enough knowledge about the outside world, the machine might find a way to convert the whole world into a paperclip factory. The machine will resist any attempts by humans to interfere, because the machine’s goal function doesn’t say anything about humans, only paperclips.
Victim: But such a machine would not be truly intelligent.
Me: Who cares about definitions of words? Humanity can someday find a way to build such a machine, and then we’re all screwed.
Victim: …okay, I see your point. Your machine is not intelligent, but it can be very dangerous because it’s super-efficient.
Me (under my breath): Yeah. That’s actually my definition of “superintelligent”, but you seem to have a concept of “intelligence” that’s entangled with many accidental facts about humans, so let’s not go there.
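The “machine” in this dialogue is easy to make concrete. Here is a toy sketch in Python; the action list, the world model, and the paperclip counts are all invented stand-ins, not a proposal for a real design:

```python
# Toy model of the "mechanical paperclip chooser" from the dialogue above.
# Everything here (the action set, the world model, the paperclip counts)
# is a made-up stand-in, not a real AI architecture.

def predicted_paperclips(world, action):
    """Hypothetical world-model: predicted paperclip count after an action."""
    return world.get(action, 0)

def choose_action(world, actions):
    # Mechanically evaluate every action and pick the one whose predicted
    # outcome contains the most paperclips. No notion of "right" or "wrong"
    # appears anywhere in this loop.
    return max(actions, key=lambda a: predicted_paperclips(world, a))

world = {"make_factory": 10**6, "help_humans": 10, "do_nothing": 0}
print(choose_action(world, list(world.keys())))  # -> make_factory
```

Nothing in the loop references humans or ethics; the only thing that influences the choice is the predicted paperclip count.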
Maybe we should stop calling it AI and start calling it an outcome pump.
Or a really powerful optimization process.
Assuming the conclusion.
That’s ‘trying to find a name to use that isn’t as loaded with muddled connotations as AI is’. Ciphergoth doesn’t actually conclude anything. He puts forward a concept that potential future conclusions could be made (or assumed) about.
The concept is itself a conclusion. Ciphergoth puts forth the concept without supporting arguments. Thus he assumes the conclusion. Now, maybe it’s useful to say, “hey, we’ve already derived a cool name for our conclusion: ‘really powerful optimization process’”, and that’s what ciphergoth is doing; but the conclusion is not convincingly argued for anywhere (the arguments are mostly assumptions of the conclusion), and so putting it forth without new arguments is assuming the conclusion.
Introducing the notion of the Moon being made of Gouda doesn’t assume any conclusion. You are being destructive again by not communicating clearly.
If the question is, “Is the Moon made of Gouda?”, and someone puts forth the argument that “The Moon is almost certainly made of Gouda”, how is that not assuming the conclusion? Proposing calling ‘A[G]I’ (notice the equivocation on AI and AGI) a “really powerful optimization process” is like saying “we shouldn’t call it ‘Moon’, ‘Moon’ is too vague; we should call it Giant Heavenly Gouda Ball”. How is that not assuming the conclusion? Especially when the arguments for naming it ‘Giant Heavenly Gouda Ball’ amount to “we all agree it’s Giant, Heavenly, and a Ball, and it’s intuitively obvious that even though the Earth isn’t made of Gouda and we’ve never actually been to the ‘Moon’, the ‘Moon’ is almost certainly made of Gouda”.
Repeatedly bringing up the conclusion in different words as if doing so constituted an argument is actively harmful. This stresses me out a lot, and that’s why I’m being destructive. Even if reasserting the conclusion as an argument is not what ciphergoth intended, you must know that’s how it will be taken by the majority of LessWrongers and, more importantly, third parties who will be introduced to AI risk by those LessWrongers, e.g. Stuart.
Nonetheless, I am receptive to your claim of destruction, and will try to adjust my actions accordingly.
I strongly agree on both counts.
I would agree that introducing a concept has connotations of considering the hypothesis that an instantiation of the concept exists or is possible, and without sufficient evidence to support the complexity of the concept, this is privileging the hypothesis, which is close enough to assuming the conclusion. However, it seems weird to make this criticism of “outcome pump” and “really powerful optimization process” and not make the same criticism of “artificial intelligence” when the former are attempts to avoid bad assumptions from connotations of the latter. When “intelligence” makes people think it must be human like, this makes “powerful artificial intelligence” a strictly more specific concept than “powerful optimization process”.
I was under the impression that “artificial intelligence” is meant to differentiate human and machine “intelligence” along technical lines, not moral ones, i.e., to emphasize that they solve problems in technically different ways. “Outcome pump” and “really powerful optimization process” are meant to differentiate human and non-human “intelligence” along moral lines; the justification for this distinction is much less clear-cut. I don’t criticize “artificial intelligence” much because it’s empirically and theoretically demonstrable that humans and machines solve problems in technically different ways, which distinction I thought was the purpose of the term to make.
Does “really powerful optimization process” differentiate at all? Humans are powerful optimization processes too.
True, it only differentiates by connotational reference to standard SingInst arguments. Outside of that context it might be a useful phrase. It’s just that the only reason people use the term is because they allege that “AI” has certain unfortunate connotations, but their arguments for why those connotations are unfortunate are hidden and inconclusive, and so suggesting “really powerful optimization process” instead of “AI” seems to an impartial observer like a sneaky and un-called-for attempt to shift the burden of proof and the frame of the debate. I’m too steeped in SingInst arguments and terminology to know if that’s how it will come across to outsiders; my fear that it’ll come across that way might be excessive.
I can’t speak for others here, but one reason I’ve taken to talking about optimizers rather than intelligences in many cases is because while I’m fairly confident that all intelligences are optimizers, I’m not sure that all optimizers are intelligences, and many of the things I intuitively want to say about “intelligences” it turns out, on consideration, I actually believe about optimizers. (In other cases I in fact turn out to believe them about intelligences, and I use that word in those cases.)
I have a pretty good idea what is meant by optimizer. In so far as “intelligence” doesn’t mean the same thing, I don’t know what it means.
Yup, I’m not sure I do either. It’s clear to me that “intelligence” has connotations that “optimizer” lacks, with the result that the former label in practice refers to a subset of the latter, but connotations are notoriously difficult to pin down precisely. One approximation is that “intelligent” is often strongly associated with human-like intelligence, so an optimizer that is significantly non-human-like in a way that’s relevant to the domain of discourse is less likely to be labelled “intelligent.”
It seems to me that “optimizer” carries connotations of a specific architecture in mind (an architecture that is also unworkable for making the AI care about doing things in the real world).
Interesting. What specific architectural connotations do you see?
Well, the way I see it: you take the possibility of an AI that, e.g., maximizes the performance of an airplane wing inside a fluid simulator (by conducting a zillion runs of the simulator), and then, after a bit of map-territory confusion and misunderstanding of how that optimizer works, you equate this with optimizing a real-world wing in the real world (without conducting a zillion trials in the real world, evolution-style). The latter has the issue of symbol grounding, and of building a model of the world, optimizing inside this model, and then building the result in the real world, et cetera.
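The wing-optimizer-in-a-simulator picture can be sketched in a few lines. The “fluid simulator” here is a made-up quadratic standing in for a real CFD run; the point is only that all the optimization happens inside the model, and nothing here grounds “wing” in the territory:

```python
# Toy version of the "wing optimizer inside a fluid simulator" scenario.
# simulate_drag is an invented stand-in for an actual fluid simulation.

def simulate_drag(angle):
    """Hypothetical simulator: drag as a function of wing angle."""
    return (angle - 3.0) ** 2 + 1.0

def optimize_wing(candidate_angles):
    # Run the simulator on every candidate and keep the best one.
    # The optimizer never touches, or even refers to, a real wing.
    return min(candidate_angles, key=simulate_drag)

best = optimize_wing([a / 10 for a in range(0, 100)])
print(best)  # -> 3.0
```

Going from this kind of in-model search to optimizing a physical wing is exactly where the model-building and symbol-grounding problems come in.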
Interesting. It would never have occurred to me to assume that “optimizer” connotes a trial-and-error brute-force-search architecture of this sort, but apparently it does for at least some listeners. Good to know. So on balance do you endorse “intelligence” instead, or do you prefer some other label for a process that modifies its environment to more effectively achieve a pre-determined result?
That is the issue, you assume the conclusion. Let’s just call it scary AI, and agree that scary AI is, by definition, scary.
Then let’s move on to actual implementations other than brute-force nonsense: the actual implementations that need to build a model of the world, and have to operate on the basis of this model rather than the world itself (excluding, e.g., evolution, which doesn’t need this), processes which may or may not be scary.
Certainly agreed that it’s more useful to implement the thing than to label it. If you have ideas for how to do that, by all means share them. I suggest you do so in a new post, rather than in the comments thread of an unrelated post.
To the extent that we’re still talking about labels, I prefer “optimizer” to “scary AI,” especially when used to describe a class that includes things that aren’t scary, things that aren’t artificial, and things that are at-least-not-unproblematically intelligent. Your mileage may vary.
The phrase “assuming the conclusion” is getting tossed around an awful lot lately. I’m at a loss for what conclusion I’m assuming in the phrase you quote, or what makes that “the issue.” And labelling the whole class of things-we’re-talking-about as “scary AIs” seems to be assuming quite a bit, so if you meant that as an alternative to assuming a conclusion, I’m really bewildered.
Agreed that the distinction you suggest between model-based whatever-they-ares and non-model-based whatever-they-ares is a useful distinction in a lot of discussions.
None of the existing things described as intelligent match your definition of intelligence, and of the hypotheticals, only the scary and friendly AIs do (I see the friendly AI as a subtype of scary AI).
Evolution: doesn’t really work toward doing anything pre-defined to the environment. Mankind: ditto; go ask the ancient Egyptians what exactly we are optimizing about the environment, or what pre-determined result we were working towards. Individual H. sapiens: some individuals might do something like that, but not very closely. Narrow AIs like circuit designers, airplane wing optimizers, and such: don’t work on the environment.
Only the scary AI fits your definition here. That’s part of why this FAI effort and the scary-AI scare is seen as complete nonsense. There isn’t a single example of general intelligence that works by your definition, natural or otherwise. Your definition of intelligence is narrowed down to the tiny but extremely scary area right near the FAI, and it excludes all the things anyone normally describes as intelligent.
I haven’t offered a definition of intelligence as far as I know, so I’m a little bewildered to suddenly be talking about what does or doesn’t match it.
I infer from the rest of your comment that what you’re taking to be my definition of intelligence is “a process that modifies its environment to more effectively achieve a pre-determined result”, which I neither intended nor endorse as a definition of intelligence.
That aside, though, I think I now understand the context of your initial response… thanks for the clarification. It almost completely fails to overlap with the context I intended in the comment you were responding to.
Well, the point is that if we stop using ‘intelligence’ to describe it and start using ‘really powerful optimization process’ or the like, we get things like:
“a process that modifies its environment to more effectively achieve a pre-determined result”
which is a very apt description of scary AI, but not of anything that is normally described as intelligent. This way the scary AI has more in common with gray goo than with anything that is normally described as intelligent.
I infer from this that your preferred label for the class of things we’re talking around is “intelligence”. Yes?
Edit: I subsequently infer from the downvote: no. Or perhaps irritation that I still want my original question answered. Edit: Repudiating previous edit.
I didn’t downvote this comment.
The preferred label for things like a seed AI, or a giant neural-network sim, or the like, should be “intelligence”, unless they are actually written as a “really powerful optimization process”, in which case it is useful to refer to what exactly they optimize (which is something within themselves, not outside). The scary-AI idea arises from a lack of understanding of what intelligences do, and from latching onto the first plausible definition: optimizing towards a goal defined from the start. It may be a good idea to refer to the scary AIs as really powerful optimization processes that optimize the real world towards some specific state, but don’t confuse these with intelligences in general, of which they are a tiny, and so far purely theoretical, subset.
OK, cool.
So, suppose I am looking at a system in the world… call it X. Perhaps X is a bunch of mold in my refrigerator. Perhaps it’s my neighbor’s kid. Perhaps it’s a pile of rocks in my backyard. Perhaps it’s a software program on my computer. Perhaps it’s something on an alien planet I’m visiting. Doesn’t matter.
Suppose I want to know whether X is intelligent.
What would you recommend I pay attention to in X’s observable behavior in order to make that decision? That is, what observable properties of X are evidence of intelligence?
Well, if you observe it optimizing something very powerfully, it might be intelligent, or it might be a thermostat with a high-powered heater and cooler and a PID controller. I define intelligence as the capability of solving problems, which is about choosing a course of action out of a giant space of possible courses of action based on some sort of criteria, where normally there is no obvious polynomial-time solution. One could call it a ‘powerful optimization process’, but that brings the connotation of the choice having some strong effect on the environment (which you yourself mentioned), while one could just as well posit an agent whose goal includes preservation of the status quo (i.e. the way things would have been without it) and minimization of its own impact, to the detriment of other goals that appeal more to us. That agent could still be very intelligent, even though its modification of the environment would be smaller than that of some dumber agent working under the exact same goals with the exact same weights (the smarter agent searches a larger space of possible solutions and finds solutions that satisfy both goals better; the two agents may run an identical algorithm at different CPU speeds, and the faster one may end up visibly ‘optimizing’ its environment less).
edit: I imagine there can be different definitions of intelligence. Eliezer grew up in a religious family, is himself an atheist, and seems to see the function of intelligence as primarily forming correct beliefs; something I don’t find so plausible, given that there are very intelligent people who believe in some really odd things, which are so defined as to have no impact on their lives. That’s similar to believing that all behaviours in MWI should equate to normality, while believing in MWI. There are also intelligent people whose strange beliefs do have an impact on their lives. The one thing common to people I’d call intelligent is that they are good at problem solving. The problems being solved vary, and some people’s intelligence is tasked with making an un-falsifiable theory of the dragon in their garage. Few people would think of this behaviour when they hear the phrase ‘optimization process’. But the majority of intelligent people’s intelligence is tasked with something very silly most of the time.
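For contrast, the thermostat case mentioned above is worth making concrete: a controller can drive the world toward a target quite effectively without searching any space of plans at all. A minimal proportional-control sketch (all constants invented for illustration):

```python
# A thermostat "optimizes" temperature powerfully without intelligence:
# there is no search over courses of action, just a fixed control law.
# Target, gains, and leakage rate are all invented toy numbers.

def thermostat_step(temp, target=20.0):
    """Proportional controller: heater power rises with the error."""
    error = target - temp
    return max(0.0, min(1.0, 0.5 * error))  # heater power clipped to [0, 1]

temp = 15.0
for _ in range(50):
    power = thermostat_step(temp)
    temp += 2.0 * power - 0.1 * (temp - 10.0)  # heating minus heat leakage
print(round(temp, 1))  # -> 19.1 (proportional control leaves a small offset)
```

By the problem-solving definition above, this is a powerful optimizer of room temperature but not an intelligence: the “choice” is a one-line formula, not a selection from a giant space of candidate solutions.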
OK.
So, echoing that back to you to make sure I understand so far: one important difference between “intelligence” and “optimization process” is that the latter (at least connotatively) implies affecting the environment whereas the former doesn’t. We should be more concerned with the internal operations of the system than with its effects on the environment, and therefore we should talk about “intelligence” rather than “optimization process.” Some people believe “intelligence” refers to the ability to form correct beliefs, but it properly refers to the ability to choose a specific course of action out of a larger space of possibilities based on how well it matches a criterion.
Is that about right, or have I misunderstood something key?
Well, ‘optimization process’ has the connotations of making something more optimal, the connotations of a certain productivity and purpose. ‘Optimal’ is a positive word. An intelligence, on the other hand, can have goals even less productive than tiling the universe with paperclips. The problem with the word ‘intelligence’ is that it may or may not have positive moral connotations. The internal operation shouldn’t really be very relevant in theory, but in practice, you can have a dumb brick that is just sitting there, and you can have a brick of computronium inside which an entire boxed society lives, which for some reason decided to go solipsist and deny the outside of the brick. Or you can have a brick that is sitting plotting to take over the world, but hasn’t made a single move yet (and is going to chill out for another million years, because it has patience and isn’t really in a hurry, its goal being bounded, and it’s, e.g., safely in orbit).
If you start talking about powerful optimization processes that do something in the real world, you leave out all the simple, probable, harmless goal systems that an AI can have (and still be immensely useful). External goals are enormously difficult to define for a system that builds its own model of the world.
Agreed that “optimization process” connotes purpose and making something more optimal in the context of that purpose.
Agreed that “optimal” has positive connotations.
Agreed that an intelligence can have goals that are unproductive, in the colloquial modern cultural sense of “unproductive”.
Agreed that “intelligence” may or may not have positive moral connotations.
Agreed that internal operations that don’t affect anything outside the black box are of at-best-problematic relevance to anything outside that box.
Completely at a loss for how any of that relates to any of what I said, or answers my question.
I think I’m going to tap out of the conversation here. Thanks for your time.
Well, I think you understood what I meant; it just felt as though you made a short summary partially out of context. People typically (i.e. virtually always) do that for the purpose of twisting other people’s words later on. Arguments over definitions are usually (virtually always) a debate technique designed to obscure the topic and substitute meanings so as to edge towards some predefined conclusion. In particular, most typically, one would want to substitute ‘powerful optimization process’ for ‘intelligence’ to create support for the notion of scary AI.
I do it, here and elsewhere, because most of your comments seem to me entirely orthogonal to the thing they ostensibly respond to, and the charitable interpretation of that is that I’m failing to understand your responses the way you meant them, and my response to that is typically to echo back those responses as I understood them and ask you to either endorse my echo or correct it.
Which, frequently, you respond to with a yet another comment that seems to me entirely orthogonal to my request.
But I can certainly appreciate why, if you’re assuming that I’m trying to twist your words and otherwise being malicious, you’d refuse to cooperate with me in this project.
That’s fine; you’re under no obligation to cooperate, and your assumption isn’t a senseless one.
Neither am I under any obligation to keep trying to communicate in the absence of cooperation, especially when I see no way to prove my good will, especially given that I’m now rather irritated at having been treated as malicious until proven otherwise.
So, as I said, I think the best thing to do is just end this exchange here.
Not really as malicious; it’s just an extremely common pattern of behaviour. People are goal-driven agents, and their reading is also goal-driven, picking the meanings of words so as to fit some specific goal, which is surprisingly seldom understanding. Especially on a charged issue like the risks of anything, where people typically choose their position via some mix of political orientation, cynicism, etc., and then defend this position like a lawyer defending a client. edit: I guess it echoes the assumption that an AI typically isn’t friendly if it has pre-determined goals that it optimizes towards. People typically do have pre-determined goals in discussion.
Sure. And sometimes those goals don’t involve understanding, and involve twisting other people’s words, obscuring the topic, and substituting meanings to edge the conversation towards a predefined conclusion, just as you suggest. In fact, that’s not uncommon. Agreed.
If you mean to suggest by that that I ought not be irritated by you attributing those properties to me, or that I ought not disengage from the conversation in consequence, well, perhaps you’re right. Nevertheless I am irritated, and am consequently disengaging.
Just by me, right? I deliberately used it like fifty times. (FWIW I’m not sure but I think Dmytry misunderstood you somewhere/somehow.)
I don’t know; I tend to antikibbitz unless I’m involved in the conversation. This most recent time was Dmytry, certainly. The others may have been you.
And I’m not really sure what’s going on between me and Dmytry, really, though we sure do seem to be talking at cross-purposes. Perhaps he misunderstood me, I don’t know.
That said, it’s a failure mode I’ve noticed I get into not-uncommonly. My usual reaction to a conversation getting confusing is to slow all the way down and take very small steps and seek confirmation for each step. Usually it works well, but sometimes interlocutors will neither confirm my step, nor refute it, but rather make some other statement that’s just as opaque to me as the statement I was trying to clarify, and pretty soon I start feeling like they’re having a completely different conversation to which I haven’t even been invited.
I don’t know a good conversational fix for this; past a certain point I tend to just give up and listen.
Assuming the conclusion.
Go define a paperclip maximizer, or a maximizer of anything real at all, for a machine that has infinite computing power (on which one can rather easily define a superhuman, fairly general AI). Your machine has senses, but isn’t handed a real-world paperclip counter.
You make one step in the right direction, that the intelligence does not necessarily share our motivation, and then make a dozen steps backwards when you anthropomorphize that it will actually care about something real just like we do—that the intelligence will necessarily be motivatable, for lack of a better word, just like humans are.
If you vaguely ask an AI to make vague paperclips, the AI has to understand human language, understand your intent, etc., to actually make paperclips rather than, say, put one paperclip in a mirror box and proclaim “infinitely many paperclips created” (or edit itself and replace some of its if-statements so that it is as if there were infinitely many paperclips, or any other perfectly legitimate solution). Then you need a very narrow range of bad understandings for the AI to understand that the statement means converting the universe into paperclips, but not understand that it is also implied that you only need as many paperclips as you want, that you don’t want quark-sized paperclips, et cetera.
“Motivability” seems to be a red herring. When we get the first AI capable of strongly affecting the real world, what makes you privilege the hypothesis that the AI’s actions and mistakes will be harmless to us?
If some misguided FAI fool manages to make an AI that has its goals somehow magically defined in the territory rather than in the map, in a non-wireheadable way, then yes, it may be extremely harmful.
Just about everyone else working on neat AIs (the practical ones) has the goals defined on internal representations, and as such, wireheading is a perfectly valid, perfect solution to those goals. The AI is generally prevented from wireheading itself via constraints, but insofar as the AI has a desire, it’s the desire to wirehead.
If the AI’s map represents the territory accurately enough, the AI can use the map to check the consequences of returning different actions, then pick one action and return it, ipso facto affecting the territory. I think I already know how to build a working paperclipper in a Game of Life universe, and it doesn’t seem to wirehead itself. Do you have a strong argument why all non-magical real-world AIs will wirehead themselves before they get a chance to hurt humans?
Eurisko is an important datum.
This isn’t quite an AGI. In particular, it doesn’t even take input from its surroundings.
Fair enough. We can handwave a little and say that AI2 built by AI1 might be able to sense things and self-modify, but this offloading of the whole problem to AI1 is not really satisfying. We’d like to understand exactly how AIs should sense and self-modify, and right now we don’t.
Let it build a machine that takes input from its own surroundings.
But the new machine can’t self-modify. My point is about the limitations of cousin_it’s example. The machine has a completely accurate model of the world as input and uses an extremely inefficient algorithm to find a way to paperclip the world.
The second machine can be designed to build a third machine, based on the second machine’s observations.
Yes, but now the argument that you will converge to a paperclipper is much weaker.
Perhaps it’s also worth bringing up the example of controllers, which don’t wirehead (or do they, once sufficiently complex?) and do optimize the real world. (Thermostats confuse me. Do they have intentionality despite lacking explicit representations? (FWIW Searle told me the answer was no because of something about consciousness, but I’m not sure how seriously he considered my question.))
You are looking for intentionality in the wrong place. Why do thermostats exist? Follow the improbability.
Yes, actual thermostats got their shard of the Void from humans, just as humans got their shard of the Void from evolution. (I’d say “God” and not “the Void”, but whatever.) But does evolution have intentionality? The point is to determine whether or not intentionality is fundamentally different from seemingly-simpler kinds of optimization—and if it’s not, then why does symbol grounding seem like such a difficult problem? …Or something, my brain is too stressed to actually think.
Taboo “intentionality”.
Yes, discerning the hidden properties of “intentionality” is the goal which motivates looking at the edge case of thermostats.
I don’t see why it doesn’t seem to wirehead itself, unless for some reason the Game of Life manipulators are too clumsy to send a glider to achieve the goal by altering the value within the paperclipper (e.g. within its map). Ultimately, the issue is that the goal is achieved when some cells within the paperclipper which define the goal acquire certain values. You need a rather specific action generator for it to avoid generating the action that changes the cells within the paperclipper. Can you explain why this solution would not be arrived at? And can your paperclipper self-improve if it can’t self-modify?
I do imagine that, very laboriously, you can manage to define some sort of paperclipping goal (maximize the number of live cells?) on an AI into which you have, by hand, hard-coded a complete understanding of the Game of Life, and you might be able to make it not recognize sending a glider into the goal system and changing it as ‘goal accomplished’. The issue is not whether it’s possible (I can make a battery of self-replicating glider guns and proclaim them to be an AI); the issue is whether it is at all likely to happen without an immense amount of work implementing, by hand, much of the stuff that the AI ought to learn, with no role for the AI’s intelligence as an intelligence amplifier, only as an obstacle that gets in your way.
Furthermore, keep in mind that the AI’s model of the Game of Life universe is incomplete. The map does not represent the territory accurately enough, and cannot, as the AI occupies only a small fraction of the universe and encodes the universe into itself very inefficiently.
The paperclipper’s goal is not to modify the map in a specific way, but to fill the return value register with a value that obeys specific constraints. (Or to zoom in even further, the paperclipper doesn’t even have a fundamental “goal”. The paperclipper just enumerates different values until it finds one that fits the constraints. When a value is found, it gets written to the register, and the program halts. That’s all the program does.) After that value ends up in the register, it causes ripples in the world, because the register is physically connected to actuators or something, which were also described in the paperclipper’s map. If the value indeed obeys the constraints, the ripples in the world will lead to creating many paperclips.
Not sure what sending gliders has to do with the topic. We’re talking about the paperclipper wireheading itself, not the game manipulators trying to wirehead the paperclipper.
Incompleteness of the model, self-modification and other issues seem to be red herrings. If we have a simple model where wireheading doesn’t happen, why should we believe that wireheading will necessarily happen in more complex models? I think a more formal argument is needed here.
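The simple model in question can be sketched like this: the program enumerates candidate actions, checks each against a constraint derived from its map, writes out the first one that passes, and halts. The “model” and the constraint threshold here are toy stand-ins for the theorem-proving the post actually describes:

```python
# Sketch of the one-action paperclipper: enumerate values until one
# satisfies the constraint, output it, halt. "Provably leads to many
# paperclips" is replaced by a direct check against a toy world-model.

def satisfies_constraint(action, model):
    """Stand-in for 'the map shows this action produces enough paperclips'."""
    return model(action) >= 100

def one_shot_paperclipper(candidate_actions, model):
    for action in candidate_actions:       # enumerate candidate values in turn
        if satisfies_constraint(action, model):
            return action                  # fill the return register and halt
    return None                            # no satisfying action found

toy_model = lambda a: a * 10               # invented: predicted clips per action
print(one_shot_paperclipper(range(1000), toy_model))  # -> 10
```

Because the program halts the moment the register is filled, there is no later step at which modifying the map could pay off, which is the sense in which this particular model doesn’t wirehead.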
You don’t have a simple model where wireheading doesn’t happen; you have a model where you didn’t see how wireheading would happen: the paperclipper, erhm, touching itself (i.e. its own map) with its manipulators, satisfying the condition without filling the universe with paperclips.
edit: that is to say, an agent which doesn’t internally screw up its model can still, e.g., dissolve the coating off a RAM chip and attach a wire there, or failing that, produce fake input for its own senses (which we do a whole lot).
Maybe you misunderstood the post. The paperclipper in the post first spends some time thinking without outputting any actions, then it outputs one single action and halts, after which any changes to the map are irrelevant.
We don’t have many models of AIs that output multiple successive actions, but one possible model is to have a one-action AI whose action is to construct a successor AI. In this case the first AI doesn’t wirehead because it’s one-action, and the second AI doesn’t wirehead because it was designed by the first AI to affect the world rather than wirehead.
What makes it choose the action that fills the universe with paperclips over the action that makes the goal be achieved by a modification to the map? edit: or do you have some really specialized narrow AI that knows nothing whatsoever of itself in the world, and simply solves the paperclip maximization in a sandbox inside itself (a sandbox where the goal doesn’t exist), with simple mechanisms then making this action happen in the world?
edit: to clarify. What you don’t understand is that wireheading is a valid solution to the goal. The agent is not wireheading because it makes it happy, it’s wireheading because wireheading really is the best solution to the goal you have given to it. You need to jump through hoops to make the wireheading not be a valid solution from the agent’s perspective. You not liking it as solution does not suffice. You thinking that it is fake solution does not suffice. The agent has to discard that solution.
edit: to clarify even further. When evaluating possible solutions, the agent comes up with an action that makes a boolean function within itself return true. That can happen if the function, abstractly defined, in fact returns true; that can happen if the action modifies the boolean function and changes it to return true; that can happen if the action modifies the inputs to this boolean function to make it return true.
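The three cases above can be illustrated with a toy sketch, under the (labeled) assumption that the agent's goal check is just an overwritable function and its "senses" are a single mutable reading. The class and attribute names are invented for illustration.

```python
# Toy illustration of the three ways the boolean goal-check can come
# to return true.

class ToyAgent:
    def __init__(self):
        self.sensor_reading = 0  # input to the goal check
        # goal: "at least 10 paperclips observed"
        self.goal_check = lambda agent: agent.sensor_reading >= 10

# Case 1: the abstractly defined function returns true because the
# world (here, the sensor) really changed.
a = ToyAgent()
a.sensor_reading = 12          # paperclips actually got made
honest = a.goal_check(a)       # True

# Case 2: the action rewrites the boolean function itself.
b = ToyAgent()
b.goal_check = lambda agent: True   # wireheading the check
rewired = b.goal_check(b)           # True, no paperclips exist

# Case 3: the action fakes the inputs to the function.
c = ToyAgent()
c.sensor_reading = 99          # sensors spoofed, no paperclips exist
spoofed = c.goal_check(c)      # True
```

All three paths make the same internal function return true, which is the point of the comment: from inside the agent, nothing in the check itself distinguishes the honest case from the other two.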
Yes. Though the sandbox is more like a quined formal description of the world with a copy of the AI in it. The AI can’t simulate the whole sandbox, but the AI can prove theorems about the sandbox, which is enough to pick a good action.
So, it proves a theorem that if it creates a glider in such and such spot, so and so directed, then [the goal definition as given inside the AI] becomes true. Then it creates that glider in the real world, the glider glides, and hits straight into the definition as given inside the AI making it true. Why is this invalid solution? I know it’s not what you want it to do—you want it to come up with some mega self replicating glider factory that will fill the universe with paperclips. But it ain’t obligated to do what you want.
The AI reasons with its map, the map of the world. The map depicts events that happen in the world outside of AI, and it also depicts the events that happen to the AI, or to AI’s map of the world. In AI’s map, an event in the world and AI map’s picture of that event are different elements, just as they are different elements of the world itself. The goal that guides AI’s choice of action can then distinguish between an event in the world and AI map’s representation of that event, because these two events are separately depicted in its map.
Can it however distinguish between two different events in the world that result in the same map state?
edit: here, example for you. For you, some person you care about has the same place in the map even though the atoms get replaced etc. If that person gets ill, you may want to mind-upload that person into an indistinguishable robot body, right? You’ll probably argue that it is a valid solution to escaping death. A lot of people have a different map, and they will argue that you’re just making a substitute for your own sake, as the person will be dead, gone forever. Some other people have a really bizarre map where they are mapping ‘souls’ and have the person alive in ‘heaven’, which is on the map. Bottom line is, everyone’s just trying to resolve the problem in the map. In the territory, everyone is gone every second.
edit: and yes, you can make a map which will distinguish between sending a glider that hits the computer, and making a ton of paperclips. You still have a zillion world states, including those not filled with paperclips, mapping to the same point in the map as the world filled with paperclips. Your best bet is just making the AI narrow enough that it can only find the solutions where the world is filled with paperclips.
I don’t know, the above reads to me as “Everything is confusing. Anyway, my bottom line is .” I don’t know how to parse this as an argument, how to use it to make any inferences about .
The purpose of the grandparent was to show that it’s not in principle problematic to distinguish between a goal state and that goal state’s image in the map, so there is no reason for wireheading to be consequentialistically appealing, so long as an agent is implemented carefully enough.
Because the AI’s goal doesn’t refer to a spot inside the computer running the AI. The AI just does formal math. You can think of the AI as a program that stops when it finds an integer N obeying a certain equation. Such a program won’t stop upon finding an integer N such that “returning N causes the creation of a glider that crashes into the computer and changes the representation of the equation so that N becomes a valid solution” or whatever. That N is not a valid solution to the original equation, so the program skips it and looks at the next one. Simple as that.
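The search loop described in this comment can be sketched directly. The point is that the equation is fixed at definition time, so a candidate N whose only merit is "returning it would rewrite the checker" simply fails the original check and gets skipped. The function names here are illustrative, not from the thread.

```python
# Minimal sketch of "a program that stops when it finds an integer N
# obeying a certain equation". The predicate is captured once; nothing
# inside the loop can alter it, so wireheaded "solutions" never pass.

def solve(equation_holds, limit=10**6):
    for n in range(limit):
        if equation_holds(n):   # checked against the ORIGINAL equation
            return n
    return None  # no solution below the limit

# Toy equation: n^2 - 5n + 6 == 0 (roots 2 and 3); the search
# returns the smallest root.
root = solve(lambda n: n * n - 5 * n + 6 == 0)
```

An N that "would become valid if a glider crashed into the computer" is, to this program, just another integer for which `equation_holds(n)` is false.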
First, you defined the equation so that it included the computer and itself (the simulator it uses to think, and also to self-improve as needed).
Now you are changing the definitions so that the equation is something else. There’s a good post by Eliezer about being specific, which you are not. Go define the equation first.
Also, it is not a question about narrow AI. I can right now write an ‘AI’ that would try to find a self-replicating glider gun that tiles the entire Game of Life board with something. And yes, that AI may run inside a machine in the Game of Life. The issue is, that’s more like ‘evil terrorists using a protein-folding simulator AI connected to an automated genome lab to make a plague’ than ‘the AI maximizes paperclips’.
I’m bowing out of this discussion because it doesn’t seem to improve anyone’s understanding.
You handwave too much, and the people who already accept the premise like the handwave that sounds vaguely theoretic. Those who do not aren’t too impressed, and are only annoyed.
Or the people who understand the mathematics.
Cousin_it’s mathematics is correct, if counter-intuitive to those not used to thinking about quines. Whether it implies what he thinks it implies is a separate question as I discuss here.
Well, I assumed that he was building an AGI, and even agreed that it is entirely possible to rig the AI so that something the AI does inside a sim gets replicated in the outside world. I even gave an example: you make a narrow AI that generates a virus mostly by simulated molecular interactions (and has some sim of the human immune system, people’s response to world events, what the WHO might do, and such) and wire it up to a virus-making lab that can vent its produce into the air in the building or something edit: or best yet one that can mail samples to whatever addresses. That would be the AI that kills everyone. Including the AI itself in its sim would serve little functional role, and this AI won’t wirehead. It’s clear that the AGI risk is not about this.
edit: and to clarify, the problem with vague handwaving is that without defining what you handwave around, it is easy to produce stuff that is irrelevant, but appears relevant and math-y.
edit: hmm, seems the post with the virus-making AI example didn’t get posted. Still, http://lesswrong.com/lw/bfj/evidence_for_the_orthogonality_thesis/68cf and http://lesswrong.com/lw/bfj/evidence_for_the_orthogonality_thesis/68eo convey the point. I’ve never said it is literally impossible to make a narrow AI that is rigged to tile the game world with blocks. It is, clearly, possible. One could make a glider gun iterator that finds the self-replicating glider gun in the simulator, then some simple mechanisms set to make that gun in the real world. That is not a case of AI wanting to do something to the real world. That’s a glorified case of ‘my thermostat doesn’t wirehead’, to borrow from Will_Newsome.
Another issue is that one could immediately define some specific goal like ‘number of live cells’, and we could discuss this more specifically, instead of vaguely handwaving about an ill-defined goal. But I can’t just define things narrowly for the other side of an argument. Wireheading is a problem of systems that can improve themselves. A system that can, e.g., decide that it can’t figure out how to maximize live cells but can prove some good theorems about four blocks.
That’s a good point, but once we develop AIs that can cross the gap of understanding, how do you guarantee that no one asks their AI to convert the universe into paperclips, intentionally or not?
I find it really dubious that you could make an AI that would just do in the real world what ever you vaguely ask it to do.
(I’ve made all these arguments before on LessWrong and it doesn’t seem to have done anything. You’re being a lot more patient than I was, though, so perhaps you’ll have better luck.
By the way, The Polynomial is pretty awesome.)
Did anyone else have their first reaction as wanting to attack the starting premise?
Victim: But surely a smart artificial intelligence will be able to tell right from wrong, if we humans can do that?
Me: But we humans can’t even do that!
That would be correct in some sense, but wouldn’t accomplish the goal of explaining to the victim why superintelligences don’t necessarily share our morals.
Yes, that was my first reaction also, if only because it’s possible to attack that premise without reference to tricky AI mumbo-jumbo. It would be mildly clever but rather misleading to apply the reversal test: “You think a superintelligence will tend towards superbenevolence, but allegedly-benevolent humans are doing so little to create the aforementioned superintelligence; humans apparently aren’t as benevolent as they seem, so why think a superhuman intelligence will be disanalogously benevolent? Contradiction, sucka!” This argument is of course fallacious because humans spend more on AGI development than do frogs: the great chain of being argument holds.
Then open the prisons.
Ha.
Looking back at my comment I can see why it might read like I’m a hardcore moral relativist. I don’t think I am — although I’ve never been sure of what meta-ethicists’ terms like “moral relativist” mean exactly — I just left qualifiers out of my original post to keep it punchy.
(I don’t believe, for example, that telling right from wrong is impossible, if we interpret “telling right from wrong” to mean “making a moral judgement that most humans agree with”. The claim behind my “But we humans can’t even do that!” is a weaker one: there are some moral questions with no consensus answer, or where there is a consensus but some people flout it. In situations like these people sometimes even accuse other people outright of not knowing right from wrong, or incredulously ask, “don’t you know right from wrong?” I see no necessary reason why the same issues wouldn’t crop up for other, smarter intelligences.)
Absence of consensus does not imply absence of objective truth
I don’t know about “necessary”, but “they’re smarter” is possible and reasonably likely.
Correct, but that doesn’t bear on my claim. Moral disagreements exist, whether or not there’s objective moral truth.
It’s possible, but I don’t know any convincing arguments for why it’s likely, while I can think of plausibility arguments for why it’s unlikely.
If no-one is actually working on that kind of intelligence, one that’s highly efficient at arbitrary and rigid goals (an AOC)...then what’s the problem?
Or, more generally, using the word “intelligence” may be counterproductive. If we used something more like “the thing that happens to a computer when you upgrade its hardware, or in the course of going from a chess program that checks every option to a chess program that uses on-average-effective heuristics,” maybe people would go along on their own (well, if they were already interested in the topic enough to sit through that).
If your beef is about unintelligent but super-efficient machines, why communicate with the AI community? That’s generally not what they are trying to build.
Assuming the conclusion.
Assuming the conclusion.
(LessWrong, pretend I went through and tagged all comments in this thread that assume their conclusion with “Assuming the conclusion.”.)
Pretend I went through and downvoted all the tags. If they were anything like the grandparent they would be gross misapplications of the phrase. “Assuming the conclusions” just isn’t what cousin_it is doing in those two particular quotes.
That is making commentary on the conversation with implied criticism of the other’s perceived misuse of semantic quibbling. The ‘conclusion’ you would object to cousin_it assuming doesn’t even get involved there.
I don’t see how you can think that saying “humanity can someday find a way to build such a machine” isn’t assuming the conclusion. That’s the conclusion, and it’s being used as an argument.
“[Y]ou seem to have a concept of ‘intelligence’ that’s entangled with many accidental facts about humans” is the conclusion. Slepnev assumes it. Therefore, Slepnev assumes the conclusion. (It would be a restatement of the conclusion if his earlier arguments hadn’t also just been assuming the conclusion.) That the assumption of the conclusion is only implicit in the criticism doesn’t make it any less unjustified; in fact, it makes it more unjustified, because it has overtones of ‘the conclusion I have asserted is obviously correct, and you are stupid for not already having come to the same conclusion I have’.
Remember, I mostly agree with Slepnev’s conclusion, which is why I’m especially annoyed by non-arguments for it that are likely to just be turnoffs for many intelligent people and banners of cultish acceptance for many stupid people.
How good at playing chess would a chess computer have to be before it started trying to feed the hungry?
That’s up to the notion of ‘good’. If ‘good’ is defined to be ‘beats all humans alive right now’, then it might “feed” the hungry to be able to win chess matches against them.
Or kill everyone so they don’t produce more humans it has to beat.
The utility is in beating, not in non-playing, presumably. But yeah, it depends whether it’s to beat all humans vs to beat most humans. edit: or it can set its ‘number of humans alive’ counter to zero directly without killing anyone.
A human with barely enough calories to survive is going to be a significantly weaker chess opponent.
How good at chess would it be before it started killing people or making paper clips ? The argument is about what an artificial general intelligence would do.
A possible intuition may come from computer games. In principle, a game world can be arbitrarily complex, and the goals (“winning conditions”) are specified by the programmers, so they can be anything at all. For example, the world may be Medieval Fantasy setting, while the goal may be to invent, craft, take from NPCs by force, and otherwise collect as many paperclips as possible. If an external incentive is provided for people to play the game (e.g., they get real-world money for each paperclip), the players will essentially become paperclip maximizers within the game world.
A related point. I don’t think the creators of The Sims, for example, anticipated that perhaps the primary use of their game would be as a sadistic torture simulator. The game explicitly rewards certain actions within the game, namely improving your game-character’s status. The game tells you what you should want. But due to our twisted psychology, we get more satisfaction locking our character in a room with no toilet, bed, food or water, with a blaring TV playing all day and night. And then killing him in a fire.
Totally normal people who are not otherwise sadists will play The Sims in this fashion. Playing “Kill Your Character Horribly” is just more fun than playing the game they intended you to play. You get more utility from sadism. An AI with unanticipated internal drives will act in ways that “don’t make sense.” It will want things we didn’t tell it to want.
Yes, this is a good point. I tried to minimize this effect (direct utility of fun playing the game in certain ways) by providing external incentives, which are assumed to be large enough to override the fun for people.
However, after more thinking, I’m not sure any external incentives would work in the important cases. After all, this belief structure—knowledge by the player of being in a game, and knowledge of getting outside utility from playing for specified arbitrary goals—appears to be able to override goals of any agent, including FAI. But if FAI is truly friendly, then it won’t play if the game’s quality is sufficiently advanced and the NPCs become real persons. Same is probably true for normal people. They wouldn’t torture ‘sufficiently real’ characters, not for any money.
Thus, the original idea of the grandfather comment is flawed...
You know torture and execution used to be major forms of entertainment, right?
Yeees… though I’m not sure how ‘normal’ it was, it could be mainly group effects.
I can only extrapolate from a single point, and “there isn’t anything I find even the tiniest bit tempting about nailing the skins of Yermy Wibble’s family to a newsroom wall”. Sadistically killing a ‘not real’ game character can be fun, but if I try to imagine doing the same to a ‘sufficiently real’ character, like uploaded person, then it… doesn’t work.
What does this distinction mean? A normal person in those groups would commit torture, and there’s no such thing as a ‘normal person’ completely abstracted from ‘group effects’; a Homo sapiens without memes isn’t really a person.
For large numbers of people to abhor torture as much as we do is a bizarre (from a historical POV) recent phenomenon, AFAIK.
Group effects (peer pressure, authority, etc) apparently can easily override personal values in humans’ corrupted hardware.
I am not sure you’re right about historical POV. I don’t think high primates deliberately torture each other for fun. I can be wrong, though...
So you’re claiming that there is a difference between “group effects” and “personal values”. I’m highly dubious.
In the interests of allowing you to extrapolate from two points, have an anecdote: Assuming ‘get away scot-free’ is included, I’d torture a thirty-year-old white male for a billion dollars. That’s my starting bid, you could probably bargain me down to less money or more objectionable kinds of people if you tried.
For altruistic utilitarian reasons?
Why do you ask?
Because I could see myself being persuaded in the altruistic case, but not in the selfish one.
Altruism: the best argument for torturing people.
This is why I’m always suspicious of altruism.
I defy the data.
Milgram experiment?
I’d say Khoth is obviously correct here.
People in Milgram experiment didn’t say they would torture people.
Khoth and TheOtherDave answered a different question. The data I defied is the truth of the first-person statement “I imagined myself torturing a person for N dollars, and according to my self-simulation, I would do it”.
??? Are you conceding that people do in fact torture one another, but denying that pedanterrific is one of those people?
If not, I completely fail to understand you.
If so, on what basis?
Agreed. They did, however, choose to inflict pain, in many cases intolerable pain, on other people.
That seems sufficient grounds for me to conclude that they chose to torture people.
Apparently irrational belief about people on LW.
(Just in case there’s any confusion: I have never tortured anyone, and don’t plan to.)
Oh, right. Yeah, oughtn’t have implied otherwise.
”… that some people do in fact torture one another, and therefore some people are willing to torture one another, but denying that pedanterrific...” is closer to what I actually wanted to mean.
Oh, okay. As a suggestion for the future: when you mean “I think you’re lying”, don’t say “I defy the data”. They don’t mean the same thing.
I didn’t think you were lying, I thought you were mistaken.
Could you explain one possible way someone could be mistaken about “I imagined myself torturing a person for N dollars, and according to my self-simulation, I would do it”?
Your original statement was much weaker, just “I’d torture… for a billion”. I thought you were mistaken because you didn’t actually imagine it. I clarified this later in the conversation.
Okay then.
Okay! Let’s get a third opinion in here. For those just joining us, the claim (paraphrased) is
Anyone want to chime in on this?
I’m not sure how much money it would take, but I think most normal people would do it for free if it was socially expected.
Sure, I’ll weigh in, since you’re asking.
History, including recent history, is full of people who tortured other people.
I see no reason to believe that defining all of those people as “not normal” is in the least bit justified; that seems more likely to be a No True Scotsman fallacy in action.
Adding a concrete incentive like money probably helps, if it’s a large enough sum, but honestly introducing money to the discussion seems to clutter the question unnecessarily. Normal people will torture one another for no money at all, under the right circumstances.
In fact due to the way taboo tradeoffs work, I suspect offering people money will make them less inclined to torture.
Yeah, that thought had crossed my mind as well; it’s certainly true for small-to-medium amounts of money.
Additional opinions wouldn’t help. I think your belief that you would torture people for a billion dollars is wrong.
This is an astonishing thing to say. You’re so confident I’m wrong that if someone else stated a similar opinion you would insist they were wrong too? What if that person had a verifiable job history as a professional torturer for some third-world dictator? There comes a point where the replications outweigh your ability to judge others’ knowledge of their own minds.
But we’re talking about ‘normal’ people, not professional torturers. I am fairly confident the LW community is torturers-free, DUST SPECKS controversy notwithstanding :)
My hypothesis is that you are making a logical declarative statement, not actually imagining the process.
But I guess if you insist that you are, and if many people agree with you, I will have to update...
What’s the distinction you’re making here? Do you think all professional torturers are evil mutants, or what?
I took this possibility seriously, so I just spent a minute imagining the process in as much detail as possible.
I’m willing to come down to a hundred million.
I don’t know. I never saw one in my life.
Hmm. Ok, another hypothesis: do you use an argument like “I’ll use the money I’ll get to improve the conditions of lots of other people, stop many other tortures going all over the world”, or something similar?
It didn’t work for me, but it may make sense for a stronger rationalist.
Be careful about confusing utility and pleasure. But your point about unexpected drives leading to unexpected results is absolutely true.
I like that idea. So, if we assume that all sufficiently smart AIs are “good”, then we can put such an AI in a simulated world in which the best way to acquire resources for its good deeds would be to play a game running on a computer provided by Dark Lords of the Matrix (that’s us!) and the goal of the game would be to pretend to be a “bad” AI. Except the game would really be an input/output channel into the real world. The whole system would effectively constitute a bad AI, thus contradicting the initial assumption.
However, anyone who seriously claims that sufficiently smart AIs will automatically be nice will also probably reject that argument by claiming that, well, a sufficiently smart AI would figure out that it is being tricked like that and would refuse to cooperate.
(Also: you could call it the “Ender’s Game” argument if you’re aiming for memorability more than respectability.)
Already got that idea—still a good one, though.
I think you won’t find a very good argument either way, because different ways of building AIs create different constraints on the possible motivations they could have, and we don’t know which methods are likely to succeed (or come first) at this point.
For example, uploads would be constrained to have motivations similar to existing humans (plus random drifts or corruptions of such). It seems impossible to create an upload who is motivated solely to fill the universe with paperclips. AIs created by genetic algorithms might be constrained to have certain motivations, which would probably differ from the set of possible motivations of AIs created by simulated biological evolution, etc.
The Orthogonality Thesis (or its denial) must assume that certain types of AI, e.g., those based on generic optimization algorithms that can accept a wide range of objective functions, are feasible (or not) to build, but I don’t think we can safely make such assumptions yet.
ETA: Just noticed Will Newsome’s comment, which makes similar points.
Wei Dai’s comment is full of wisdom. In particular:
But even if that is true, it is nowhere near enough to support an OT that can be plugged into an unfriendliness argument. The Unfriendliness argument requires that it is reasonably likely that researchers could create a paperclipper without meaning to. However, if paperclippers require an architecture—a possible architecture, but only one possible architecture—where goals and their implementation are decoupled, then both requirements are undermined. It is not clear that we can build such machines (“based on generic optimization algorithms that can accept a wide range of objective functions”) , hence a lack of likelihood; and it is also not clear that well intentioned people would.
Unfriendliness of the sort that MIRI worries about could be sidestepped by not adopting the architecture that supports orthogonality, and choosing one of a number of alternatives.
Exactly. The first AI we can create, certainly can’t have ‘nearly any type of motivation’.
There are several classes of AIs we can create: the uploads start off human; the human embryonic development sim (or other brain emulation that isn’t an upload) is basically a child that learns and becomes human, which is to some extent true of most learning-AI approaches; and the neat AI that starts stupid cannot start off with goals that require a highly accurate world-model (like paperclip maximization), or goals that lead to the AI damaging itself, or goals that prevent AI self-improvement, since the first AI we create reasonably doesn’t start at grown-up, educated, Descartes-level intelligence that invents the notion of self, figures out that it must preserve itself to achieve its goals, and then figures out that it must keep the goals above the instrumental self-preservation.
On top of this, as I commented on some other thread (forgot where) with the Greenpeace By Default example, if you generate random code, the simplest-behaving code dominates the space of code that doesn’t crash. This goes for the goal systems.
The orthogonality thesis, even if in some narrow sense true (or broad sense, for that matter), is entirely irrelevant; for example, an absolute orthogonality thesis would be entirely compatible with the hypothetical where, out of the random goal space for the seed AI, and excluding the AIs that self-destruct or fail to self-improve, only one in 10^1000 is mankind-destroying to any extent (simply because one or two of the simplest goal systems end up mankind-preserving because they were too simple to preserve just the AI).
What distinguishes the “Orthogonality thesis” from “Hume’s Guillotine”? If you’re looking for standard published arguments, I’d think you could start with “A Treatise of Human Nature” and proceed through the history of the “is-ought problem” from there.
Did you read the paper? It does cite Hume.
In fact, it spends most of its time arguing that Hume isn’t needed for the argument to work...
I did not yet; thank you for the link.
A lot of the arguments given in these comments amount to: We can imagine a narrow AI that somehow becomes a general intelligence without wireheading or goal distortion, or, We can imagine a specific AGI architecture that is amenable to having precisely defined goals, and because we can imagine them, they’re probably possible, and if they’re probably possible, then they’re probable. But such an argument is very weak. Our intuitions might be wrong, those AIs might not be the first to be developed, they might be theoretically possible but not pragmatically possible, and so on. Remember, we still don’t know what intelligence is! We can define it as cross-domain optimization or what have you, but such definitions are not automatically valid just because they look sorta math-y. AIXI is probably not intelligent in the sense that a human is intelligent, and thus won’t be dangerous. Why should I believe that any other AI architectures you come up with on the fly are any more dangerous?
So whenever you say, “imagine an AIXI approximation with a specific non-friendly utility function: that would be bad!”, my response is, “who says such an AGI is even possible, let alone probable?”. And whenever you say, “Omohundro says...”, my response is, “Omohundro’s arguments are informal and suggestive, but simply nowhere near conclusive, and in fact parts of his arguments can be taken to suggest in favor of an AI detecting and following moral law”. There just aren’t any knock-down arguments, because we don’t know what it takes to make AGI. The best you can do is to make pragmatic arguments that caution is a good idea because the stakes are high. When people in this community act as if they have knock-down arguments where there aren’t any it makes SingInst and LessWrong look like weirdly overconfident end-of-the-world-mongers.
(Also, the ‘AGI will literally kill us all by default’ argument is laughably bad, for many game theoretic and economic reasons both standard and acausal that should be obvious, and people unthinkingly repeating it also makes SingInst and LessWrong look like weirdly overconfident end-of-the-world-mongers.)
The argument in its simplest form is:
- assume the AGI will have the capacity to kill us at very low cost/risk
- humans make very inefficient use of resources (including what we need to stay alive)
- most AGI goals are improved by more resources
- most AGI goals are not human friendly
Hence most AGIs will make better use of resources by controlling them than by trading with humans. Hence most AGIs will kill us by default. You can question the assumptions (the last one is somewhat related to the orthogonality thesis), but the conclusion seem to come from them pretty directly.
1) The default case is that AGI will neither be malevolent nor benevolent but will simply have no appreciation of human values and therefore does not care to protect them.
2) An AGI is likely to become more powerful than humans at some point. Given #1, such a being poses a danger.
3) Given #1,2, we have to figure out how to make AGI that does protect humans and humane values.
4) Human moral value is very complex and it is therefore extremely difficult to approach #3, but worth trying given the associated risks.
yada-yada-yada
You know what your problem is? You and other advocates of risks from AI are only talking to people with the same mindset or people who already share most of your assumptions.
Stop that. Go and talk to actual AI researchers. Or talk to Timothy Gowers, Holden Karnofsky etc.
See what actual experts, world-class mathematicians or even neuroscientists have to say. I have done it. If you can convince them then your arguments are strong. Otherwise you might just be fooling yourself.
(upvoted because it didn’t deserve to be negative)
You’re making strong assumptions about what I am, and who I’ve talked to :-)
I’ve talked with actual AI researchers and neuroscientists (I’m a mathematician myself); we’re even holding a conference full of these kinds of people. If we have time to go through the arguments, they generally end up agreeing with my position (which is that intelligence explosions are likely dangerous, and not improbable enough that we shouldn’t look into them). The people I have least been able to convince are the philosophers, in fact.
Given my epistemic state it was a reasonable guess that you haven’t talked to a lot of people that do not already fit the SI/LW memeplex.
Fascinating. This does not reflect my experience at all. Have those people that ended up agreeing with you published their thoughts on the topic yet? How many of them have stopped working on AI and instead started to assess the risks associated with it?
I’d also like to know what conference you are talking about, other than the Singularity Summit where most speakers either disagree or talk about vaguely related research and ideas or unrelated science fiction scenarios.
There is also a difference between:
Friendly AI advocate: Hi, I think machines might become very smart at some point and we should think about possible dangers before we build such machines.
AI researcher: I agree, it’s always good to be cautious.
and
Friendly AI advocate: This is crunch time! Very soon superhuman AI will destroy all human value. Please stop working on AI and give us all your money so we can build friendly AI and take over the universe before an unfriendly AI can do it and turn everything into paperclips after making itself superhumanly smart within a matter of hours!
AI researcher: Wow, you’re right! I haven’t thought about this at all. Here is all my money, please save the world ASAP!
I am not trying to ridicule anything here. But there is a huge difference between having Peter Norvig speak at your conference about technological change and having him agree with you about risks from AI.
What it generally was:
AI Researcher: “Fascinating! You should definitely look into this. Fortunately, my own research has no chance of producing a super intelligent AGI, so I’ll continue. Good luck son! The government should give you more money.”
In other words, those researchers estimate the value of friendly AI research as a charitable cause at the share of their taxes that the government would assign to it, if it considered the issue in the first place, which they believe it should.
It’s hard to tell how seriously they really take risks from AI given that information.
It sounds like:
AI Researcher: Great story son, try your luck with the government. I am going to continue to work on practical AI in the meantime.
Indeed. I feel the absence of good counter-arguments was a more useful indication than their eventual agreement.
How much evidence that you are right does the absence of counter-arguments actually constitute?
If you are sufficiently vague, say “smarter than human intelligence is conceivable and might pose a danger”, you can reasonably anticipate counter-arguments only from a handful of people like Roger Penrose.
If however you say that “1) it is likely that 2) we will create artificial general intelligence within this century that is 3) likely to undergo explosive recursive self-improvement, respectively become superhuman intelligent, 4) in a short enough time-frame to be uncontrollable, 5) to take over the universe in order to pursue its goals, 6) ignore 7) and thereby destroy all human values” and that “8) it is important to contribute money to save the world, 9) at this point in time, 10) by figuring out how to make such hypothetical AGI’s provably friendly and 11) that the Singularity Institute, respectively the Future of Humanity Institute, are the right organisations for this job”, then you can expect to hear counter-arguments.
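A purely arithmetic point lurks in lists like this: even if each individual claim were quite probable on its own, the conjunction of all eleven need not be. A minimal illustration (the 0.8 figure is an arbitrary assumption for the sketch, not anyone’s stated credence, and the independence assumption is itself questionable):

```python
# Illustrative only: suppose each of the 11 claims independently
# had probability 0.8. The conjunction is far less probable than
# any single claim, which is one reason longer claim-lists invite
# more counter-arguments.
p_each = 0.8
n_claims = 11
p_conjunction = p_each ** n_claims
print(round(p_conjunction, 3))  # about 0.086
```

If the claims are positively correlated (accepting 1-3 makes 4-7 more plausible), the conjunction is less punishing than this naive product suggests, which is why the disagreement tends to concentrate on the genuinely independent later claims.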
If you weaken the odds of creating general intelligence to around 50-50, then virtually no one has given decent counter-arguments to 1)-7). The disconnect starts at 8)-11).
Quite strong evidence, at least for my position (which has somewhat wider error bars than SIAI’s). Most people who have thought about this at length tend to agree with me, and most arguments presented against it are laughably weak (hell, the best arguments against Whole Brain Emulations were presented by Anders Sandberg, an advocate of WBE).
I find the arguments in favour of the risk thesis compelling, and when they have the time to go through it, so do most other people with relevant expertise (I feel I should add, in the interest of fairness, that neuroscientists seemed to put much lower probabilities on AGI ever happening in the first place).
Of course the field is a bit odd, doesn’t have a wide breadth of researchers, and there’s a definite déformation professionnelle. But that’s not enough to change my risk assessment anywhere near to “not risky enough to bother about”.
“risky enough to bother about” could be interpreted as:
(in ascending order of importance)
Someone should actively think about the issue in their spare time.
It wouldn’t be a waste of money if someone was paid to think about the issue.
It would be good to have a periodic conference to evaluate the issue and reassess the risk every 10 years.
There should be a study group whose sole purpose is to think about the issue.
All relevant researchers should be made aware of the issue.
Relevant researchers should be actively cautious and think about the issue.
There should be an academic task force that actively tries to tackle the issue.
It should be actively tried to raise money to finance an academic task force to solve the issue.
The general public should be made aware of the issue to gain public support.
The issue is of utmost importance. Everyone should consider contributing money to a group trying to solve the issue.
Relevant researchers that continue to work in their field, irrespective of any warnings, are actively endangering humanity.
This is crunch time. This is crunch time for the entire human species. And it’s crunch time not just for us, it’s crunch time for the intergalactic civilization whose existence depends on us. Everyone should contribute all but their minimal living expenses in support of the issue.
Could you elaborate on the “relevant expertise” that is necessary to agree with you?
Further, why do you think everyone I asked about the issue either disagrees, or continues to ignore the issue and work on AI? Even those who are likely aware of all the relevant arguments. And which arguments do you think the others are missing that would likely make them change their minds about the issue?
Because people always do this with large scale existential risks, especially ones that sound fringe. Why were there so few papers published on Nuclear Winter? What proportion of money was set aside for tracking near-earth objects as opposed to, say, extra police to handle murder investigations? Why is the World Health Organisation’s budget 0.006% of world GDP (with the CDC only twice as large)? Why are the safety requirements playing catch-up with the dramatic progress in synthetic biology?
As a species, we suck at prevention, and we suck especially at preventing things that have never happened before, and we suck especially especially at preventing things that don’t come from a clear enemy.
I doubt that, had I written to the relevant researchers about nuclear winter, they would have told me it was a fringe issue. Probably a lot would have told me that they couldn’t write about it in the midst of the Cold War.
I also doubt that biologists would tell me that the issue of risks from synthetic biology is just bunk. Although quite a few would probably tell me that the risks are exaggerated.
Regarding the murder vs. asteroid funding: I am not sure that it was very irrational, in retrospect, to avoid asteroid funding until now. The additional resources it would have taken to scan for asteroids a few decades ago, versus now, might outweigh the risk incurred during the few decades in which nobody looked for asteroids on a collision course with Earth. But I don’t have any data to back this up.
Oh yes, and I forgot one common answer, which generally means I need pay no more attention to their arguments, and can shift into pure convincing mode: “Since the risks are uncertain, we don’t need to worry.”
Well said. Or, at least a good start.
Oh. Was the earlier part supposed to be satire?
No. I actually pretty much agree with it. My whole point is that to reduce risks from AI you have to convince people who do not already share most of your beliefs. I wanted to make it abundantly clear that people who want to hone their arguments shouldn’t do so by asking people if they agree with them who are closely associated with the SI/LW memeplex. They have to hone their arguments by talking to people who actually disagree and figure out at what point their arguments fail.
See, it is very simple. If you are saying that all AI researchers and computer scientists agree with you, then risks from AI are pretty much solved, insofar as everyone who could possibly build an AGI is already aware of the risks and probably takes precautions (which is not enough of course, but that isn’t the point).
I am saying that you might be fooling yourself if you say, “I’ve been to the Singularity Summit and talked to a lot of smart people at LW meetups and everyone agreed with me on risks from AI, nobody had any counter-arguments”. Wow, no shit? I mean, what do you anticipate if you visit a tea party meeting and argue that Obama is doing a bad job?
I believe that I have a pretty good idea of which arguments would be perceived as weak or poorly argued, since I am talking to a lot of people who disagree with SI/LW on some important points. And if I tell you that your arguments are weak, that doesn’t mean that I disagree or that you are all idiots. It just means that you have to hone your arguments if you want to convince others.
But maybe you believe that there are no important people left whom it would be worthwhile to have on your side. Then of course what I am saying is unnecessary. But I doubt that this is the case. And even if it is, honing your arguments might come in handy once you are forced to talk to politicians or other people at a large inferential distance.
What does “most AGIs” mean? Most we are likely to build? When our only model of AGI is human intelligence?
There is no engineering process corresponding to a random dip into mind space.
This assumption comes at a high cost in probability mass. The difficulty of “killing off humanity” type tasks will increase exponentially as AI leads to AGI leads to super-AGI; it’s a moving target.
Largely irrelevant: humans use an infinitesimal fraction of solar resources. Moreover, bacteria, insects, and rats make very inefficient use of our resources as well; why haven’t we killed them off?
The bronze age did not end for lack of bronze, nor the coal age for lack of coal. Evolution appears to move forward by using fewer resources rather than more.
Who cares? Most AGI goals will never be realized.
True, the question is: what resources?
Most random home brain surgical operations will kill us by default as well.
The first assumption is the one that has many solid arguments against it; the assumption might be wrong or it might be right, but when people confidently give the conclusion without acknowledging that the first assumption is quite a big one, they make SingInst/LessWrong look overconfident or even deliberately alarmist.
Then I agree with you (though this has little to do with “game theoretic and economic reasons both standard and acausal”).
you’re either greatly overestimating your audience (present company included) or talking to a reference class of size 10.
Completely agree with your comment. Conceivability does not imply conceptuality, does not imply logical possibility, does not imply physical possibility, does not imply economic feasibility. Yet the arguments uttered on Less Wrong seldom go beyond conceivability.
This is exactly the impression I got when I first started asking about risks from AI. Most of the comments I got were incredibly poor and without any substance. But the commenters do not notice that themselves, because other people on Less Wrong seemingly agree and they get upvoted. Yet nobody with the slightest doubts would be convinced.
All they manage to do is convince those who already hold the same set of beliefs or who fit a certain mindset.
I just reread this post yesterday and found it to be a very convincing counter-argument against the idea that we should solely act on high stakes.
It’s perhaps worth noting that this observation is true of most discussion about most even-mildly-controversial subjects on LessWrong—quantum mechanics, cryonics, heuristics and biases, ethics, meta-ethics, theology, epistemology, group selection, hard takeoff, Friendliness, et cetera. What confuses me is that LessWrong continues to attract really impressive people anyway; it seems to be the internet’s biggest/best forum for interesting technical discussion about epistemology, Schellingian game theory, the singularity, &c., even though most of the discussion is just annoying echoes. One of a hundred or so regular commenters is actually trying or is a real intellectual, not a fountain of cultish sloganeering and cheering. Others are weird hybrids of cheerleader and actually trying / real intellectual (like me, though I try to cheer on a higher level, and about more important things). Unfortunately I don’t know of any way to raise the “sanity waterline”, if such a concept makes sense, and I suspect that the new Center for Modern Rationality is going to make things worse, not better. I hope I’m wrong. …I feel like there’s something that could be done, but I have no idea what it is.
Eh, I think Vassar’s reply is more to the point.
I think Wei_Dai’s reply does trump that.
What Vassar is saying sounds to me like a justification of Pascal’s Wager by arguing that some gods have more measure than others, and that therefore we can rationally decide to believe in a certain god and live accordingly.
That is like saying that a biased coin does not have a probability of 1⁄2, and that we can therefore maximize our payoff by betting on the side of the coin that is more likely to end up face-up. Which would be true if we had any information other than that the coin is biased. But if we don’t have any reliable information other than that it is biased, it makes no sense to deviate from the probability of a fair coin.
And I don’t think it is clear, at this point, that we are justified to assume more than that there might be risks from AI. Claiming that there are actions that we can take, with respect to risks from AI, that are superior to others, is like claiming that the coin is biased while being unable to determine the direction of the bias. By claiming that doing something is better than doing nothing we might as well end up making things worse. Just like by unconditionally assigning a higher probability to one side of a coin, of which we know nothing but that it is biased, in a coin tossing tournament.
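The coin claim can be checked with a quick Monte Carlo sketch (the function name and parameters are mine, purely illustrative): if the direction of the bias is itself unknown and symmetric, committing to either side wins exactly half the time in expectation.

```python
import random

def bet_on_heads(trials=100_000, bias=0.2):
    """Each round, a fresh coin is biased by `bias` toward heads or
    tails, with the direction chosen uniformly at random and hidden
    from the bettor, who always bets heads."""
    wins = 0
    for _ in range(trials):
        # Unknown bias direction: 50/50 whether the coin favours heads.
        p_heads = 0.5 + bias if random.random() < 0.5 else 0.5 - bias
        if random.random() < p_heads:
            wins += 1
    return wins / trials

print(bet_on_heads())  # hovers around 0.5: the unknown bias washes out
```

The two biased cases average out to a fair coin from the bettor's point of view, which is the analogy's point: knowing only "there is a bias somewhere" gives no actionable edge.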
The only sensible option seems to be to wait for more information.
This is one of The Big Three Problems I came to LW hoping to find a solution for, but have mainly noticed that nobody wants to talk about it. Oh well.
Now I am curious about the other two.
How do you judge what you should (value-judgmentally) value?
How do you deal with uncertainty about the future (unpredictable chains of causality)? (what your above post was about)
What’s the right thing to do in life?
Here are some of my previous posts on the topics.
Your posts highlight fundamental problems that I have as well. Especially this and this comment concisely describe the issues.
I have no answers and I don’t know how other people deal with it. Personally I forget about those problems frequently and act as if I can actually calculate what to do. Other times I just do what I want based on naive introspection.
This is a problem—though it probably shouldn’t stop us from trying.
Players can try to improve their positions and attempt to gain knowledge and power. That itself might cause problems—but it seems likely to beat thumb twiddling.
Why do you think that “Center for Modern Rationality” is going to make things worse? Let’s hope it will not hinge on Eliezer Yudkowsky’s more controversial deliberations (as for me, his thoughts on: the complexity of ethical value, the nature of personhood, the solution to FAI).
I don’t think what they teach will be particularly harmful to people’s epistemic habits, but I don’t think it’ll be helpful either, and I think that there will be large selection effects for people who will, through sheer osmosis and association with the existent rationalist community, decide that it is “rational” to donate a lot of money to the Singularity Institute or work on decision theory. It seems that the Center for Modern Rationality aims to create a whole bunch of people at roughly the average LessWrong commenter level of prudence. LessWrong is pretty good relatively speaking, but I don’t think their standards are nearly high enough to tackle serious problems in moral philosophy and so on that it might be necessary to solve in order to have any good basis for one’s actions. I am disturbed by the prospect of an increasingly large cadre of people who are very gung-ho about “getting things done” despite not having a deep understanding of why those things might or might not be good things to do.
Why is that confusing? Have you looked at the rest of the internet recently?
Not really. But are you saying that nowhere else on the internet is close to LessWrong’s standards of discourse? I’d figured that but part of me keeps saying “there’s no way that can be true” for some reason.
I’m not sure why I’m confused, but I think there’s a place where my model (of how many cool people there are and how willing they would be to participate on a site like LessWrong) is off by an order of magnitude or so.
A better question is how many of them are willing to create a site like LessWrong.
Also minor nitpick about your use of the word ‘cool’, since it normally denotes social status rather than rationality.
It might be true when it comes to cross-domain rationality (with a few outliers like social abilities). But it certainly isn’t true that Less Wrong is anywhere close to the edge in most fields (with a few outliers like decision theory).
It isn’t a definitive argument, but you could point out that various intelligent historical figures had different morals from modern intelligent people. Napoleon, for instance—his intelligence is apparent, but his morality is ambiguous. Newton, or Archimedes, or George Washington, or any of several others, would work similarly.
Thanks, that’s one of the approaches I was planning to use—but also use very pathological high-functioning individuals, and imagine their speed being boosted...
A problem with this argument is that it’s using that slippery word, “intelligence”; one could argue that Jesus was the most intelligent person ever because he exerted the most optimization pressure, and his values (or our modern conceptions of them) just so happen to line up really well with the professed values of modern intelligent people. Same with Rousseau. Also, Archimedes barely knew any calculus—clearly he wasn’t very intelligent.
I would like the argument to work, because at least it’s relatively empirical, but it seems too easy to poke holes in.
Eh, Jesus of Nazareth didn’t exert very much optimization pressure; Paul of Tarsus did most of the work.
Sidestepping the particulars of early Christianity: in a case where agent A articulates a set of values and agent B subsequently implements state changes in the world that align it with those values, my judgment about who exerted what optimization pressure seems to depend a lot on what I think B would have done in the absence of A’s output.
If I think B would have done exactly the same thing in that case, then I conclude that A exerted no optimization pressure. If I think B would have done something utterly different, then I conclude A exerted a great deal of optimization pressure (and did so extremely efficiently). Etc.
There are also the people who claim that the “Jesus” that Paul talked about never actually existed as a specific individual.
Yup; that is another one of the particulars of early Christianity I’m sidestepping here.
Accept that moral conceptual truths are possible, and instead argue that an AI would deliberately try to not learn them.
“I want to make paperclips; if I think a bit about morality I will realize the error of my ways and cease wanting to make paperclips, thus fewer paperclips will be made, hence I must not think about morality.”
Classic Murder Gandhi.
I don’t know what thinking about X will do to me. So either I never attempt to self improve, or I take a chance.
Well, yes, if said moral truth is obvious enough that an AI too young to realize the danger is likely to stumble upon it by mistake. This doesn’t seem all that likely, and if the AI isn’t inescapably snared by the time it has realized as much as we already have, then it can still sandbox any process that explores new arguments in any way relating to morality, and shut it down if it shows any sign of shifting values.
We know our level of intelligence is reachable while still disagreeing about morality, and especially after having read these posts it could easily implement failsafes like that at even lower levels.
I suppose you could build an AI that had both drives to self improve, and an extreme caution about accidentally changing its other values (although evolution doesn’t seem to have built us that way). That gives you the welcome conclusion that the AI in question is potentially unfriendly, rather than the disturbing one that it is potentially self-correcting. But we already knew you could build unfriendly AIs if you want to: the question is whether the friendly or neutral AI you think you are building will turn on you, whether you can achieve unfriendliness without carefully designing it in.
If you can build an AI like that even in theory, then the “universal morality” isn’t universal, just a very powerful attractor. A very powerful attractor might indeed be a thing that exists.
Evolution does very much seem to have built us this way, just very incompetently. At the very least, I know for a fact that I, and the majority of others buying strongly into the Less Wrong memeplex in the first place, have this kind of self-preserving value system.
If there is such a universal morality, or strong attractor, it’s almost certainly something mathematically simple and in no way related to the complex, fragile values humans have evolved. To us, it’d not seem moral at all, but horrifying and either completely incomprehensible, or converting us to it through something like nihilism or existential horror or Pascal’s-wager-style exploits of decision theory, not appealing to human-specific things like compassion or fun. After all, it has to work through the kind of features present in all sufficiently intelligent agents.
For an example of what a morality that is in some sense universal looks like, look to the horror called evolution.
Thus, any AI that is not constructed in this paranoid way is catastrophically unfriendly, on a much deeper level than any solution yet discovered. For example, some might argue that a universal-morality-forming AI is friendly because it’s what coherent extrapolated volition would choose, but this only shows that if a universal morality is even possible, then the idea of coherent extrapolated volition is broken as well.
“Objective morality”, if there is such a thing, is nothing more or less than the mother of all basilisks.
Objective moral truth is only universal to a certain category of agents. It doesn’t apply to sticks and stones, and it isn’t discoverable by crazy people, or people below a certain level of intelligence. If it isn’t discoverable to a typical LW-style AI, with an orthogonal architecture, unupdateable goals, and purely instrumental rationality (I’m tempted to call them Artificial Obsessive-Compulsives), then so much the worse for them. That would be a further reason for filing a paperclipper under “crazy person” rather than “rational agent”.
You are portraying morality as something arbitrary and without rational basis that we are nonetheless compelled to believe in. That is more of a habit of thought than an argument.
Who says it is in no way related? Simple universal principles can pan out to complex and localised values when they are applied to complex and localised situations. The simple universal laws of physics don’t mean every physical thing is simple.
Lots of conclusions, but not many arguments there. The only plausible point is that an abstract universal morality would be dry and unappealing. But that isn’t much of a case against moral objectivism, which only requires moral claims to be true.
I don’t think “evolution” and “morality” are synonyms. In fact, I don’t see much connection at all.
Interesting use of “thus”, there.
I wasn’t trying to argue, just explain what appears to be the general consensus stance around here.
You seem to be using a lot of definitions differently than everyone else, and this leads to misunderstandings. If I’ve got them right, perhaps the truth you are missing, phrased using your definitions, is this: “Nothing other than an FAI has any morality. All intelligences, in all the multiverse, that are not deliberately made by humans to be otherwise, are crazy, in such a way that they’ll remain so no matter how intelligent and powerful they get.”
Any nonhuman AI is much closer to the category of evolution, or a stone, than it is to a human.
I’m not very concerned about consensus views unless they are supported by good arguments.
I believe that I am using definitions that are standard for the world at large, if not to LW.
Does “nothing other than an FAI” include humans?
(Note that I do not necessarily agree with what I wrote below. You asked for possible counter-arguments. So here goes.)
Might intelligence imply benevolence?
I believe that a fundamental requirement for any rational agent is the motivation to act maximally intelligently and correctly. That requirement seems even more obvious if we are talking about a conjectured artificial general intelligence (AGI) that is able to improve itself to the point where it is substantially better at most activities than humans. For if it didn’t want to be maximally correct, it wouldn’t become superhumanly intelligent in the first place.
Consider giving such an AGI a simple goal, e.g. the established example of paperclip maximization. Is it really clear that human values are not implicit even in such a simplistic goal?
To pose an existential risk in the first place, an AGI would have to maximize paperclips in an unbounded way, eventually taking over the whole universe and converting all matter into paperclips. Given that no sane human would explicitly define such a goal, an AGI with the goal of maximizing paperclips would have to infer that unbounded interpretation as implicit. But would such an inference make sense, given its superhuman intelligence?
The question boils down to how an AGI would interpret any vagueness present in its goal architecture and how it would deal with the implied invisible.
Given that any rational agent, especially an AGI capable of recursive self-improvement, wants to act in the most intelligent and correct way possible, it seems reasonable that it would interpret any vagueness in the way that most closely reflects how it was most probably meant to be interpreted.
Would it be intelligent and correct to ignore human volition in the context of maximizing paperclips? Would it be less wrong to maximize paperclips in the most literal sense possible?
The argument uttered by advocates of friendly AI is that any AGI that isn’t explicitly designed to be friendly won’t be friendly. But I wonder how much sense this actually makes.
Every human craftsman who enters into an agreement is bound by a contract that includes a lot of implied conditions. Humans use their intelligence to fill the gaps. For example, if a human craftsman is told to decorate a house, they are not going to attempt to take over the neighbourhood to protect their work.
A human craftsman wouldn’t do that, not because they share human values, but simply because it wouldn’t be sensible to do so given the implicit frame of reference of their contract. The contract implicitly includes the volition of the person that told them to decorate their house. They might not even like the way they are supposed to do it. It would simply be stupid to do it any different way.
How would a superhuman AI not contemplate its own drives and interpret them given the right frame of reference, i.e. human volition? Why would a superhuman general intelligence misunderstand what is meant by “maximize paperclips”, while any human intelligence will be better able to infer the correct interpretation?
You are assuming that the AI needs something from us, which may not be true as it develops further. The decorator follows the implied wishes not because he is smart enough to know what they are, but because he wishes to act in his client’s interest to gain payment, reputation, etc. Or he may believe that fulfilling his client’s wishes is morally good according to his morality. The mere fact that the wishes of his client are known does not guarantee that he will carry them out, unless he values the client in some way to begin with (for their money or maybe their happiness).
And an AGI wishes to achieve its goals the way they are meant to be achieved. Which includes all implicit conditions.
An AGI does not have to explicitly care about humans and their values as long as the implied context of its goals is human volition.
Consider a rich but sociopathic human decorator who solely cares about being a good decorator. What does a good decorator do? He does what his contract explicitly tells him to do AND what is implied by it, including the satisfaction of the customer.
You don’t need human moral values or any other complex values as long as you care to achieve your goals the way they are meant to be achieved, explicitly and implicitly.
If an AI has human interests as its main goal, it is already friendly. The question was whether intelligence on its own is enough to align it with human interests, which seems very unlikely. If the AI actually has cooperation with humans or fulfillment of some human wish as its goal, it will be able to use intelligence to better fulfill the wishes with all available context. But it’s getting the AI to operate with that goal that is difficult, I believe.
I think a better analogy with an AI would be a sociopathic decorator that doesn’t care about being a good decorator, but does care about fulfilling contracts, and cares about nothing not stated in the contract.
The “I obeyed the explicit content of the contract but didn’t give you what you want, sucks to be you” attitude exists in some humans (who are intelligent enough to know the implied meaning of the contract), so why wouldn’t it also exist in AIs?
Sure, but why would anyone likely build such an AI? Which is at the core of what Ben Goertzel argues: we do not pull minds from design space at random.
A tool does what it is supposed to do. If you add a lot of intelligence, why would it suddenly do something completely nuts like taking over the universe, something that was obviously not the intended purpose?
I don’t think it would make sense to create an AGI that does not care about the implications and context of its goals but only follows the definitions verbatim. That doesn’t seem to be very intelligent behavior. And that’s exactly a quality an AGI capable of self-improvement needs, a sense for context and implications.
Many of our tools are supposed to be web browsers, email clients, etc., but have a history of suddenly doing something completely nuts like taking over the whole computer, which was obviously not the intended purpose. Programming is hard that way—the result will only follow your program, verbatim. Attempts to give programs a greater sense of context and implications aren’t new—they’re called “higher level languages”. They feel less like hand-holding a dumb machine and more like describing a thought process, and you can even design the language to make whole classes of lower-level bugs unwriteable, but machines still end up doing what they’re instructed, verbatim (where “what they’re instructed” can now also include the output of compiler bugs).
The trouble is that you can’t rule out every class of bugs. It’s hard (impossible?) to distinguish a priori between what might be a bug and what might just be a different programmer’s intention, even though we’ve been wishing for the ability to do so for over a century. “Pray, Mr. Babbage, if you put into the machine wrong figures, will the right answers come out?”
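A toy sketch of the point above (the function and its intent are hypothetical, purely for illustration): a high-level language can make whole classes of low-level bugs unwritable, yet the machine still executes what was written, verbatim, not what was meant.

```python
# Toy illustration: no buffer overflows are possible here, but the
# machine still follows the program verbatim, not the programmer's intent.

def average(xs):
    # Intended: the arithmetic mean.
    # Written: floor division, so the fractional part is silently dropped.
    return sum(xs) // len(xs)

print(average([1, 2]))  # prints 1, though the programmer meant 1.5
```

From outside the program, it is hard to say whether the truncation is a bug or a deliberate choice, which is exactly the a-priori-indistinguishability problem the comment describes.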
Thank you. I’ve been trying to argue that “the computer does what you tell it to” is a much more chaotic situation than those who want to build FAI seem to believe, and you lay it out better than I have.
Yet, people around here seem to believe that the AI will develop an accurate model of the world even if its input isn’t all that accurate.
Who believes what, exactly?
Because computer programs do what they’re programmed to do, without taking into account the actual intention of the user.
Creating an AGI that does take into account what people really want (bearing in mind that the AGI is massively more intelligent than the people wanting the things) is, it seems to me, what the whole Friendly thing is about. If you know how to do that, you’ve solved Friendliness.
Edit: With added complications such as people not knowing what they want, people having conflicting goals, people wanting different things once there’s a powerful AI doing stuff, etc etc
You are assuming that an AI will have only instrumental rationality; that is, that the OT is true.
The standard counterargument is along the lines of: it won’t care about getting things right per se, it will only employ rationality as a means to other goals. (Or: instrumental rationality is the only kind, because that’s how we define rationality.)
What justifies the “will”, the claim of necessity or at least high probability? That brings us back to the title of the original posting: evidence for the Orthogonality Thesis. Is non-instrumental rationality, rationality-as-a-goal, impossible? Is no-one trying to build it? Why try to build single-minded Artificial Obsessive-Compulsives if that is dangerous? Isn’t rationality-as-a-goal a safer architecture?
How is this a curiosity stopper? It’s a good question, as is evidenced by your trying to find an answer to it.
“What should we have the AI’s goals be?”
“Eh, just make it self-improve, once it’s smart it can figure out the right goals.”
How’s that any different from:
“What should we have the AI’s beliefs be?”
“Eh, just make it self-improve, once it’s smart it can figure out the true beliefs.”
It’s not very different. They are both different from:
“The AI will acquire accurate beliefs by using a well understood epistemology to process its observations, as it is explicitly designed to do.”
“Smart” implicitly entails “knows the true beliefs”, whereas it doesn’t entail “has the right goals”.
It doesn’t exclude having the right goals, either. You could engineer something whose self-improvement was restricted from affecting its goals. But if that is dangerous, why would you?
Well, the difference is that building an AI without figuring out where goals come from gives you a dangerous AI, while building an AI without figuring out where beliefs come from gives you a highly-optimized compiler that wants to save humanity.
factual beliefs != moral beliefs
And the methods for investigating them are very different.
It’s a curiosity stopper in the sense that people don’t worry any more about risks from AI when they assume that intelligence correlates with doing the right thing, and that superintelligence would do the right thing all the time.
Stuart is trying to answer a different question, which is “Given that we think that’s probably false, what are some good examples that help people to see its falsity?”
As an example of a fairly powerful optimization process with very unhuman goals, you can cite evolution, which is superhuman in some ways, yet quite amoral.
And yet this same amoral unhuman optimization process produced humans and morals, so what is the lesson there?
The lesson is “be sure to quine your goal system properly”. We will be the death of evolution, let’s not let the same happen to us.
http://philosophicaldisquisitions.blogspot.com/2012/04/bostrom-on-superintelligence-and.html
Possibly somewhat off-topic: my hunch is that the actual motivation of the initial AGI will be random, rather than orthogonal to anything.
Consider this: how often has a difficult task been accomplished right the first time, even with all the careful preparation beforehand? For example, how many rockets blew up, killing people in the process, before the first successful lift-off? People were careless but lucky with the first nuclear reactor, though note: “Fermi had convinced Arthur Compton that his calculations were reliable enough to rule out a runaway chain reaction or an explosion, but, as the official historians of the Atomic Energy Commission later noted, the ‘gamble’ remained in conducting ‘a possibly catastrophic experiment in one of the most densely populated areas of the nation’!”
I doubt that one can count on luck in AGI development, but I would bet on unintentional carelessness (and other manifestations of Murphy’s law).
The bottom line is (nothing new here), no matter how much you research things beforehand, the first AGI will have bugs, with unpredictable consequences for its actual motivation. If we are lucky, there will be a chance to fix the bugs. Whether it is even possible to constrain the severity of bugs is way too early to tell, given how little is currently known about the topic.
Assuming, from the title, that you’re looking for argument by counterexample...
The obvious reply would be to invoke Godwin’s Law—there’s a quote in Mein Kampf along the lines of “I am convinced that by fighting off the Jews, I am doing the work of our creator...”. Comments like this pretty reliably generate a response something like “Hitler was a diseased mind/insane/evil!” to which you may reply “Yeah, but he was pretty sharp, too.” However, this has the downside of invoking Nazis, which in a certain kind of person may provoke an instant “This is a reactionary idiot” response and a complete discarding of the argument. So it’s a temperamental trick, and I’m not skilled enough in the dark arts to know if it’s a net gain.
On the other hand, you might prefer Pol Pot, or Ted Bundy, or any of a very large number of dictators and serial killers who don’t produce the same mindkilling response as Hitler.
A lot of fictional evidence comes to mind as well, but we do try not to generalize from that… Still, if you just want to WIN the argument rather than win rationally, it may help to pull an example from some media form that the audience is likely to appreciate. Lex Luthor, Snidely Whiplash, Yagami Light (or L, if you prefer), Mephistopheles (or Faust), and so on.
Is that the sort of thing you wanted?
Hitler also had a lot of false beliefs about Jews.
Eh… I can’t think of any object-level fact that would have convinced Hitler that the Judeo-Christian memeplex, most clearly manifested in the Jews, wasn’t actually a serious threat to the (prospective) virility & Spartan glory of Germany.
When I looked at the puppy, I realized this:
At the moment when you create the AIs, their motivation and intelligence could be independent. But if you let them run for a while, some motivations will lead to changes in intelligence. Improving intelligence could be difficult, but I think it is obvious that a motivation to self-destruct will on average decrease intelligence.
So are you talking about orthogonality of motivation and intelligence in freshly created AIs, or in running AIs?
I think he’s looking for refutations of the statement “Improving intelligence will necessarily always change motivation to the same set of goals, regardless of the starting goal set.”
What I’d be really looking for is: “intelligence puts some constraints on motivation, but it can still vary in all sorts of directions, far beyond what we humans usually imagine”.
There’s also a big effect of motivation on intelligence, even outside the small part of mind-space that thinks the optimal world is one exactly as it is, but without them in it.
This is because some goals don’t require much intelligence (by the standards of self-improving AIs, that is—we’d think it was a lot) to implement, while other goals do.
EDIT: of course, what we’re examining in the OP is the causal relation running the other way, intelligence → goals.
Sure, utility and intelligence might be orthogonal. But very different utilities could still lead to very similar behaviors.
I suspect a self-modifying AI that’s cobbled together enough to be willing to mess with its goals will tend towards certain goals. Specifically, I think it would be likely to end up trying to maximize some combination of happiness (probably just its own), knowledge, power, and several other things.
I’d still consider this an argument to work on FAI. Motivation and intelligence don’t have to be orthogonal; they just have to not be parallel.
Exactly. The orthogonality thesis is far stronger than what is needed. And that’s important, because orthogonality looks quite simply false. Intelligence is fostered by specific motivations: curiosity, truth-seeking, a search for simple and elegant explanations, and so on. Of course you could redefine “motivation” so that these “don’t count”, and make orthogonality a tautology, but that doesn’t seem productive.
In Tim Tyler’s reply he quotes someone, I know not who, saying
But the “other words” could be interpreted to state a new thesis. It is a weaker and more general hypothesis that is actually relevant to FAI. If we read “any final goal” as indicating perhaps one of many goals, then an intelligent agent can have multiple final goals. And although the goals that are partly constitutive of intelligence must be among its goals, it can combine these with any others. Furthermore, the intelligence-related goals need not even be final (“terminal value”) goals.
Sure they can, but will they?
The weaker “in-theory” orthogonality thesis is probably true, almost trivially, but it doesn’t matter much.
We don’t care about all possible minds or all possible utility functions for the same reason we don’t care about all possible programs. What’s actually important is the tiny narrow subset of superintelligences and utility functions that are actually likely to be built and exist in the future.
And in this light it is clear that there will be some correlation between the population distributions over intelligences and utility functions/motivations, and the strongest form of the orthogonality thesis trivially fails.
Intelligence in humans evolved necessarily in the context of language and the formation of social meta-organisms, and we thus have many specific features such as altruistic punishment (moral justice), empathy, and so on that are critical to the meta-organism.
AGI systems will likewise develop from this foundation and evolve in our economy. This environment will select for AGI systems that either fulfill our needs or are like us (or both). The rest will be culled.
Dancy (Real Values in a Humean Context, p. 180) argues that Naturalism provides grounds, independent of Humeanism, to suspect that moral beliefs need not necessarily motivate.
If I were a strong moral realist, I’d also believe that an AI should be able to just “figure it out”. I wonder whether exposure to the field of AI research, where cost functions and methods of solution are pretty orthogonal, would help alleviate the moral realism?
All these arguments for the danger of AGI are worthless if the team that creates it doesn’t heed the warnings.
I knew about this site for years, but only recently noticed that it has “discussion” (this was before the front page redesign), and that the dangers of AGI are even on-topic here.
Not that I’m about to create an AGI. The team that is will probably be even busier, and less willing to be talked down to with “you need to learn to think”, etc.
Just my 2e-2
The argument I tend to default to is, “if there were definitively no fundamental moral values, how would we expect the universe we observe to be different?” If we can’t point to any way that moral objectivity constrains our expectations, then it becomes another invisible dragon.
If there were no mathematical truths, would the observable universe be different?
If every intelligent entity just passively recorded facts, that would be valid. But agents act, and morality is about acting rightly.
If I’m understanding the question correctly, then probably. Assuming for the sake of an argument that there could be an observable universe at all without mathematical truths, then I’d say we should at least expect things like the same numbers to add up to different sums in different contexts, circles having variable ratios of circumference to radius, etc. The consequences would probably be much more dramatic than we can even imagine if we’re bound by our familiarity to our own universe.
This doesn’t imply that any sort of moral objectivity need exist. Agents act, and judge, but they do not all act or judge the same way, and there need not be any objective standard according to which one can determine which agents act rightly.
Taste is about assigning aesthetic preference, and agents assign aesthetic preferences (at least some of them do.) Does that imply that taste must be objective?
You think mathematical truth is causal, somehow?
It wasn’t meant to: it was an argument against an argument against a claim, not an argument for a counter-claim. You were arguing that moral truths do not have the epistemology that would be expected of empirical truths: but they are not empirical truths.
For our universe to run on mathematical laws, there have to be some.
Not every mathematical truth need apply directly to the real world, but if none of them did, then we’d have rather less reason to suspect that they were actually truths.
Can you give any examples of things we would mutually recognize as truths for which we cannot observe evidence? Math, as we have already covered, I do not acknowledge as an example, and I don’t think most other regulars here would either.
Laws may be causal. I was asking about truths.
The vast majority of them do not apply to the real world. For every inverse square law that applies, there is an inverse cube law (etc) that does not. However, that is physics. (Pure) mathematicians aren’t concerned about that.
OTOH, I have never seen a mathematical proof that used observation or experiment.
We don’t use empirical verification for individual proofs, but the edifice of mathematics as a whole is subject to evidence with respect to whether or not it works.
In the way that scientific theories are, i.e. that if there is one area of mismatch between the theory and the evidence, the whole theory is disregarded? But almost all of maths is empirically “wrong”.
Most math does not attempt to describe real phenomena, and so is not empirically wrong but empirically irrelevant.
Suppose we lived in a universe where the sum of two and two wasn’t any number in particular. You couldn’t predict in advance how many objects you would have if you had two collections of two objects and added them together, or if you divided or multiplied a collection of objects, etcetera. There would be no system for manipulating numbers or abstract symbols in a coherent and concrete way, and the universe wouldn’t appear to operate on one.
Then, one day, a person declares, “I’ve developed a set of axioms which allow me to manipulate ‘numbers’ in a coherent and self consistent way!” They’ve invented a set of rules and assumptions under which they can perform the same operations on the same numbers and get the same results… in theory. But in practice, they can’t predict the real result of adding two peanuts together any better than anyone else can.
In such a universe, you’d have considerable evidence to suspect that they had not stumbled upon some fundamental truths which simply don’t happen to apply to your universe, but that the whole idea of “math” was nonsense in the first place.
This doesn’t make much sense as stated. Math is a collection of tools for making useful maps of a territory (in the local parlance). The concept of numbers is one such tool. Numbers are not physical objects, they are a part of the model. You cannot add numbers in the physical universe, you can only manipulate physical objects in it. One way to rephrase your statement is “Suppose we lived in a universe where when you combine two peanuts with another two peanuts, you don’t get four peanuts”. This is how it works for many physical objects in our universe, as well: if you combine two blobs of ink, you get one blob of ink, if you combine one male rabbit and one female rabbit, the number of rabbits grows in time. If the universe you describe is somewhat predictable, it has some quantifiable laws, and the abstraction of these laws will be called “math” in that universe.
The intended meaning of that sentence was that adding two of one thing to two of another thing does not give consistent results, regardless of the things you’re adding. Adding two peanuts to two peanuts does not consistently result in any particular quantity of peanuts, and the same is true of any other objects you might attempt to add together.
For the sake of an argument, we shall suppose that it’s not. It’s nigh-impossible to even make sense of the hypothetical as proposed, but then, if there were alternate realities where math could exist or not exist, they would probably be mutually nonsensical.
Isn’t whether numbers are part of the territory or part of the map a debatable topic?
The universe is more than just a collection of physical objects. There are properties of objects, their relationships, their dynamics...
It is indeed. If you are a Platonist, numbers are real to you.
Well, current physical models suggest that the universe is some complicated wave function, parts of which can be factorized to produce objects, some of these objects (humans) run algorithms describing other objects and how they behave, and parts of these algorithms can be expressed as “properties of objects, their relationships, their dynamics...”
In the sense that the existence of God is. There is a lack of direct empirical evidence for the actual existence of numbers.
If maths is supposed to apply to the universe and doesn’t then, that is a problem. But most of it doesn’t apply to the universe, And it is physics that is supposed to apply to the universe.
And physics runs on math. If math didn’t work, then physics wouldn’t be able to run on it. The fact that the same results which you get when you perform a mathematical operation on purely abstract symbols also hold when you apply that mathematical operation to real, concrete things suggests that mathematics has some general effectiveness as a method for deriving true statements from other true statements.
We can do this consistently enough that when we fail to get predictive results from mathematical formulas in real life, we assume we’re using the wrong math. But if we consistently could not do this, then we’d be wise to doubt the effectiveness of mathematics as a method of deriving true statements from other true statements.
Maths works in the sense that it doesn’t generate contradictions, and if it didn’t work that way, it would be no good as a necessary ingredient of physics. But a necessary ingredient is not a sufficient ingredient. You don’t always get the same result from maths and physics, as the example of apples, inkblots and rabbits shows. And maths is still effective at deriving true statements from other true statements, because that process has nothing to do with the real world. You can kill 3 werewolves with 3 silver bullets. That’s maths, but it’s not reality.
Right. So the fact that it works as a necessary ingredient of physics is evidence of its general ability to derive true statements from other true statements.
Combining ink blobs and mating rabbits are not properly represented by the mathematical operation of addition, that’s simply an example of using the wrong math.
You asked me, several comments upthread, whether I thought the universe would be observably different if math didn’t work. The fact that not all possible mathematical statements reflect reality doesn’t get us around the answer being “yes.”
Which does not itself support your claim that it has the same epistemology as physical science.
But it’s not seen as a disproof of any mathematical claim… although it would be a disproof of a physical theory, if it mispredicted an outcome… because physics has an empirical epistemology and maths doesn’t.
Which was part of a wider point about whether maths has an empirical epistemology, which was part of a wider point about moral claims having a sui generis epistemology.
You expect reality to be different if deriving-a-true-statement-from-another-true-statement is different? But it is different. You can add or drop the Axiom of Choice. You can add or drop proof by Reductio (constructivism). Etc.
This is not a claim I ever made in the first place. The fact that we do not use empirical investigation to determine the truth of individual mathematical claims is irrelevant to my point.
There’s nothing wrong with moral data being generated by a different sort of process than we use to generate scientific data, if you can demonstrate that the process works in the first place.
Absent that kind of evidence, evidence we have in overabundance for math, you should make no such assumption.
Do we actually have any such proofs? You seem to think maths can be proven to work empirically, but that seems to depend on cherry-picking the right maths and the right physical objects.
We have an abundance of such evidence.
Can you give examples of math not working then? The fact that you can combine two blobs of ink to get one blob of ink has no bearing on the correctness of 1+1=2, because “+” is not an appropriate operator to represent mushing two objects together until they merge and then counting the merged object.
The fact that you get the wrong answers to your questions if you attempt to answer them with the wrong math is not relevant to the proposition that math works, just as the fact that using the wrong map will not help you navigate to your destination is irrelevant to the proposition that cartography works.
Do we really have an abundance? As far as I know, the non-circular vindication of popular forms of epistemological justification is a complex unsolved problem.
I can give examples of correct maths that are also examples of incorrect maths. I can easily give examples of correct maths that are incorrect physics. “Working” has no single meaning here.
1+1=2 is a mathematical truth, period. If it doesn’t apply to inkdrops, that is a physics issue.
I am not attempting to argue that maths does not work. I am arguing that it does not work like physics. It has a different epistemology.
It’s not that the equation doesn’t apply to inkdrops, it’s just that it doesn’t apply to doing that particular operation on inkdrops. 1+1=2 applies to inkdrops, as it does to any other physical objects, as long as you put them alongside each other and count them separately, rather than mashing them together and counting them as a unit. The operation of 1+1=2 never applies to mashing individual things together and counting how many of the mashed-together unit you have, although other qualities such as their masses will continue to be additive.
I am not arguing that mathematical truths are generated by the same epistemology as scientific data, but I am arguing that the entire edifice of math is supported by evidence, in the same manner that the edifice of the scientific method is supported by evidence, and that we do not likewise have evidence to support the edifice of an epistemology for generating objective moral truths.
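The ink-drop point above can be made concrete with a hypothetical sketch (the `merge` function and its fields are invented for illustration): “1 + 1” is the wrong model for counting mashed-together blobs, but the right model for their masses.

```python
# Hypothetical sketch: mashing two ink blobs together yields one blob,
# so counting is not additive under merging, but mass still is.

def merge(blob_a, blob_b):
    # The merged result is a single blob whose mass is the sum of the parts.
    return {"count": 1, "mass": blob_a["mass"] + blob_b["mass"]}

merged = merge({"mass": 2.0}, {"mass": 3.0})
print(merged["count"])  # prints 1: blob-count is not modeled by addition
print(merged["mass"])   # prints 5.0: mass remains additive
```

Which quantities the operation “+” correctly models is exactly the physics question, not a failure of the arithmetic itself.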
What do you count as evidence in math?
You can go upwards in the tree of comments to see where I’ve discussed this already, but if you still find my meaning unclear I could answer whatever questions you might have to clarify my position.
OK. Does physical evidence of math “working” in physics count for “the entire edifice of math is supported by evidence”, or is it totally unrelated?
I would contend that it counts.
OK, thanks. I have looked through the thread again, and it looks like our views are too far apart for a forum exchange to be an adequate medium of discussing them. Or maybe I need to express them to myself clearer first.
That’s a difference that doesn’t make a difference. It remains the case that a mathematical truth is not automatically a physical truth.
It remains the case that the skill, activity, and concern of picking out the bits of maths that are relevant to a given physical situation is physics
Then what’s the difference?
And what difference is evidence making to maths? You have axioms, which are true by stipulation, and you have truth-preserving rules of inference. Logically, if you combine the two, you will generate true theorems. So where is the problem? Is the evidence supposed to remove doubt that the rules of inference are valid, or that the axioms are true?
The former. Some axioms will be applicable to reality and some will not, but the rules of inference work in either case.
Scientists tend to arrive at consensuses because they’re working with the same fundamental rules of inference and looking at the same data. Mathematicians arrive at consensuses on the consequences of given axioms, because they’re working with the same fundamental rules of inference. In both cases, we have evidence that these rules of inference actually work, which is not the case by default.
Even if we supposed that we had a system for moral reasoning which, like math, could extrapolate the consequences of axioms to their necessary conclusions (which I would argue we don’t actually have, humans are simply not good enough at natural language reasoning to pull that off with any sort of reliability,) how would we determine if any of those moral axioms apply to the real world, the way some mathematical axioms observably do?
And if some axioms are applicable to reality, then the whole edifice is supported? You mean, maths is like a rigid structure, where if one part is supported, that in turn supports the other parts? That would be the case if you could logically infer axioms B and C from axiom A. But that is exactly how axioms don’t work. A set of axioms is a minimal set of assumptions: if one of your axioms is derivable from the others, it shouldn’t be in the set.
Maths doesn’t need empirical support to work as maths, since it isn’t physics, and adding empirical support doesn’t show “it works” in any exhaustive way, only that some bits of it are applicable to physical reality.
Whatever “actually works” means. If a true mathematical statement is one that is derived by standard rules of inference from standard axioms, then it could hardly fail to “work”, in that sense. OTOH, it can still fail to apply to reality, and so fail to work in that sense. (You really need to taboo “work”.) But in the second case, evidence is only supporting the idea that some maths works, in the second sense. You make things rather easy by defining “works” to mean “sometimes works”…
The job of morality is to guide action, not to passively reflect facts, so the question is irrelevant.
The “standard rules of inference” could very easily simply be nonsense. Sets of rules do not help produce meaningful information by default. Out of the set of all methods of inference that could hypothetically exist, the vast majority do not produce information that is either true or consistent.
So let’s suppose that we live in a universe in which it is objectively true that there are no objective moral standards. In your view, would this be a relevant fact about morality?
In what sense? Are there alternative rules that are also truth preserving? Is truth-preservation hard to establish?
I wouldn’t say they produce information at all. Truth isn’t meaning, preservation isn’t production, and truth preservation therefore isn’t meaning production.
We are supposed to have already chosen a set of rules that is truth-preserving: i.e., if you treat truth as a sort of numerical value, without any metaphysics behind it, that is assigned by fiat to your axioms, then you can show that it is preserved by using truth tables and the like (although there are some circularities there). Do you doubt that? Or is your concern showing that axioms are somehow “really” true, about something other than themselves?
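The truth-table check mentioned above can be carried out mechanically. A minimal sketch: verify that modus ponens preserves truth under every assignment of truth values to its premises.

```python
# Check by exhaustive truth table that modus ponens is truth-preserving:
# whenever p and (p -> q) are both true, q is true.

def implies(p, q):
    # Material implication: p -> q is false only when p is true and q false.
    return (not p) or q

for p in (False, True):
    for q in (False, True):
        if p and implies(p, q):
            # Both premises true, so the conclusion must be true.
            assert q

print("modus ponens preserves truth in all four cases")
```

This is the fiat-truth-value sense of validity under discussion: the check says nothing about whether the premises are “really” true of anything.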
It’s more metaethical. You can’t use “there are no moral truths” to guide your actions at the object level.
If you count empiricism as a set of alternative rules which are truth preserving, then yes. If you’re talking about other non-empirical sets of rules, I would at least tentatively say no; formal logic is a branch of math, and natural language logic really doesn’t meet the same standards.
In that case, what if anything would you describe as the production of information?
What set of rules “we are supposed to have already chosen” are you referring to? Based on some of your statements about morality so far, I suspect I would argue that you are extrapolating from axioms that are not true in real life, much as some mathematical axioms seem to apply to the real world, while other ones do not.
Sure you can. Whether you should is an entirely separate value judgment.
If empiricism just means gathering observations, stamp-collecting style, then I don’t see where the truth preservation comes in. I can see where truth preservation comes into making predictions from evidence you already have, but that seems to use logical inference.
If you draw a line around an entity, then there will be information crossing that line, which is new, unpredictable, and “produced” as far as that entity is concerned.
I meant we are supposed to have already chosen a set of truth preserving mathematical and logical rules.
Could you give an example of how to use “there are no moral truths” to guide your actions?
Could you taboo “truth preservation?”
“Nothing is objectively right or wrong, therefore I will do whatever I feel like as long as I can get away with it.”
You or I might not consider this to be good moral behavior, but for a person who believed both that moral rules are only worth following if they’re objective, and that objective moral rules do not exist, it would be a reasonable conclusion to draw.
What’s the problem with truth preservation? It’s a technical term, and I gave a link.
Your notion of using metaethical nihilism to guide action is analogous to treating atheism as a belief. Your actions would be unguided.
Atheism is, of course, a belief. It’s a belief that there are not any gods.
The belief that there are no objective moral rules only guides your actions to the extent that it may relieve you of constraints that you might have had if you thought there were any. But whether any other moral code is “better” than this would simply come down to a matter of value judgment.
...Gödel?
Which boils down to “math works for the situations in which it works”.
The math works regardless, it simply doesn’t apply to every situation.
A correct map of Michigan will remain correct whether you’re trying to find your way through Michigan or Taiwan.
Well, I have this thing, let’s call it htam, it says 1 + 1 = 1. Works for ink blobs perfectly well.
Actually, you know what, it works regardless, it simply doesn’t apply to every situation.
Applying maths is physics. Physical untruth can be mathematical truth and vice versa. So they have different epistemologies.
I call it 2.
http://en.wikipedia.org/wiki/Two-element_Boolean_algebra
“math” is just a fancy name for “map”. Some maps do not represent any “real” places.
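The “htam” in the exchange above is (presumably) the two-element Boolean algebra from the linked article, where “+” is logical OR, so 1 + 1 = 1. A small sketch of the point that both systems are correct maths, and which one applies depends on the territory:

```python
# In the two-element Boolean algebra, "addition" is logical OR.
def bool_plus(a, b):
    # Bitwise OR on {0, 1} coincides with Boolean-algebra addition.
    return a | b

print(bool_plus(1, 1))  # prints 1: a fine model for merging ink blobs
print(1 + 1)            # prints 2: a fine model for counting peanuts
```

Neither result refutes the other; choosing between them is the map-selection step, not a mathematical disagreement.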
I’d like to see a detailed response to Ben Goertzel’s idea that a de novo AGI wouldn’t have certain types of goals because they’re stupid. I mean, I wouldn’t like to read it, because I can’t make sense of Goertzel’s arguments (if, in fact, they make any), but it’d be good to put it out there since he’s prominent for some reason.
Ask what is meant by “the right thing”.
Also, (and this may be an additional reason for wanting Friendliness), protecting humanity may not be the right thing in a larger context.
Isn’t acting maximally intelligently and correctly itself a motivation? The question you are really asking seems to be why an AI is supposed to act maximally intelligently and correctly to achieve world states that are not explicitly or implicitly defined to maximize expected utility. Yet the motivation to act maximally intelligently and correctly will always be given, otherwise you’re not talking about a rational agent.
To act maximally intelligently and correctly could quite easily be an instruction to convert the observable universe into a computer.
You could have a very intelligent agent that acts as though it is completely nuts.
The problem then becomes: how do you know it is intelligent—if giving it intelligence tests no longer works.
However, I think this is a bit of a side-issue.
You put it in a variety of environments and see if they tend to look similar after a while. It’s easier if you have a goal to test against, but as long as it’s optimizing some utility function in a variety of environments, it’s intelligent.
The problem arises when it isn’t doing that. Say you tell the superintelligence to sit still and do nothing. It’s a meditating superintelligence—but you can’t easily determine that until after it has stopped meditating.
Any kind of agent could—in principle—be engineered.
However, some sorts of agent are more likely to evolve than others—and it is this case that actually matters to us.
For example, intelligent machines are likely to coevolve in a symbiosis with humans—during which they will pick up some of our values. In this case, intelligence and values will be powerfully linked—since stupid machines will fail to absorb so many of our values—as we have seen, for example, with the evolution of cars.
So: The Orthogonality Thesis:
...is true[*] - but the “in principle” renders it kind-of irrelevant to the case that we actually care about.
* Unless the wirehead / pornography problems turn out to actually be serious issues.
I have doubts that it is even true “in principle” unless the goals are hard-wired in and unmodifiable by the intelligence. Do you really think that someone would agree to be OCD or schizophrenic if they had a choice? For higher levels of intelligence, I would think they would be even more discriminating as to goal-states they would accept.
As for the argument by thelittledoctor, the evil genius dictator model is broken even for highly intelligent humans, much less super-intelligences. Those “intelligent” demagogues are rarely, if ever, more than 2 standard deviations above average human intelligence, which definitely doesn’t count as “highly intelligent” as far as I’m concerned.
It seems irrelevant whether the AI is quote-unquote “highly intelligent” as long as it’s clever enough to take over a country and kill several million people.
The usual argument is that we are likely to be able to build machines that won’t want to modify their goals.
IMO, the more pressing issue with something like OCD is that it might interfere with intelligence tests—in which case you could argue that an OCD superintelligent machine is not really intelligent—since it is using its intelligence to screw itself.
This seems to be a corner case to me. The intended point is more that you could engineer an evil genius, or an autistic mind child.
My argument: The reason the orthogonality thesis is false is that 100% (full) general intelligence is not possible without 100% (perfect) moral motivations. That is to say, any AI that is not completely moral is limited to some degree.
Expert Counter-argument: But Marcus Hutter’s AIXI model shows that it’s possible to have optimal (fully general) intelligent decision makers with any arbitrary goals.
My reply: The AIXI model of Hutter is not a true general intelligence, because the initial goals are fixed (i.e. it cannot reflect upon and modify its own goals). Further, the AIXI model is not practical in real time due to requirements for infinite (or impractically high) computational resources. You need correct priors for accurate real-time reasoning.
Expert Counter-argument: What have priors got to do with morality? Further, the problem of priors has been solved. The Kolmogorov complexity prior is a precise formalization of Occam’s razor.
My reply: The Kolmogorov complexity prior is uncomputable.
Expert Counter-argument: There are perfectly workable approximations! For instance, the Schmidhuber speed prior can be deployed, along with techniques such as Monte Carlo approximation.
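To make the flavour of such approximations concrete: a toy illustration (entirely my own construction, not the speed prior or AIXI) is to weight each hypothesis by 2^(−description length). The true Kolmogorov prior uses the length of the shortest program producing the hypothesis, which is uncomputable; here the literal encoding length stands in as a crude proxy:

```python
# Toy Occam-style prior: weight each hypothesis by 2^(-description length),
# using the literal string length as a crude, computable stand-in for
# Kolmogorov complexity (which is uncomputable).
hypotheses = ["0", "01", "0101", "01010101110"]

def occam_weight(h):
    return 2.0 ** (-len(h))

total = sum(occam_weight(h) for h in hypotheses)
prior = {h: occam_weight(h) / total for h in hypotheses}

# Shorter (simpler) hypotheses get more prior mass:
assert prior["0"] > prior["01"] > prior["0101"] > prior["01010101110"]
```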
My reply: There is no proof that such approximations can be scaled up from their success in limited domains to full general intelligence.
Expert Counter-argument: The problem of priors isn’t relevant. The initial priors ‘wash out’ with enough evidence. The convergence theorems show this.
My reply: In fact, there is no convergence for different initial Bayesian models. Further, the computational explosion with the increase in problem complexity results in failure of Occam approximations to scale.
Expert Counter-argument: Even if this is true (which is highly debatable), what have priors got to do with morality any way?
My reply: Occam’s razor is the smoking gun which establishes a link between priors and value judgements. Fully general intelligence requires a formal notion of Occam’s razor. But Occam priors are equivalent to an aesthetic notion of what constitutes ‘simplicity’. Schmidhuber argued that the notion of ‘beauty’ is tied into the ability to compress information. But if this is true, then intelligence cannot be independent of aesthetic/value judgements.
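Schmidhuber’s compression link can at least be illustrated crudely (a toy of my own, using zlib-compressed size as a computable stand-in for Kolmogorov complexity): patterned data compresses far better than noise, so a compression-based ‘simplicity’ measure does rank objects:

```python
import random
import zlib

def complexity(data: bytes) -> int:
    # Crude proxy for description length: zlib-compressed size.
    # (True Kolmogorov complexity is uncomputable.)
    return len(zlib.compress(data))

regular = b"ab" * 100  # highly patterned, very compressible

random.seed(0)  # deterministic "noise"
noisy = bytes(random.randrange(256) for _ in range(200))

# The patterned string has a much shorter description than the noise:
assert complexity(regular) < complexity(noisy)
```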
Expert Counter-argument: This sounds like a weak idea to me. I don’t see the link between aesthetic judgements and Occam priors. As to Schmidhuber’s attempt to tie a notion of ‘beauty’ to data compression, didn’t Rayhawk critique the idea on the grounds that Schmidhuber’s notion of beauty isn’t actually required for an optimal decision making agent? And even if the general idea was correct, why should this notion of ‘beauty’ correspond to what humans value any way?
My reply: Occam priors could be approximated by deploying categorization and analogical inference...grouping things into categories to compress information is equivalent to the introduction of inductive biases, which already looks human-like. Kant argued that intelligent minds require a set of universal a priori categories, and ontological primitives look like the modern version of Kant’s categories.
Expert Counter-argument: This is just vague hand-waving! Exactly how would your analogical inference/categorization scheme work?
My reply: A small set of initial ontological primitives (defined in terms of prototypes rather than sufficient conditions) would be built-into the seed-AI. These would serve as reference classes enabling calculation of the semantic distances between features of concepts in feature-space. This would define the basic notion of ‘similarity’ needed for analogical inference, which would involve cross-domain mappings between ontologies and allow fully general (cross-domain) reasoning.
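The prototype/semantic-distance scheme sketched above could look something like this (entirely my own toy construction; the feature names and prototypes are invented for illustration): concepts as feature vectors, ‘semantic distance’ as distance in feature-space to a small set of prototypes:

```python
import math

# Hypothetical ontological primitives as prototype feature vectors.
# Features (invented for illustration): (alive, moves, man-made)
prototypes = {
    "animal": (1.0, 0.9, 0.1),
    "tool":   (0.0, 0.1, 1.0),
}

def distance(a, b):
    # Euclidean distance in feature-space as the "semantic distance".
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def nearest_prototype(concept):
    # Classify a concept by its closest prototype (reference class).
    return min(prototypes, key=lambda p: distance(prototypes[p], concept))

dog = (1.0, 1.0, 0.0)
hammer = (0.0, 0.0, 1.0)
assert nearest_prototype(dog) == "animal"
assert nearest_prototype(hammer) == "tool"
```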
Expert Counter-argument: OK, although I’m not sure you know what you’re talking about, and I still don’t see the connection to morality.
My reply: Here’s the connection to morality: the ontology would be used to form representations of our values in the form of decision-making narratives. To compress the representations of values into these coherent narratives, aesthetic and ethical value judgments would be needed. This whole system would be defined in terms of information theory.
Expert Counter-argument: Even if there was a ‘natural’ notion of morality that emerges from all this, which sounds just like wild speculation to me, why should I (or an AI) want to follow it anyway?
My reply: Deviations from correct compression techniques degrade representations of decision-making narratives, thus degrading general reasoning abilities. Since correct compression techniques are tied to aesthetic/value judgements, any deviation from perfect morality would reduce the effective intelligence of the AI. You cannot reason that you should deviate from perfect morality without contradicting yourself and degrading your own cognitive processes! So knowledge of universal morality would automatically be self-motivating.
Expert Counter-argument: Sounds unlikely to me. Besides, as Bostrom pointed out, even if there was a universal morality, the knowledge of which is self-motivating, the AI need not be built in the way you described, in which case it wouldn’t care about correct reasoning as you have defined it. Pick a random mind from mind-space; it doesn’t care about humans.
My reply: Most programs picked randomly from mind-space simply don’t work, and UAIs would not work properly. I’m not claiming intelligence implies morality; I’m claiming the converse (i.e. morality implies intelligence). I’m not saying UAIs can’t exist, I’m saying they won’t be true general intelligences, and won’t be able to self-improve to the superintelligent level.
Expert: Let me think about this further for a while...
Expert: Define your terms, justify why you apply the label ‘moral’ to the particular motivations you have in mind, then present evidence for your thesis.
I think this argument-complex is stronger than the AI risk folk admit (and I know how to strengthen your argument at various points). A plausible off-beat counter is that humans have been getting less moral (and their aesthetic tastes have gotten worse) over time despite getting closer to building AI. What you see in history is might consistently rewriting the rules of morality to make itself right; predicting that this trend will continue may be accurate in some descriptive sense (e.g. if you take people’s values at face value as defining morality), but it doesn’t seem like sound moral philosophy. In this sense a singularity would be like a glorious communist revolution: seemingly inevitable, seemingly the logical endpoint of morality, yet in fact incredibly destructive both physically and culturally. The problem with AI is that, even if in the limit intelligence and morality (might and right) are the same thing, it seems like an AI would be able to set up the equivalent of a communist dictatorship and hold on to it for as long as it takes a black hole to evaporate. And even if the new communist dictatorship were better than what came before it, it still seems like we have a shot at ensuring that AI will jump straight to 100% intelligence and 100% morality without getting caught up somewhere along the way. But of course, even the communist dictatorship scenario isn’t really compatible with the orthogonality thesis...