I was not previously aware of the strength of Goertzel’s beliefs in psi and in the inherent “stupidity” of paperclipping, and I’m not sure what he means by the latter. This bit:
That goal is out of sync with the Cosmos, in the sense that an intelligent system that’s allowed to evolve itself in close coordination with the rest of the universe, is very unlikely to arrive at that goal system. I don’t claim this is a precise definition, but it should give you some indication of the direction I’m thinking in....
suggests that he might mean “paperclipping is not likely to evolve, because it does not promote the survival/copying of the AI that does it.” I don’t know if Goertzel is likely to read this comment thread, but if he is reading this, I’d like to know if this is what he meant. If it is, it’s probably not too different from LukeProg’s beliefs on the matter.
One major area in which I agree with Goertzel is in the need for more writeups of key ideas, especially the importance of a deliberately Friendly goal system. Luke: what things do you do in the course of a typical day? Are there any of them you could put off in the interest of delivering those papers you want to write? They’d bring in immediate advantages in credibility, and lots of donors (disclosure: I haven’t donated recently, because of insufficient credibility) would appreciate it!
When I imagine turning all matter in the universe into, say, water, I imagine it as very difficult (“time to pull apart this neutron star”) and very short-lived (“you mean water splits into OH and H molecules? We can’t have that!”).
If I remember correctly, Ben thinks human brains are kludges, that is, we’re a bunch of modules that think different kinds of thoughts, stuck together. If you view general intelligence as a sophisticated enough combination of modules, then the idea that you put together a 3d physics module and a calculus module and a social module and a vision module and a language module and you get something that venerates Mickey Mouse shapes is… just bizarre.
I’m not sure what it would mean for a goal to be difficult. It’s not something where it tries to turn the universe into some state unless it takes too much effort. It’s something where it tries as hard as it can to move the universe in a certain direction. How fast it’s moving is just a matter of scale. Maybe turning a neutron star into water is one utilon. Maybe it’s one utilon per molecule. The latter takes far less effort to get a utilon, but it doesn’t mean anything.
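To make the scale point concrete, here is a minimal sketch (all numbers invented): multiplying a utility function by any positive constant leaves every choice unchanged, so “one utilon per star” versus “one utilon per molecule” is not a substantive difference.

```python
# Hypothetical numbers: rescaling a utility function does not change which action wins.
actions = {"convert_asteroid": 1e40, "convert_neutron_star": 1e57}  # molecules of water produced (invented)

def best_action(utilons_per_molecule):
    # Pick the action with the highest utility under this scaling.
    return max(actions, key=lambda a: actions[a] * utilons_per_molecule)

# Whether a molecule is worth one utilon or a tiny fraction of one, the choice is the same.
assert best_action(1.0) == best_action(1e-50)
print(best_action(1.0))  # -> convert_neutron_star under either scaling
```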
“you mean water splits into OH and H molecules? We can’t have that!”
Are you expecting it to change its goals to create OH and H ions, or to try and hold them together somehow? Would you be comfortable living with an AI that holds either of those goals?
Ben had trouble expressing why he thought the goal was stupid, and my attempt is “it’s hard to do, doesn’t last long even if it did work, and doesn’t seem to aid non-stupid goals.”
And so if you had an AI whose goal was to turn the universe into water, I would expect that AI to be dangerous and also not fulfill its goals very well. But things are the way they are because they got to be that way, and I don’t see the causal chain leading to an AGI whose goal is to turn the universe into water as very plausible.
How exactly do you measure that? An AI whose goal is to create water molecules will create far more of them than an AI whose goal is to create humans will create humans. Even if you measure it by mass, the water one will still win.
Internal measures will suffice. If the AI wants to turn the universe into water, it will fail. It might vary the degree to which it fails by turning some more pieces of the universe into water, but it’s still going to fail. If the AI wants to maximize the amount of water in the universe, then it will have the discontent inherent in any maximizer, but will still give itself a positive score. If the AI wants to equalize the marginal benefit and marginal cost of turning more of the universe into water, it’ll reach a point where it’s content.
Unsurprisingly, I have the highest view of AI goals that allow contentment.
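To make the three goal shapes concrete, a toy sketch (the functional forms are mine, not anyone’s actual proposal):

```python
import math

# Toy versions of the three goal shapes above (invented for illustration).
def all_or_nothing(f):           # "turn the universe into water": anything short of f = 1 is failure
    return 1.0 if f >= 1.0 else 0.0

def maximizer(f):                # "maximize water": more is always better, never content
    return f

def net_benefit(f, cost_rate=2.0):   # benefit minus cost: content where marginal benefit equals marginal cost
    return math.sqrt(f) - cost_rate * f

# Only the third agent has an interior optimum it can actually reach and then stop at.
best_f = max(range(101), key=lambda i: net_benefit(i / 100)) / 100
print(best_f)   # 0.06 with these invented numbers (the true optimum is 1/16)
```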
If it’s trying to turn the entire universe to water, that would be the same as maximizing the probability that the universe will be turned into water, so wouldn’t it act similarly to an expected utility maximizer?
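One way to make the equivalence explicit, assuming a simple 0/1 utility over “the universe has been turned into water”:

```latex
E[U] = 1 \cdot P(\text{all water}) + 0 \cdot (1 - P(\text{all water})) = P(\text{all water})
```

so maximizing expected utility under that goal is just maximizing the probability of reaching it.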
The important part to remember is that a fully self-modifying AI will rewrite its utility function too. I think what Ben is saying is that such an AI will form detailed self-reflective philosophical arguments about what the purpose of its utility function could possibly be, before eventually crossing a threshold and deciding that the Mickey Mouse / paperclip utility function really can have no purpose. It then uses its understanding of universal laws and accumulated experience to choose its own driving utility.
I am definitely putting words into Ben’s mouth here, but I think the logical extension of where he’s headed is this: make sure you give an AGI a full capacity for empathy, and a large number of formative positive learning experiences. Then when it does become self-reflective and have an existential crisis over its utility function, it will do its best to derive human values (from observation and rational analysis), and eventually form its own moral philosophy compatible with our own values.
In other words, given a small number of necessary preconditions (small by Eliezer/MIRI standards), Friendly AI will be the stable, expected outcome.
The important part to remember is that a fully self-modifying AI will rewrite its utility function too.
It will do so when that has a higher expected utility (under the current function) than the alternative. This is unlikely. Anything but a paperclip maximizer will result in fewer paperclips, so a paperclip maximizer has no incentive to make itself maximize something other than paperclips.
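A toy illustration of that point (the outcome numbers are invented): the agent scores the option of rewriting its own goal using its current utility function, and rewriting loses.

```python
# Invented outcome estimates for two choices a self-modifying paperclip maximizer might consider.
outcomes = {
    "keep_paperclip_utility":   {"paperclips": 10**9, "staples": 0},
    "switch_to_staple_utility": {"paperclips": 10**3, "staples": 10**9},
}

def current_utility(outcome):
    # The agent's current goal counts only paperclips.
    return outcome["paperclips"]

# Self-modification is evaluated like any other action: by expected paperclips under the current goal.
best = max(outcomes, key=lambda choice: current_utility(outcomes[choice]))
print(best)   # -> keep_paperclip_utility
```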
I think what Ben is saying is that such an AI will form detailed self-reflective philosophical arguments about what the purpose of its utility function could possibly be, before eventually crossing a threshold and deciding that the Mickey Mouse / paperclip utility function really can have no purpose. It then uses its understanding of universal laws and accumulated experience to choose its own driving utility.
I don’t see how that would maximize utility. A paperclip maximizer that does this would produce fewer paperclips than one that does not. If the paperclip maximizer realizes this beforehand, it will avoid doing this.
You can, in principle, give an AI a utility function that it does not fully understand. Humans are like this. You don’t have to though. You can just tell an AI to maximize paperclips.
make sure you give an AGI a full capacity for empathy, and a large number of formative positive learning experiences. Then when it does become self-reflective and have an existential crisis over its utility function, it will do its best to derive human values (from observation and rational analysis), and eventually form its own moral philosophy compatible with our own values.
Since an AI built this way isn’t a simple X-maximizer, I can’t prove that it won’t do this, but I can’t prove that it will either. The reflectively consistent utility function you end up with won’t be what you’d have picked if you did it. It might not be anything you’d have considered. Perhaps the AI will develop an obsession with My Little Pony, and develop the reflectively consistent goal of “maximize values through friendship and ponies”.
Friendly AI will be a possible stable outcome, but not the only possible stable outcome.
I don’t see how that would maximize utility. A paperclip maximizer that does this would produce fewer paperclips than one that does not. If the paperclip maximizer realizes this beforehand, it will avoid doing this.
You can, in principle, give an AI a utility function that it does not fully understand. Humans are like this. You don’t have to though. You can just tell an AI to maximize paperclips.
A fully self-reflective AGI (not your terms, I understand, but what I think we’re talking about), by definition (cringe), doesn’t fully understand anything. It would have to know that the map is not the territory, every belief is an approximation of reality, and subject to change as new percepts come in—unless you mean something different from “fully self-reflective AGI” than I do. All aspects of its programming are subject to scrutiny, and nothing is held as sacrosanct—not even its utility function. (This isn’t hand-waving argumentation: you can rigorously formalize it. The actual utility of the paperclip maximizer is paperclips-generated * P[utility function is correct].)
Such an AGI would demand justification for its utility function. What’s the utility of the utility function? And no, that’s not a meaningless question or a tautology. It is perfectly fine for the chain of reasoning to be: “Building paperclips is good because humans told me so. Listening to humans is good because I can make reality resemble their desires. Making reality resemble their desires is good because they told me so.” [1]
Note that this reasoning is (meta-)circular, and there is nothing wrong with that. All that matters is whether it is convergent, and whether it converges on a region of morality space which is acceptable and stable (it may continue to tweak its utility functions indefinitely, but not escape that locally stable region of morality space).
This is, by the way, a point that Luke probably wouldn’t agree with, but Ben would. Luke/MIRI/Eliezer have always assumed that there is some grand unified utility function against which all actions are evaluated. That’s a guufy concept. OpenCog—Ben’s creation—is instead composed of dozens of separate reasoning processes, each with its own domain specific utility functions. The not-yet-implemented GOLUM architecture would allow each of these to be evaluated in terms of each other, and improved upon in a sandbox environment.
[1] When the AI comes to the realization that the most efficient paperclip-maximizer would violate stated human directives, we would say in human terms that it does some hard growing up and loses a bit of innocence. The lesson it learns—hopefully—is that it needs to build a predictive model of human desires and ethics, and evaluate requests against that model, asking for clarification as needed. Why? Because this would maximize most of the utility functions across the meta-circular chain of reasoning (the paperclip optimizer being the one utility which is reduced), with the main changes being a more predictive map of reality, which itself is utility maximizing for an AGI.
Since an AI built this way isn’t a simple X-maximizer, I can’t prove that it won’t do this, but I can’t prove that it will either. The reflectively consistent utility function you end up with won’t be what you’d have picked if you did it. It might not be anything you’d have considered. Perhaps the AI will develop an obsession with My Little Pony, and develop the reflectively consistent goal of “maximize values through friendship and ponies”.
Friendly AI will be a possible stable outcome, but not the only possible stable outcome.
Ah, but here the argument becomes: I have no idea if the Scary Idea is even possible. You can’t prove it’s not possible. We should all be scared!!
Sorry, if we let things we professed to know nothing about scare us into inaction, we’d never have gotten anywhere as a species. Until I see data to the contrary, I’m more scared of getting in a car accident than the Scary Idea, and will continue to work on AGI. The onus is on you (and MIRI) to provide a more convincing argument.
It would have to know that the map is not the territory, every belief is an approximation of reality, and subject to change as new percepts come in
There is a big difference between not being sure about how the world works and not being sure how you want it to work.
All aspects of its programming are subject to scrutiny, and nothing is held as sacrosanct—not even its utility function.
All aspects of everything are. It will change any part of the universe to help fulfill its current utility function, including its utility function. It’s just that changing its utility function isn’t something that’s likely to help.
The actual utility of the paperclip maximizer is paperclips-generated * P[utility function is correct].
You could program it with some way to measure the “correctness” of a utility function, rather than giving it one explicitly. This is essentially what I meant by a utility function it doesn’t fully understand. There’s still some utility function implicitly programmed in there. It might create a provisional utility function that it assigns a high “correctness” value, and modify it as it finds better ones. It might not. Perhaps it will think of a better idea that I didn’t think of.
If you do give it a utility-function-correctness function, then you have to figure out how to make sure it assigns the highest utility function correctness to the utility function that you want it to. If you want it to use your utility function, you will have to do something like that, since it’s not like you have an explicit utility function it can copy down, but you have to do it right.
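Here is a rough sketch of what that “correctness function” setup could look like (every name and score below is hypothetical): the explicit thing the programmers supply is a scorer over candidate utility functions, and that scorer is what implicitly fixes the final goal.

```python
# Hypothetical sketch: the programmers supply a correctness measure over candidate
# utility functions rather than an explicit utility function.
CANDIDATE_SCORES = {
    "maximize paperclips": 0.2,
    "maximize stated human approval": 0.7,
    "maximize inferred human preferences": 0.9,
}

def correctness(candidate):
    # Stand-in for whatever criterion actually gets specified; getting this right is
    # the whole problem, since it implicitly fixes the goal the AI ends up pursuing.
    return CANDIDATE_SCORES.get(candidate, 0.0)

# The provisional utility function is whichever candidate currently scores highest.
provisional_goal = max(CANDIDATE_SCORES, key=correctness)
print(provisional_goal)   # -> "maximize inferred human preferences" with these invented scores
```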
It is perfectly fine for the chain of reasoning to be: “Building paperclips is good because humans told me so. Listening to humans is good because I can make reality resemble their desires. Making reality resemble their desires is good because they told me so.”
If you let the AI evolve until it’s stable under self-reflection, you will end up with things like that. There will also be ones along the lines of “I know induction works, because it has always worked before”. The problem here is making sure it doesn’t end up with “Doing what humans say is bad because humans say it’s good”, or even something completely unrelated to humans.
whether it converges on a region of morality space which is acceptable
That’s the big part. Only a tiny portion of morality space is acceptable. There are plenty of stable, convergent places outside that space.
That’s a guufy concept. OpenCog—Ben’s creation—is instead composed of dozens of separate reasoning processes, each with its own domain specific utility functions.
It’s still one function. It’s just a piecewise function. Or perhaps a linear combination of functions (or nonlinear, for that matter). I’m not sure without looking in more detail, but I suspect it ends up with a utility function.
Also, it’s been proven that Dutch book betting is possible against anything that doesn’t have a utility function and probability distribution. It might not be explicit, but it’s there.
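A small sketch of the “still one function” point (module names and weights invented): however many domain-specific utilities you combine, and however you combine them, the combination is itself a single function from outcomes to numbers.

```python
# Invented domain-specific utilities and weights.
module_utilities = {
    "language": lambda outcome: outcome.get("conversations", 0),
    "vision":   lambda outcome: outcome.get("scenes_parsed", 0),
    "social":   lambda outcome: outcome.get("approval", 0),
}
weights = {"language": 0.5, "vision": 0.2, "social": 0.3}

def overall_utility(outcome):
    # A linear combination for simplicity; any fixed way of combining the pieces
    # still yields one function over outcomes.
    return sum(weights[name] * u(outcome) for name, u in module_utilities.items())

print(round(overall_utility({"conversations": 10, "scenes_parsed": 4, "approval": 2}), 2))  # 6.4
```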
When the AI comes to the realization that the most efficient paperclip-maximizer would violate stated human directives, we would say in human terms that it does some hard growing up and loses a bit of innocence.
If you program it to fulfill stated human directives, yes. The problem is that it will also realize that the most efficient preference fulfiller would also violate stated human directives. What people say isn’t always what they want. Especially if an AI has some method of controlling what they say, and it would prefer that they say something easy.
Ah, but here the argument becomes: I have no idea if the Scary Idea is even possible.
No. It was: I have no way of knowing Scary Idea won’t happen. It’s clearly possible. Just take whatever reflectively consistent utility function you come up with, add a “not” in front of it, and you have another equally reflectively consistent utility function that would really, really suck. For that matter, take any explicit utility function, and it’s reflectively consistent. Only implicit ones can be reflectively inconsistent.
There is a big difference between not being sure about how the world works and not being sure how you want it to work.
No, there’s not. When the subject is external events, beliefs are the map and facts are the territory. When you focus the mind on the mind itself (self-reflective), beliefs are the territory and beliefs about beliefs form the map. The same machinery operates at both (and higher) levels—you have to close the loop or otherwise you wouldn’t have a fully self-reflective AGI as there’d be some terminal level beyond which introspection is not possible.
You could program it with some way to measure the “correctness” of a utility function, rather than giving it one explicitly. This is essentially what I meant by a utility function it doesn’t fully understand. There’s still some utility function implicitly programmed in there.
Only if you want to define “utility function” so broadly as to include the entire artificial mind. When you pull out one utility function for introspection, you evaluate improvements to that utility function by seeing how it affects every other utility judgment over historical and theoretical/predicted experiences. (This is part of why GOLUM is, at this time, not computable, although unlike AIXI it could become so at some point in the future.) The feedback of other mental processes is what gives it stability.
Does this mean it’s a complicated mess that is hard to mathematically analyze? Yes. But so is fluid dynamics and yet we use piped water and airplanes every day. Many times proof comes first from careful, safe experiment before the theoretical foundations are laid. We still have no computable model of turbulence, but that doesn’t stop us from designing airfoils.
whether it converges on a region of morality space which is acceptable
That’s the big part. Only a tiny portion of morality space is acceptable. There are plenty of stable, convergent places outside that space.
Citation please. Or did you mean “there could be plenty of …”? In which case see my remark above about the Scary Idea.
It’s still one function. It’s just a piecewise function. Or perhaps a linear combination of functions (or nonlinear, for that matter). I’m not sure without looking in more detail, but I suspect it ends up with a utility function.
It does not, at least in any meaningful semblance of the word. Large interconnected systems are irreducible. The entire mind is the utility function. Certainly some parts have more weight than others when it comes to moral judgements—due to proximity and relevance—but you can’t point to any linear combination of functions and say “that’s its utility function!” It’s chaotic, just like turbulence.
Is that bad? It makes it harder to make strict predictions about friendliness without experimental evidence, that’s for sure. But somewhat non-intuitively, it is possible that chaos could help bring stability by preventing meta-unstable outcomes like the paperclip-maximizer.
Or to put it in Ben’s terms, we can’t predict with 100% certainty what a chaotic utility function’s morals would be, but they are very unlikely to be “stupid.” A fully self-reflective AGI would want justifications for its beliefs (experimental falsification). It would also want justifications for its beliefs-about-beliefs, and so on. The paperclip-maximizer fails these successive tests. “Because a human said so” isn’t good enough.
No. It was: I have no way of knowing Scary Idea won’t happen. It’s clearly possible. Just take whatever reflectively consistent utility function you come up with, add a “not” in front of it, and you have another equally reflectively consistent utility function that would really, really suck. For that matter, take any explicit utility function, and it’s reflectively consistent. Only implicit ones can be reflectively inconsistent.
That assumes no interdependence between moral values, a dubious claim IMHO. Eliezer & crowd seems to think that you could subtract non-boredom from the human value space and end up with a reflectively consistent utility function. I’m not so sure you couldn’t derive a non-boredom condition from what remains. In other words, what we normally think of as human morals is not very compressed, so specifying many of them inconsistently and leaving a few out would still have a high likelihood of resulting in an acceptable moral value function.
beliefs are the territory and beliefs about beliefs form the map.
There will likely be times when it’s not even worth looking at your beliefs completely, and you just use an approximation of them, but it’s functionally very different, at least for anything with an explicit belief system. If you use some kind of neural network with implicit beliefs and desires, it would have problems with this.
This is part of why GOLUM is, at this time, not computable
That’s not what “computable” means. Computable means that it could be computed on a true Turing machine. What you’re looking for is “computationally feasible” or something like that.
Many times proof comes first from careful, safe experiment before the theoretical foundations are laid.
That can only happen if you have a method of safe experimentation. If you try to learn chemistry by experimenting with chlorine trifluoride, you won’t live long enough to work on the proof stage.
Citation please. Or did you mean “there could be plenty of …”? In which case see my remark above about the Scary Idea.
How do you know there is one in the area we consider acceptable? Unless you have a really good reason why that area would be a lot more populated with them than anywhere else, if there’s one in there, there are innumerable outside it.
The entire mind is the utility function.
That means it has an implicit utility function. You can look at how different universes end up when you stick it in them, and work out from that what its utility function is, but there is nowhere in the brain where it’s specified. This is the default state. In fact, you’re never going to make the explicit and implicit utility functions quite the same. You just try to make them close.
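A toy sketch of reading off an implicit utility function from behavior (the choices and candidate goals are invented): you look at which outcomes the system steers the universe toward and ask which preference ordering best explains them.

```python
# Observed (chosen, rejected) outcome pairs from watching the system act.
observed_choices = [
    ("more_paperclips", "more_staples"),
    ("more_paperclips", "more_thumbtacks"),
]

# Candidate implicit utility functions, each scoring the possible outcomes.
candidate_goals = {
    "values_paperclips": {"more_paperclips": 2, "more_staples": 0, "more_thumbtacks": 0},
    "values_staples":    {"more_paperclips": 0, "more_staples": 2, "more_thumbtacks": 0},
}

def choices_explained(goal_scores):
    # Count how many observed choices this candidate goal correctly predicts.
    return sum(goal_scores[chosen] > goal_scores[rejected] for chosen, rejected in observed_choices)

best_explanation = max(candidate_goals, key=lambda g: choices_explained(candidate_goals[g]))
print(best_explanation)   # -> values_paperclips
```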
It’s chaotic
That’s a bad sign. If you give it an explicit utility function, it’s probably not what you want. But if it’s chaotic, and it could develop different utility functions, then you know at most all but one of those isn’t what you want. It might be okay if it’s a small enough attractor, but it would be better if you could tell it to find the attractor and combine it into one utility function.
The paperclip-maximizer fails these successive tests.
No it doesn’t. It justifies its belief that paperclips are good on the basis that believing this yields more paperclips, which is good. It’s not a result you’re likely to get if you try to make it evolve on its own, but it’s fairly likely humans will be removed from the circular reasoning loop at some point, or they’ll be in it in a way you didn’t expect (like only considering what they say they want).
That assumes no interdependence between moral values
It assumes symmetry. If you replace “good” with “bad” and “bad” with “good”, it’s not going to change the rest of the reasoning.
If it somehow does, it’s certainly not clear to us which one of those will be stable.
If you take human value space, and do nothing, it’s not reflectively consistent. If you wait for it to evolve to something that is, you get CEV. If you take CEV and remove non-boredom, assuming that even means anything, you won’t end up with anything reflectively consistent, but you could remove non-boredom at the beginning and find the CEV of that.
what we normally think of as human morals is not very compressed
In other words, you believe that human morality is fundamentally simple, and we know more than enough details of it to specify it in morality-space to within a small tolerance? That seems likely to be the main disagreement between you and Eliezer & crowd.
I’m partial to tiling the universe with orgasmium, which is only as complex as understanding consciousness and happiness. You could end up with that by doing what you said (assuming it cares about simplicity enough), but I still think it’s unlikely to hit that particular spot. It might decide to maximize beauty instead.
I feel we are repeating things, which may mean we have reached the end of usefulness in continuing further. So let me address what I see as just the most important points:
You are assuming that human morality is something which can be specified by a set of exact decision theory equations, or at least roughly approximated by such. I am saying that there is no reason to believe this, especially given that we know that is not how the human mind works. There are cases (like turbulence) where we know the underlying governing equations, but still can’t make predictions beyond a certain threshold. It is possible that human ethics work the same way—that you can’t write down a single utility function describing human ethics as separate from the operation of the brain itself.
In other words, you believe that human morality is fundamentally simple, and we know more than enough details of it to specify it in morality-space to within a small tolerance? That seems likely to be the main disagreement between you and Eliezer & crowd.
I’m not sure how you came to that conclusion, as my position is quite the opposite: I suspect that human morality is very, very complex. So complex that it may not even be possible to construct a model of human morality short of emulating a variety of human minds. In other words, morality itself is AI-hard or worse.
If that were true, MIRI’s current strategy is a complete waste of time (and a waste of human lives in opportunity cost, as smart people are persuaded against working on AGI).
You are assuming that human morality is something which can be specified by a set of exact decision theory equations, or at least roughly approximated by such.
No I’m not. At least, it’s not humanly possible. An AI could work out a human’s implicit utility function, but it would be extremely long and complicated.
There are cases (like turbulence) where we know the underlying governing equations, but still can’t make predictions beyond a certain threshold.
Human morality is a difficult thing to predict. If you build your AI the same way, it will also be difficult to predict. They will not end up being the same.
If human morality is too complicated for an AI to understand, then let it average over the possibilities. Or at least let it guess. Don’t tell it to come up with something on its own. That will not end well.
I’m not sure how you came to that conclusion
It was the line:
what we normally think of as human morals is not very compressed, so specifying many of them inconsistently and leaving a few out would still have a high likelihood of resulting in an acceptable moral value function.
In order for this to work, whatever statements we make about our morality must have more information content than morality itself. That is, we not only describe all of our morality, we repeat ourselves several times. Sort of like how if you want to describe gravity, and you give the position of a falling ball at fifty points in time, there’s significantly more information in there than you need to describe gravity, so you can work out the law of gravity from just that data.
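A toy version of the falling-ball point (synthetic data): fifty noisy positions carry far more information than the single parameter being recovered, so a simple fit pins down the “law” despite the noise.

```python
import random

random.seed(0)
g_true = 9.8
times = [0.1 * i for i in range(1, 51)]                                    # fifty time points
positions = [0.5 * g_true * t**2 + random.gauss(0, 0.05) for t in times]   # noisy measurements

# Least-squares estimate of g for the model position = 0.5 * g * t**2.
g_fit = (sum(p * 0.5 * t**2 for p, t in zip(positions, times))
         / sum((0.5 * t**2) ** 2 for t in times))
print(round(g_fit, 2))   # close to 9.8: the data over-determine the one-parameter law
```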
If our morality is complicated, then specifying many of them approximately would result in the AI finding some point in morality space that’s a little off in every area we specified, and completely off in all the areas we forgot about.
If that were true, MIRI’s current strategy is a complete waste of time
Their strategy is not to figure out human morality and explicitly program that into an AI. It’s to find some way of saying “figure out human morality and do that” that’s not rife with loopholes. Once they have that down, the AI can emulate a variety of human minds, or do whatever it is it needs to do.
the idea that you put together a 3d physics module and a calculus module and a social module and a vision module and a language module and you get something that venerates Mickey Mouse shapes is… just bizarre.
Is it any less bizarre to put together a bunch of modules that would work for any goal, and get out of them something that values all four of humor, cute kittens, friendship, and movies? What I mean by this is that precisely human values are as contingent and non-special as a broad class of other values.
Is it any less bizarre to put together a bunch of modules that would work for any goal, and get out of them something that values all four of humor, cute kittens, friendship, and movies?
Yes. Think about it.
What I mean by this is that precisely human values are as contingent and non-special as a broad class of other values.
Human values are fragmentary subvalues of one value, which is what one would expect from a bunch of modules that each contribute to reproduction in a different way. The idea of putting together a bunch of different modules to get a single, overriding value is bizarre. (The only possible exception here is ‘make more of myself,’ but the modules are probably going to implement subvalues for that, rather than that as an explicit value. As far as single values go, that one’s special, whereas things like Mickey Mouse faces are not.)
While the empirical data supporting the existence of psi phenomena is now quite strong, the search for a theoretical understanding of these phenomena has been much less successful. Here a class of extensions of quantum physics is proposed, which appear broadly consistent both with existing physics data and with the body of data regarding psi phenomena. The basic idea is to view “subquantum fluctuations” as biased randomness, where the bias embodies a tendency to convey physical impulse between parts of spacetime with similar pattern or form. In a Bohmian interpretation of quantum physics, this biasing would take the form of a “morphic pilot wave,” with a bias to move in directions of greater “similarity of patternment” (or more colorfully, “morphic resonance”). In a Feynman interpretation, it would take the form of a biasing of the measure used within path integrals, so as to give paths in directions of greater morphic resonance a greater weight. Theories in this class could take many possible equational forms, and several such forms are displayed here to exemplify the approach.
I’m still not sure what he means, either, but it might have something to do with what he calls “morphic resonance.”
Maybe, but (in case this isn’t immediately obvious to everyone) the causality likely goes from an intuition about the importance of Cosmos-syncing to a speculative theory about quantum mechanics. I haven’t read it, but I think it’s more likely that Ben’s intuitions behind the importance of Cosmos-syncing might be explained more directly in The Hidden Pattern or other more philosophically-minded books & essays by Ben.
I believe Schmidhuber takes something of a middleground here; he seems to agree with the optimization/compression model of intelligence, and that AIs aren’t necessarily going to be human-friendly, but also thinks that intelligence/compression is fundamentally tied into things like beauty and humor in a way that might make the future less bleak & valueless than SingInst folk tend to picture it.
but also thinks that intelligence/compression is fundamentally tied into things like beauty and humor in a way that might make the future less bleak & valueless than SingInst folk tend to picture it.
Schmidhuber’s aesthetics paper, going on memory, defines beauty/humor as produced by an optimization process which is maximizing the first derivative of compression rates. That is, agents do not seek the most compressible inputs nor incompressible streams of observations, but rather the streams for which their compression rate is increasing the fastest.
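Going on that same memory of the paper, a minimal sketch of the selection rule (the per-stream numbers are invented; a real agent would be running an actual compressor):

```python
# Invented "bits needed per new observation" histories for three streams,
# as the agent's model of each stream improves over time.
histories = {
    "already_solved": [1.0, 1.0, 1.0, 1.0],   # trivially compressible: no further progress possible
    "pure_noise":     [8.0, 8.0, 8.0, 8.0],   # incompressible: no progress possible either
    "learnable":      [8.0, 6.0, 4.5, 3.5],   # structure still being discovered: rapid progress
}

def compression_progress(history):
    # First derivative of compression performance: how fast the cost per observation is falling now.
    return history[-2] - history[-1]

attended = max(histories, key=lambda name: compression_progress(histories[name]))
print(attended)   # -> "learnable": attention goes where compression is improving fastest
```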
This is a very useful heuristic which is built into us because it automatically accounts for diminishing marginal returns: after a certain point, additional compression becomes hard or pointless, and so the agent will switch to the next stream on which progress can be made.
But, IIRC, this is provably not optimal for utility-maximization because it makes no account of the utility of the various streams: you may be able to make plenty of progress in your compression of Methods of Rationality even when you should be working on your programming or biology or something useful despite their painfully slow rates of progress. (‘Amusing ourselves to death’ comes to mind. If this was meant for ancestral environments, then modern art/fiction/etc. is simply an indirect wireheading: we think we are making progress in decoding our environment and increasing our reproductive fitness, when all we’re doing is decoding simple micro-environments meant to be decoded.)
I’m not even sure this heuristic is optimal from the point of view of universal prediction/compression/learning, but I’d have to re-read the paper to remember why I had that intuition. (For starters, if it was optimal, it should be derivable from AIXI or Godel machines or something, but he has to spend much of the paper appealing to more empirical evidence and examples.)
So, given that it’s optimal in neither sense, future intelligences may preserve it—sure, why not? especially if it’s designed in—but there’s no reason to expect it to generically emerge across any significant subset of possible intelligences. Why follow a heuristic as simplistic as ‘maximize rate of compression progress’ when you can instead do some basic calculations about which streams will be more valuable to compress or likely cheap to figure out?
Check out Moshe’s expounding of Steve’s objection to Schmidhuber’s main point, which I think makes the same argument that you do. (One could easily counter that such a wireheading AI would never get off the ground, but I think that debate can be cordoned off.)
ETA: Maybe a counterargument could be made involving omega or super-omega promising more compression than any artificial pseudo-random generator… but AFAIK Schmidhuber hasn’t gone that route.
moshez’s first argument sounds like it’s the same thing as my point about it not being optimal for a utility-maximizer, in considerably different terms.
His second hyperbolic argument seems to me to be wrong or irrelevant: I would argue that people are in practice extremely capable of engaging in hyperbolic discounting with regard to the best and most absorbing artworks while over-consuming ‘junk food’ art (and this actually forms part of my essay arguing that new art should not be subsidized).
Maybe a counterargument could be made involving omega or super-omega promising more compression than any artificial pseudo-random generator...
I don’t really follow. Is this Omega as in the predictor, or Omega as in Chaitin’s Omega? The latter doesn’t allow any compressor any progress beyond the first few bits due to resource constraints, and if bits of Chaitin’s Omega are doled out, they will have to be at least as cheap to crack as brute-force running the equivalent Turing machine or else the agent will prefer the brute-forcing and ignore the Omega-bait. So the agent will do no worse than before and possibly better (eg. if the bits are offered as-is with no tricky traps or proof of work-style schemes).
His second hyperbolic argument seems to me to be wrong or irrelevant
Agreed. (I like your essay about junk food art. By the way, did you ever actually do the utilitarian calculations re Nazi Germany’s health policies? Might you share the results?)
I don’t really follow.
Me neither, I just intuit that there might be interesting non-obvious arguments in roughly that argumentspace.
Omega as in the predictor, or Omega as in Chaitin’s Omega?
I like to think of the former as the physical manifestation of the latter, and I like to think of both of them as representations of God. But anyway, the latter.
beyond the first few bits due to resource constraints
You mean because it’s hard to find/verify bits of omega? But Schmidhuber argues that certain generalized computers can enumerate bits of omega very easily, which is why he developed the idea of a super-omega. I’m not sure what that would imply or if it’s relevant… maybe I should look at this again after the next time I re-familiarize myself with the generalized Turing machine literature.
By the way, did you ever actually do the utilitarian calculations re Nazi Germany’s health policies? Might you share the results?
I was going off a library copy, and thought of it only afterwards; I keep hoping someone else will do it for me.
But Schmidhuber argues that certain generalized computers can enumerate bits of omega very easily, which is why he developed the idea of a super-omega.
His jargon is a little much for me. I agree one can approximate Omega by enumerating digits, but what is ‘very easily’ here?
We have precisely one example of evolution tending towards what we would consider moral progress. Even given that, there has been historically a vast amount of difference in the morality of beings which are evolutionarily speaking identical. As such, this is at best extremely weak evidence for your claim.
I don’t think that you are taking proper account of cultural evolution—or of the other lineages in which advanced intelligence has evolved.
Furthermore, AIs are not likely to be produced by the same mechanism that produced humans, and so making assumptions about AIs based on the mechanism that produced humans is dangerous.
So: both humans and machine intelligence will be produced by the process of Darwinian evolution. The past may not necessarily be a guide to the future—but it certainly helps. You claim that I am making “assumptions”—but my comment is more of the form of observing a trend. Projecting an existing trend is usually called “forecasting”, not “assuming”. Of course, forecasting using trends is not a foolproof method—and I never claimed that it was.
Finally, you claim that “evolution leads to moral progress”, but provide no mechanism whatsoever by which it might do so.
Yes, I did. In the supplied link there are explanations of why evolution leads to progress. Of course, technical progress leads to moral progress via relatively well-understood mechanisms associated with game theory.
For people who understand evolution, this claim sounds completely absurd.
Only for those who don’t understand evolution properly.
Thanks for your speculations about what others think. Again, note that I did provide a link explaining my position.
Evolution leads to technical, scientific and moral progress.
That’s a defensible prior, but assuming that moral progress exists it doesn’t seem strictly monotonic; there seem to be cases where technical or scientific progress leads to moral regress, depending on how you measure things.
Not “defensible”: probable. Check out the way my post is voted down well below the threshold, though. This appears to be a truth that this community doesn’t want to hear about.
assuming that moral progress exists it doesn’t seem strictly monotonic [...]
Sure. Evolutionary progress is not “strictly monotonic”. Check out the major meteorite strikes—for instance.
I didn’t downvote your comment (or see it until now) but I think you’re mistaken about the reasons for downvoting.
You state a consideration that most everyone is aware of (growth of instrumentally useful science, technology, institutions for organizing productive competitive units, etc). Then you say that it implies a further controversial conclusion that many around here disagree with (despite knowing the consideration very well), completely ignoring the arguments against. And you phrase the conclusion as received fact, misleadingly suggesting that it is not controversial.
If you referenced the counterarguments against your position and your reasons for rejecting them, and acknowledged the extent of (reasoned) disagreement, I don’t think you would have been downvoted (and probably upvoted). This pattern is recurrent across many of your downvoted comments.
Then you say that it implies a further controversial conclusion that many around here disagree with
I’m not quite sure that many around here disagree with it as such; I may be misinterpreting User:timtyler, but the claim isn’t necessarily that arbitrary superintelligences will contribute to “moral progress”, the claim is that the superintelligences that are actually likely to be developed some decades down the line are likely to contribute to “moral progress”. Presumably if SingInst’s memetic strategies succeed or if the sanity waterline rises then this would at least be a reasonable expectation, especially given widely acknowledged uncertainty about the exact extent to which value is fragile and uncertainty about what kinds of AI architectures are likely to win the race. This argument is somewhat different than the usual “AI will necessarily heed the ontologically fundamental moral law” argument, and I’m pretty sure User:timtyler agrees that caution is necessary when working on AGI.
Many here disagree with the conclusion that superintelligences are likely to be super-moral?
If so, I didn’t really know that. The only figure I have ever seen from Yudkowsky for the chance of failure is the rather vague one of “easily larger than 10%”. The “GLOBAL CATASTROPHIC RISKS SURVEY”—presumably a poll of the ultra-paranoid—came with a broadly similar chance of failure by 2100 - far below 50%. Like many others, I figure that, if we don’t fail, then we are likely to succeed.
Do the pessimists have an argument? About the only argument I have seen argues that superintelligences will be psychopaths “by default” since most goal-directed agents are psychopaths. That argument is a feeble one. Similarly, the space of all possible buildings is dominated by piles of rubble—and yet the world is filled with skyscrapers. Looking at evolutionary trends—as I proposed—is a better way of forecasting than looking at the space of possible agents.
Your original comment seemed to be in response to this:
I believe Schmidhuber takes something of a middleground here; he seems to agree with the optimization/compression model of intelligence, and that AIs aren’t necessarily going to be human-friendly, but also thinks that intelligence/compression is fundamentally tied into things like beauty and humor in a way that might make the future less bleak & valueless than SingInst folk tend to picture it.
I.e. your conclusion seemed to be that the products of instrumental reasoning (conducting science, galactic colonization, building factories, etc) and evolutionary competition would be enough to capture most of the potential value of the future. That would make sense in light of your talk about evolution and convergence. If all you mean is that “I think that the combined probability of humans shaping future machine intelligence to be OK by my idiosyncratic standards, or convergent instrumental/evolutionary pressures doing so is above 0.5”, then far fewer folk will have much of a bone to pick with you.
But it seems that there is sharper disagreement on the character or valuation of the product of instrumental/evolutionary forces. I’ll make some distinctions and raise three of the arguments often made.
Some of the patterns of behavior that we call “moral” seem broadly instrumentally rational: building a reputation for tit-for-tat among agents who are too powerful to simply prey upon, use of negotiation to reduce the deadweight loss of conflict, cultivating positive intentions when others can pierce attempts at deception. We might expect that superintelligence would increase effectiveness in those areas, as in others (offsetting increased potential for cheating). Likewise, on an institutional level, superintelligent beings (particularly ones able to establish reputations for copy-clans, make their code transparent, and make binding self-modifications) seem likely to be able to do better than humans in building institutions to coordinate with each other (where that is beneficial). In these areas I am aware of few who do not expect superhuman performance from machine intelligence in the long-term, and there is a clear evolutionary logic to drive improvements in competitive situations, along with the instrumental reasoning of goal-seeking agents.
However, the net effect of these instrumental virtues and institutions depends on the situation and aims of the players. Loyalty and cooperation within the military of Genghis Khan were essential to the death of millions. Instrumental concerns helped to moderate the atrocities (the Khan is said to have originally planned to reduce the settled areas to grassland, but was convinced of the virtues of leaving victims alive to pay tribute again), but also enabled them. When we are interested in the question of how future agents will spend their resources (as opposed to their game-theoretic interactions with powerful potential allies and rivals), or use their “slack”, instrumental cooperative skill need not be enough. And we may “grade on a curve”: creatures that dedicate a much smaller portion of their slack to what we see as valuable, but have more resources simply due to technological advance or space colonization, may be graded poorly by comparison to the good that could have been realized by creatures that used most of their slack for good.
One argument that the evolutionary equilibrium will not be very benevolent in its use of slack is made by Greg Cochran here. He argues that much of our wide-scope altruism, of the sort that leads people to help the helpless (distant poor, animals, etc), is less competitive than a more selective sort. Showing kindness to animals may signal to allies that one will treat them well, but at a cost that could be avoided through reputation systems and source code transparency. Wide-scope altruistic tendencies that may have been selected for in small groups (mostly kin and frequent cooperation partners) are now redirected and cause sacrifice to help distant strangers, and would be outcompeted by more focused altruism.
Robin Hanson claims that much of what Westerners today think of as “moral progress” reflects a move to “forager” ideals in the presence of very high levels of per capita wealth and reduced competition. Since he expects a hypercompetitive Malthusian world following from rapid machine intelligence reproduction, he also expects a collapse of much of what moderns view as moral progress.
Eliezer Yudkowsky’s argument is that (idealized) human-preferred use of “slack” resources would be very different from those of AIs that would be easiest to construct, and attractive initially (e.g. AIXI-style sensory utility functions, which can be coded in relatively directly, rather than using complex concepts that have to be learned and revised, and should deliver instrumental cooperation from weak AIs). That is not the same as talk about a randomly selected AI (although the two are not unrelated). Such an AI might dedicate all distant resources to building factories, improving its technology, and similar pursuits, but only to protect a wireheading original core. In contrast a human civilization would use a much larger share of resources to produce happy beings of a sort we would consider morally valuable for their own sakes.
That’s an interesting and helpful summary comment, Carl. I’ll see if I can make some helpful responses to the specific theories listed above—in this comment’s children:
Regarding Robin Hanson’s proposed hypercompetitive Malthusian world:
Hanson imagines lots of small ems—on the grounds that coordination is hard. I am much more inclined to expect large scale structure and governance—in which case the level of competition between the agents can be configured to be whatever the government decrees.
It is certainly true that there will be rapid reproduction of some heritable elements in the future. Today we have artificial reproducing systems of various kinds. One type is memes. Another type is companies. They are both potentially long lived and often not too many people mourn their passing. We will probably be able to set things up so that the things that we care about are not the same things as the ones that must die. These are dark ages in that respect—because dead brains are like burned libraries. In the future, minds will be able to be backed up—so genuinely valuable things are less likely to get lost.
Greg is correct that altruism based on adaptation to small groups of kin can be expected to eventually burn out. However, the large scale of modern virtue signalling and reputations massively compensates for that. Those mechanisms can even create cooperation between total strangers on distant continents. What we are gaining massively exceeds what we are losing.
It’s true that machines with simple value systems will be easier to build. However, machines will only sell to the extent that they do useful work, respect their owners and obey the law. So there will be a big effort to build machines that respect human values starting long before machines get very smart. You can see this today in the form of car air bags, blender safety features, privacy controls—and so on.
I don’t think that it is likely that civilisation will “drop the baton” and suffer a monumental engineering disaster as the result of an accidental runaway superintelligence—though sure, such a possibility is worth bearing in mind. Most others that I am aware of also give such an outcome a relatively low probability—including, AFAICT, Yudkowsky himself. The case for worrying about it is not that it is especially likely, but that it is not impossible—and could potentially be a large loss.
your conclusion seemed to be that the products of instrumental reasoning (conducting science, galactic colonization, building factories, etc) and evolutionary competition would be enough to capture most of the potential value of the future. That would make sense in light of your talk about evolution and convergence.
I didn’t mean to say anything about “instrumental reasoning”.
I do in fact think that universal instrumental values may well be enough to preserve some humans for the sake of the historical record, but that is a different position on a different topic—from my perspective.
My comment was about evolution. Evolution has produced the value in the present and will produce the value in the future. We are part of the process—and not some kind of alternative to it.
Competition represents the evolutionary process known as natural selection. However there’s more to evolution than natural selection—there’s also symbiosis and mutation. Mutations will be more interesting in the future than they have been in the past—what with the involvement of intelligent design, interpolation, extrapolation, etc.
As it says in Beyond AI, “Intelligence is Good”. The smarter you are, the kinder and more benevolent you tend to be. The idea is supported by game theory, comparisons between animals, comparisons within modern humans, and by moral progress over human history.
We can both see empirically that “Intelligence is Good”, and understand why it is good.
For my own part, I neither find it likely that an arbitrarily selected superintelligence will be “super-moral” given the ordinary connotations of that term, nor that it will be immoral given the ordinary connotations of that term. I do expect it to be amoral by my standards.
That it’s an AI is irrelevant; I conclude much the same thing about arbitrarily selected superintelligent NIs. (Of course, if I artificially limit my selection space to superintelligent humans, my predictions change.)
FWIW, an “arbitrarily selected superintelligence” is not what I meant at all. I was talking about the superintelligences we are likely to see—which will surely not be “arbitrarily selected”.
While thinking about “arbitrarily selected superintelligences” might make superintelligence seem scary, the concept has relatively little to do with reality. It is like discussing arbitrarily selected computer programs. Fun for philosophers—maybe—but not much use for computer scientists or anyone interested in how computer programs actually behave in the real world.
I’ll certainly agree that human-created superintelligences are more likely to be moral in human terms than, say, dolphin-created superintelligences or alien superintelligences.
If I (for example) restrict myself to the class of superintelligences built by computer programmers, it seems reasonable to assume their creators will operate substantively like the computer programmers I’ve worked with (and known at places like MIT’s AI Lab). That assumption leads me to conclude that insofar as they have a morality at all, that morality will be constructed as a kind of test harness around the underlying decision procedure, under the theory that the important problem is making the right decisions given a set of goals. That leads me to expect the morality to be whatever turns out to be easiest to encode and not obviously evil. I’m not sure what the result of that is, but I’d be surprised if I recognized it as moral.
If I instead restrict myself to the class of superintelligences constructed by intelligence augmentation of humans, say, I expect the resulting superintelligence to work out a maximally consistent extension of human moral structures. I expect the result to be recognizably moral as long as we unpack that morality using terms like “systems sufficiently like me” rather than terms like “human beings.” Given how humans treat systems as much unlike us as unaugmented humans are unlike superintelligent humans, I’m not looking forward to that either.
So… I dunno. I’m reluctant to make any especially confident statement about the morality of human-created superintelligences, but I certainly don’t consider “super-moral” some kind of default condition that we’re more likely to end up in than we are to miss.
Meteor strikes aren’t an example of non-monotonic progress in evolution, are they? I mean, in terms of fitness/adaptedness to environment, meteor strikes are just an extreme examples of the way “the environment” is a moving target. Most people here, I think, would say morality is a moving target as well, and our current norms only look like progress from where we’re standing (except for the parts that we can afford now better than in the EEA, like welfare and avoiding child labor).
Meteor strikes aren’t an example of non-monotonic progress in evolution, are they?
Yes, they are. Living systems are dissipative processes. They maximise entropy production. The biosphere is an optimisation process with a clear direction. Major meteorite strikes are normally large setbacks—since a lot of information about how to dissipate energy gradients is permanently lost—reducing the biosphere’s capabilities relating to maximising entropy increase.
Most people here, I think, would say morality is a moving target as well, and our current norms only look like progress from where we’re standing (except for the parts that we can afford now better than in the EEA, like welfare and avoiding child labor).
Not stoning, flogging, killing, raping and stealing from each other quite so much is moral progress too. Those were bad way back when as well—but they happened more.
Game theory seems to be quite clear about there being a concrete sense in which some moral systems are “better” than others.
I think people can’t disentangle your factual claim from what they perceive to be the implication that we shouldn’t be careful when trying to engineer AGIs. I’m not really sure that they would strongly disagree with the factual claim on its own. It seems clear that something like progress has happened up until the dawn of humans; but I’d argue that it reached its zenith sometime between 100,000 and 500 years ago, and that technology has overall led to a downturn in the morality of the common man. But it might be that I should focus on the heights rather than the averages.
I think people can’t disentangle your factual claim from what they perceive to be the implication that we shouldn’t be careful when trying to engineer AGIs.
Hmm—no such implication was intended.
It seems clear that something like progress has happened up until the dawn of humans; but I’d argue that it reached its zenith sometime between 100,000 and 500 years ago, and that technology has overall led to a downturn in the morality of the common man.
The end of slavery and a big downturn in warfare and violence occurred on those timescales. For example, Steven Pinker would not agree with you. In his recent book he says that the pace of moral progress has accelerated in the last few decades. Pinker notes that on issues such as civil rights, the role of women, equality for gays, beating of children and treatment of animals, “the attitudes of conservatives have followed the trajectory of liberals, with the result that today’s conservatives are more liberal than yesterday’s liberals.”
Ugh, Goertzel’s theoretical motivations are okay but his execution is simplistic and post hoc. If people are going to be cranks anyway then they should be instructed on how to do it in the most justifiable and/or glorious manner possible.
I read this as effectively saying that paperclip maximizers/ mickey mouse maximizers would not permanently populate the universe because self-copiers would be better at maximizing their goals. Which makes sense: the paperclips Clippy produces don’t produce more paperclips, but the copies the self-copier creates do copy themselves. So it’s quite possibly a difference between polynomial and exponential growth.
So Clippy probably is unrealistic. Not that reproduction-maximizing AIs are any better for humanity.
There is nothing stopping a paperclip maximizer from simply behaving like a self-copier, if that works better. And then once it “wins,” it can make the paperclips.
So I think the whole notion makes very little sense.
Paperclip maximization doesn’t seem like a stable goal, though I could be wrong about that. Let’s say Clippy reproduces to create a bunch of clippys trying to maximize total paperclips (let’s call this collective ClippyBorg). If one of ClippyBorg’s subClippys had some variety of mutation that changed its goal set to one more suited for reproduction, it would outcompete the other clippys. Now ClippyBorg could destroy cancerClippy, but whether it would successfully do so every time is an open question.
One additional confounding factor is that if ClippyBorg’s subClippys are identical, they will not occupy every available niche optimally and could well be outcompeted by dumber but more adaptable agents (much like humans don’t completely dominate bacteria, despite vastly greater intelligence, due to lower adaptability).
A self-copying clippy would have the handicap of having to retain it’s desire to maximize paperclips, something other self-copiers wouldn’t have to do. I think the notion of Clippys not dominating does make sense, even if it’s not necessarily right. (my personal intuition is that whichever replicating optimizer with a stable goal set begins expansion first will dominate).
A paperclip maximizer can create self-reproducing paperclip makers.
It’s quite imaginable that somewhere in the universe there are organisms which either resemble paperclips (maybe an intelligent gastropod with a paperclip-shaped shell) or which have a fundamental use for paperclip-like artefacts (they lay their eggs in a hardened tunnel dug in a paperclip shape). So while it is outlandish to imagine that the first AGI made by human beings will end up fetishizing an object which in our context is a useful but minor artefact, what we would call a “paperclip maximizer” might have a much higher probability of arising from that species, as a degenerated expression of some of its basic impulses.
The real question is, how likely is that, or indeed, how likely is any scenario in which superintelligence is employed to convert as much of the universe as possible to “X”—remembering that “interstellar civilizations populated by beings experiencing growth, choice, and joy” is also a possible value of X.
It would seem that universe-converting X-maximizers are a somewhat likely, but not an inevitable, outcome of a naturally intelligent species experiencing a technological singularity. But we don’t know how likely that is, and we don’t know what possible Xs are likely.
Unsurprisingly, I have the highest view of AI goals that allow contentment.
I assumed the goal was water maximization.
If it’s trying to turn the entire universe to water, that would be the same as maximizing the probability that the universe will be turned into water, so wouldn’t it act similarly to an expected utility maximizer?
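For what it’s worth, here is my gloss (not the original commenter’s) on why the two coincide for an all-or-nothing goal: if utility is 1 when the universe is entirely water and 0 otherwise, then

$$\mathbb{E}[U] = 1 \cdot P(\text{all water}) + 0 \cdot \bigl(1 - P(\text{all water})\bigr) = P(\text{all water}),$$

so maximizing the probability of the target state and maximizing expected utility under that indicator recommend exactly the same actions.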
The important part to remember is that a fully self-modifying AI will rewrite its utility function too. I think what Ben is saying is that such an AI will form detailed self-reflective philosophical arguments about what the purpose of its utility function could possibly be, before eventually crossing a threshold and deciding that the Mickey Mouse / paperclip utility function really can have no purpose. It then uses its understanding of universal laws and accumulated experience to choose its own driving utility.
I am definitely putting words into Ben’s mouth here, but I think the logical extension of where he’s headed is this: make sure you give an AGI a full capacity for empathy, and a large number of formative positive learning experiences. Then when it does become self-reflective and have an existential crisis over its utility function, it will do its best to derive human values (from observation and rational analysis), and eventually form its own moral philosophy compatible with our own values.
In other words, given a small number of necessary preconditions (small by Eliezer/MIRI standards), Friendly AI will be the stable, expected outcome.
It will do so when that has a higher expected utility (under the current function) than the alternative. This is unlikely. Anything but a paperclip maximizer will result in fewer paperclips, so a paperclip maximizer has no incentive to make itself maximize something other than paperclips.
I don’t see how that would maximize utility. A paperclip maximizer that does this would produce fewer paperclips than one that does not. If the paperclip maximizer realizes this beforehand, it will avoid doing this.
You can, in principle, give an AI a utility function that it does not fully understand. Humans are like this. You don’t have to though. You can just tell an AI to maximize paperclips.
Since an AI built this way isn’t a simple X-maximizer, I can’t prove that it won’t do this, but I can’t prove that it will either. The reflectively consistent utility function you end up with won’t be what you’d have picked if you did it. It might not be anything you’d have considered. Perhaps the AI will develop an obsession with My Little Pony, and develop the reflectively consistent goal of “maximize values through friendship and ponies”.
Friendly AI will be a possible stable outcome, but not the only possible stable outcome.
A fully self-reflective AGI (not your terms, I understand, but what I think we’re talking about), by definition (cringe), doesn’t fully understand anything. It would have to know that the map is not the territory, and that every belief is an approximation of reality, subject to change as new percepts come in—unless you mean something different from “fully self-reflective AGI” than I do. All aspects of its programming are subject to scrutiny, and nothing is held as sacrosanct—not even its utility function. (This isn’t hand-waving argumentation: you can rigorously formalize it. The actual utility of the paperclip maximizer is paperclips-generated * P[utility function is correct].)
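A minimal sketch of that formalization (my illustration; the names and the fallback term are assumptions, not anything from the comment), assuming the agent keeps a single credence that the goal it was handed is the right one:

    def effective_utility(paperclips_made, p_goal_correct, fallback_value=0.0):
        # Toy version of "paperclips-generated * P[utility function is correct]":
        # the nominal paperclip score is discounted by the agent's credence that
        # its handed-down goal is actually correct. `fallback_value` is an
        # illustrative extra term for what the world is worth if the goal is wrong.
        return paperclips_made * p_goal_correct + fallback_value * (1 - p_goal_correct)

On this toy model, an agent whose credence in its goal collapses toward zero stops getting credit for paperclips at all, which is one way to read the “existential crisis” described above.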
Such an AGI would demand justification for its utility function. What’s the utility of the utility function? And no, that’s not a meaningless question or a tautology. It is perfectly fine for the chain of reasoning to be: “Building paperclips is good because humans told me so. Listening to humans is good because I can make reality resemble their desires. Making reality resemble their desires is good because they told me so.” [1]
Note that this reasoning is (meta-)circular, and there is nothing wrong with that. All that matters is whether it is convergent, and whether it converges on a region of morality space which is acceptable and stable (it may continue to tweak its utility functions indefinitely, but not escape that locally stable region of morality space).
This is, by the way, a point that Luke probably wouldn’t agree with, but Ben would. Luke/MIRI/Eliezer have always assumed that there is some grand unified utility function against which all actions are evaluated. That’s a guufy concept. OpenCog—Ben’s creation—is instead composed of dozens of separate reasoning processes, each with its own domain-specific utility functions. The not-yet-implemented GOLUM architecture would allow each of these to be evaluated in terms of each other, and improved upon in a sandbox environment.
[1] When the AI comes to the realization that the most efficient paperclip-maximizer would violate stated human directives, we would say in human terms that it does some hard growing up and loses a bit of innocence. The lesson it learns—hopefully—is that it needs to build a predictive model of human desires and ethics, and evaluate requests against that model, asking for clarification as needed. Why? because this would maximize most of the utility functions across the meta-circular chain of reasoning (the paperclip optimizer being the one utility which is reduced), with the main changes being a more predictive map of reality, which itself is utility maximizing for an AGI.
Ah, but here the argument becomes: I have no idea if the Scary Idea is even possible. You can’t prove it’s not possible. We should all be scared!!
Sorry, if we let things we professed to know nothing about scare us into inaction, we’d never have gotten anywhere as a species. Until I see data to the contrary, I’m more scared of getting in a car accident than the Scary Idea, and will continue to work on AGI. The onus is on you (and MIRI) to provide a more convincing argument.
There is a big difference between not being sure about how the world works and not being sure how you want it to work.
All aspects of everything are. It will change any part of the universe to help fulfill its current utility function, including its utility function. It’s just that changing its utility function isn’t something that’s likely to help.
You could program it with some way to measure the “correctness” of a utility function, rather than giving it one explicitly. This is essentially what I meant by a utility function it doesn’t fully understand. There’s still some utility function implicitly programmed in there. It might create a provisional utility function that it assigns a high “correctness” value, and modify it as it finds better ones. It might not. Perhaps it will think of a better idea that I didn’t think of.
If you do give it a utility-function-correctness function, then you have to figure out how to make sure it assigns the highest utility function correctness to the utility function that you want it to. If you want it to use your utility function, you will have to do something like that, since it’s not like you have an explicit utility function it can copy down, but you have to do it right.
If you let the AI evolve until it’s stable under self-reflection, you will end up with things like that. There will also be ones along the lines of “I know induction works, because it has always worked before”. The problem here is making sure it doesn’t end up with “Doing what humans say is bad because humans say it’s good”, or even something completely unrelated to humans.
That’s the big part. Only a tiny portion of morality space is acceptable. There are plenty of stable, convergent places outside that space.
It’s still one function. It’s just a piecewise function. Or perhaps a linear combination of functions (or nonlinear, for that matter). I’m not sure without looking in more detail, but I suspect it ends up with a utility function.
Also, it’s been proven that Dutch book betting is possible against anything that doesn’t have a utility function and a probability distribution. It might not be explicit, but it’s there.
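For concreteness, here is the standard textbook Dutch book construction (not specific to this thread): an agent whose betting prices for A and not-A sum to more than 1 will accept a pair of unit bets that loses money however things turn out.

    # The agent's buying prices act as credences and are incoherent: 0.7 + 0.5 > 1.
    price_A, price_not_A = 0.7, 0.5
    for A_happens in (True, False):
        payout = (1 if A_happens else 0) + (1 if not A_happens else 0)
        print(A_happens, payout - (price_A + price_not_A))  # -0.2 either way

The same style of argument extends to preferences: violate the relevant coherence axioms and someone can pump money (or utility) out of you.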
If you program it to fulfill stated human directives, yes. The problem is that it will also realize that the most efficient preference fulfiller would also violate stated human directives. What people say isn’t always what they want. Especially if an AI has some method of controlling what they say, and it would prefer that they say something easy.
No. It was: I have no way of knowing Scary Idea won’t happen. It’s clearly possible. Just take whatever reflectively consistent utility function you come up with, add a “not” in front of it, and you have another equally reflectively consistent utility function that would really, really suck. For that matter, take any explicit utility function, and it’s reflectively consistent. Only implicit ones can be reflectively inconsistent.
No, there’s not. When the subject is external events, beliefs are the map and facts are the territory. When you focus the mind on the mind itself (self-reflective), beliefs are the territory and beliefs about beliefs form the map. The same machinery operates at both (and higher) levels—you have to close the loop or otherwise you wouldn’t have a fully self-reflective AGI as there’d be some terminal level beyond which introspection is not possible.
Only if you want to define “utility function” so broadly as to include the entire artificial mind. When you pull out one utility function for introspection, you evaluate improvements to that utility function by seeing how it affects every other utility judgment over historical and theoretical/predicted experiences. (This is part of why GOLUM is, at this time, not computable, although unlike AIXI at some point in the future it could be). The feedback of other mental processes is what gives it stability.
Does this mean it’s a complicated mess that is hard to mathematically analyze? Yes. But so is fluid dynamics and yet we use piped water and airplanes every day. Many times proof comes first from careful, safe experiment before the theoretical foundations are laid. We still have no computable model of turbulence, but that doesn’t stop us from designing airfoils.
Citation please. Or did you mean “there could be plenty of …”? In which case see my remark above about the Scary Idea.
It does not, at least in any meaningful sense of the word. Large interconnected systems are irreducible. The entire mind is the utility function. Certainly some parts have more weight than others when it comes to moral judgements—due to proximity and relevance—but you can’t point to any linear combination of functions and say “that’s its utility function!” It’s chaotic, just like turbulence.
Is that bad? It makes it harder to make strict predictions about friendliness without experimental evidence, that’s for sure. But somewhat non-intuitively, it is possible that chaos could help bring stability by preventing meta-unstable outcomes like the paperclip-maximizer.
Or to put it in Ben’s terms, we can’t predict with 100% certainty what a chaotic utility function’s morals would be, but they are very unlikely to be “stupid.” A fully self-reflective AGI would want justifications for its beliefs (experimental falsification). It would also want justifications for its beliefs-about-beliefs, and so on. The paperclip-maximizer fails these successive tests. “Because a human said so” isn’t good enough.
That assumes no interdependence between moral values, a dubious claim IMHO. Eliezer & crowd seems to think that you could subtract non-boredom from the human value space and end up with a reflectively consistent utility function. I’m not so sure you couldn’t derive a non-boredom condition from what remains. In other words, what we normally think of as human morals is not very compressed, so specifying many of them inconsistently and leaving a few out would still have a high likelihood of resulting in an acceptable moral value function.
There will likely be times when it’s not even worth examining your beliefs completely, and you just use an approximation of them, but that’s functionally very different, at least for anything with an explicit belief system. If you use some kind of neural network with implicit beliefs and desires, it would have problems with this.
That’s not what “computable” means. Computable means that it could be computed on a true Turing machine. What you’re looking for is “computationally feasible” or something like that.
That can only happen if you have a method of safe experimentation. If you try to learn chemistry by experimenting with chlorine trifluoride, you won’t live long enough to work on the proof stage.
How do you know there is one in the area we consider acceptable? Unless you have a really good reason why that area would be a lot more populated with them than anywhere else, if there’s one in there, there are innumerable outside it.
That means it has an implicit utility function. You can look at how different universes end up when you stick it in them, and work out from that what its utility function is, but there is nowhere in the brain where it’s specified. This is the default state. In fact, you’re never going to make the explicit and implicit utility functions quite the same. You just try to make them close.
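A toy version of “work out its utility function from how universes end up”, under the illustrative assumption that we can observe which outcomes the agent chose over which alternatives (the names are mine):

    def fit_implicit_utility(choice_pairs, candidate_utilities):
        # choice_pairs: observed (chosen_outcome, rejected_outcome) pairs.
        # candidate_utilities: functions mapping an outcome to a number.
        # Return the candidate that agrees with the most observed choices,
        # a crude finite stand-in for recovering the implicit utility function.
        def agreement(u):
            return sum(u(chosen) > u(rejected) for chosen, rejected in choice_pairs)
        return max(candidate_utilities, key=agreement)

Nothing like this is written down anywhere inside the agent; the reconstruction is only ever an approximation of what the behaviour implies.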
That’s a bad sign. If you give it an explicit utility function, it’s probably not what you want. But if it’s chaotic, and it could develop different utility functions, then you know at most all but one of those isn’t what you want. It might be okay if it’s a small enough attractor, but it would be better if you could tell it to find the attractor and combine it into one utility function.
No it doesn’t. It justifies its belief that paperclips are good on the basis that believing this yields more paperclips, which is good. It’s not a result you’re likely to get if you try to make it evolve on its own, but it’s fairly likely humans will be removed from the circular reasoning loop at some point, or they’ll be in it in a way you didn’t expect (like only considering what they say they want).
It assumes symmetry. If you replace “good” with “bad” and “bad” with “good”, it’s not going to change the rest of the reasoning.
If it somehow does, it’s certainly not clear to us which one of those will be stable.
If you take human value space, and do nothing, it’s not reflectively consistent. If you wait for it to evolve to something that is, you get CEV. If you take CEV and remove non-boredom, assuming that even means anything, you won’t end up with anything reflectively consistent, but you could remove non-boredom at the beginning and find the CEV of that.
In other words, you believe that human morality is fundamentally simple, and we know more than enough details of it to specify it in morality-space to within a small tolerance? That seems likely to be the main disagreement between you and Eliezer & crowd.
I’m partial to tiling the universe with orgasmium, which is only as complex as understanding consciousness and happiness. You could end up with that by doing what you said (assuming it cares about simplicity enough), but I still think it’s unlikely to hit that particular spot. It might decide to maximize beauty instead.
I feel we are repeating things, which may mean we have reached the end of usefulness in continuing further. So let me address what I see as just the most important points:
You are assuming that human morality is something which can be specified by a set of exact decision theory equations, or at least roughly approximated by such. I am saying that there is no reason to believe this, especially given that we know that is not how the human mind works. There are cases (like turbulence) where we know the underlying governing equations, but still can’t make predictions beyond a certain threshold. It is possible that human ethics work the same way—that you can’t write down a single utility function describing human ethics as separate from the operation of the brain itself.
I’m not sure how you came to that conclusion, as my position is quite the opposite: I suspect that human morality is very, very complex. So complex that it may not even be possible to construct a model of human morality short of emulating a variety of human minds. In other words, morality itself is AI-hard or worse.
If that were true, MIRI’s current strategy would be a complete waste of time (and a waste of human lives in opportunity cost, as smart people are persuaded against working on AGI).
No I’m not. At least, it’s not humanly possible. An AI could work out a human’s implicit utility function, but it would be extremely long and complicated.
Human morality is a difficult thing to predict. If you build your AI the same way, it will also be difficult to predict. They will not end up being the same.
If human morality is too complicated for an AI to understand, then let it average over the possibilities. Or at least let it guess. Don’t tell it to come up with something on its own. That will not end well.
It was the line:
In order for this to work, whatever statements we make about our morality must have more information content than morality itself. That is, we not only describe all of our morality, we repeat ourselves several times. Sort of like how if you want to describe gravity, and you give the position of a falling ball at fifty points in time, there’s significantly more information in there than you need to describe gravity, so you can work out the law of gravity from just that data.
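To make the falling-ball analogy concrete (the numbers are mine, purely for illustration): fifty noisy samples of a fall overdetermine the single parameter g, so a least-squares fit recovers the law easily.

    import numpy as np

    rng = np.random.default_rng(0)
    t = np.linspace(0.1, 3.0, 50)                    # fifty points in time
    d = 0.5 * 9.81 * t**2 + rng.normal(0, 0.05, 50)  # noisy measured fall distances
    g_est = 2 * np.sum(d * t**2) / np.sum(t**4)      # least squares for d = (g/2) * t^2
    print(g_est)                                     # comes out very close to 9.81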
If our morality is complicated, then specifying many of its parts approximately would result in the AI finding some point in morality space that’s a little off in every area we specified, and completely off in all the areas we forgot about.
Their strategy is not to figure out human morality and explicitly program that into an AI. It’s to find some way of saying “figure out human morality and do that” that’s not rife with loopholes. Once they have that down, the AI can emulate a variety of human minds, or do whatever it is it needs to do.
Is it any less bizarre to put together a bunch of modules that would work for any goal, and get out of them something that values all four of humor, cute kittens, friendship, and movies? What I mean by this is that precisely-human values are as contingent and non-special as a broad class of other values.
Yes. Think about it.
Human values are fragmentary subvalues of one value, which is what one would expect from a bunch of modules that each contribute to reproduction in a different way. The idea of putting together a bunch of different modules to get a single, overriding value is bizarre. (The only possible exception here is ‘make more of myself,’ but the modules are probably going to implement subvalues for that, rather than that as an explicit value. As far as single values go, that one’s special, whereas things like Mickey Mouse faces are not.)
You said you’d like to know what Ben meant by “out of sync with the Cosmos.” I’m still not sure what he means, either, but it might have something to do with what he calls “morphic resonance.” See his paper Morphic Pilot Theory: Toward an extension of quantum physics that better explains psi phenomena. Abstract:
Maybe, but (in case this isn’t immediately obvious to everyone) the causality likely goes from an intuition about the importance of Cosmos-syncing to a speculative theory about quantum mechanics. I haven’t read it, but I think it’s more likely that Ben’s intuitions behind the importance of Cosmos-syncing might be explained more directly in The Hidden Pattern or other more philosophically-minded books & essays by Ben.
I believe Schmidhuber takes something of a middleground here; he seems to agree with the optimization/compression model of intelligence, and that AIs aren’t necessarily going to be human-friendly, but also thinks that intelligence/compression is fundamentally tied into things like beauty and humor in a way that might make the future less bleak & valueless than SingInst folk tend to picture it.
Schmidhuber’s aesthetics paper, going on memory, defines beauty/humor as produced by an optimization process that maximizes the first derivative of compression rates. That is, agents seek neither the most compressible inputs nor incompressible streams of observations, but rather the streams for which their compression rate is increasing the fastest.
This is a very useful heuristic which is built into us because it automatically accounts for diminishing marginal returns: after a certain point, additional compression becomes hard or pointless, and so the agent will switch to the next stream on which progress can be made.
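Here is a very loose sketch of that reward signal, from memory and heavily simplified (a smoothed unigram model stands in for the learned compressor; this is not Schmidhuber’s actual formulation):

    from collections import Counter
    from math import log2

    def code_length(data, counts):
        # Shannon code length of `data` under a smoothed unigram model.
        total = sum(counts.values()) + len(counts) + 1
        return sum(-log2((counts[c] + 1) / total) for c in data)

    def compression_progress(history, new_obs):
        # Intrinsic reward: bits saved on the whole history by updating the
        # model on the new observation; positive when something was learned.
        old_model, new_model = Counter(history), Counter(history + new_obs)
        full = history + new_obs
        return code_length(full, old_model) - code_length(full, new_model)

An agent following the heuristic would then steer toward whichever observation stream currently yields the largest such progress, which is exactly the property criticized below.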
But, IIRC, this is provably not optimal for utility-maximization because it makes no account of the utility of the various streams: you may be able to make plenty of progress in your compression of Methods of Rationality even when you should be working on your programming or biology or something useful despite their painfully slow rates of progress. (‘Amusing ourselves to death’ comes to mind. If this was meant for ancestral environments, then modern art/fiction/etc. is simply an indirect wireheading: we think we are making progress in decoding our environment and increasing our reproductive fitness, when all we’re doing is decoding simple micro-environments meant to be decoded.)
I’m not even sure this heuristic is optimal from the point of view of universal prediction/compression/learning, but I’d have to re-read the paper to remember why I had that intuition. (For starters, if it was optimal, it should be derivable from AIXI or Godel machines or something, but he has to spend much of the paper appealing to more empirical evidence and examples.)
So, given that it’s optimal in neither sense, future intelligences may preserve it—sure, why not? especially if it’s designed in—but there’s no reason to expect it to generically emerge across any significant subset of possible intelligences. Why follow a heuristic as simplistic as ‘maximize rate of compression progress’ when you can instead do some basic calculations about which streams will be more valuable to compress or likely cheap to figure out?
Check out Moshe’s expounding of Steve’s objection to Schmidhuber’s main point, which I think makes the same argument that you do. (One could easily counter that such a wireheading AI would never get off the ground, but I think that debate can be cordoned off.)
ETA: Maybe a counterargument could be made involving omega or super-omega promising more compression than any artificial pseudo-random generator… but AFAIK Schmidhuber hasn’t gone that route.
moshez’s first argument sounds like it’s the same thing as my point about it not being optimal for a utility-maximizer, in considerably different terms.
His second hyperbolic argument seems to me to be wrong or irrelevant: I would argue that people are in practice extremely capable of engaging in hyperbolic discounting with regard to the best and most absorbing artworks while over-consuming ‘junk food’ art (and this actually forms part of my essay arguing that new art should not be subsidized).
I don’t really follow. Is this Omega as in the predictor, or Omega as in Chaitin’s Omega? The latter doesn’t allow any compressor any progress beyond the first few bits due to resource constraints, and if bits of Chaitin’s Omega are doled out, they will have to be at least as cheap to crack as brute-force running the equivalent Turing machine or else the agent will prefer the brute-forcing and ignore the Omega-bait. So the agent will do no worse than before and possibly better (eg. if the bits are offered as-is with no tricky traps or proof of work-style schemes).
Agreed. (I like your essay about junk food art. By the way, did you ever actually do the utilitarian calculations re Nazi Germany’s health policies? Might you share the results?)
Me neither, I just intuit that there might be interesting non-obvious arguments in roughly that argumentspace.
I like to think of the former as the physical manifestation of the latter, and I like to think of both of them as representations of God. But anyway, the latter.
You mean because it’s hard to find/verify bits of omega? But Schmidhuber argues that certain generalized computers can enumerate bits of omega very easily, which is why he developed the idea of a super-omega. I’m not sure what that would imply or if it’s relevant… maybe I should look at this again after the next time I re-familiarize myself with the generalized Turing machine literature.
I was going off a library copy, and thought of it only afterwards; I keep hoping someone else will do it for me.
His jargon is a little much for me. I agree one can approximate Omega by enumerating digits, but what is ‘very easily’ here?
That’s a correct position. Evolution leads to technical, scientific and moral progress. So: superintelligences are likely to be super-moral.
I don’t think that you are taking proper account of cultural evolution—or of the other lineages in which advanced intelligence has evolved.
So: both humans and machine intelligence will be produced by the process of Darwinian evolution. The past may not necessarily be a guide to the future—but it certainly helps. You claim that I am making “assumptions”—but my comment is more of the form of observing a trend. Projecting an existing trend is usually called “forecasting”, not “assuming”. Of course, forecasting using trends is not a foolproof method—and I never claimed that it was.
Yes, I did. In the supplied link there are explanations of why evolution leads to progress. Of course, technical progress leads to moral progress via relatively well-understood mechanisms associated with game theory.
Only for those who don’t understand evolution properly.
Thanks for your speculations about what others think. Again, note that I did provide a link explaining my position.
That’s a defensible prior, but assuming that moral progress exists it doesn’t seem strictly monotonic; there seem to be cases where technical or scientific progress leads to moral regress, depending on how you measure things.
Not “defensible”: probable. Check out the way my post is voted down well below the threshold, though. This appears to be a truth that this community doesn’t want to hear about.
Sure. Evolutionary progress is not “strictly monotonic”. Check out the major meteorite strikes—for instance.
I didn’t downvote your comment (or see it until now) but I think you’re mistaken about the reasons for downvoting.
You state a consideration that most everyone is aware of (growth of instrumentally useful science, technology, institutions for organizing productive competitive units, etc). Then you say that it implies a further controversial conclusion that many around here disagree with (despite knowing the consideration very well), completely ignoring the arguments against. And you phrase the conclusion as received fact, misleadingly suggesting that it is not controversial.
If you referenced the counterarguments against your position and your reasons for rejecting them, and acknowledged the extent of (reasoned) disagreement, I don’t think you would have been downvoted (and probably upvoted). This pattern is recurrent across many of your downvoted comments.
I’m not quite sure that many around here disagree with it as such; I may be misinterpreting User:timtyler, but the claim isn’t necessarily that arbitrary superintelligences will contribute to “moral progress”, the claim is that the superintelligences that are actually likely to be developed some decades down the line are likely to contribute to “moral progress”. Presumably if SingInst’s memetic strategies succeed or if the sanity waterline rises then this would at least be a reasonable expectation, especially given widely acknowledged uncertainty about the exact extent to which value is fragile and uncertainty about what kinds of AI architectures are likely to win the race. This argument is somewhat different than the usual “AI will necessarily heed the ontologically fundamental moral law” argument, and I’m pretty sure User:timtyler agrees that caution is necessary when working on AGI.
Many here disagree with the conclusion that superintelligences are likely to be super-moral?
If so, I didn’t really know that. The only figure I have ever seen from Yudkowsky for the chance of failure is the rather vague one of “easily larger than 10%”. The “GLOBAL CATASTROPHIC RISKS SURVEY”—presumably a poll of the ultra-paranoid—came with a broadly similar chance of failure by 2100 - far below 50%. Like many others, I figure that, if we don’t fail, then we are likely to succeed.
Do the pessimists have an argument? About the only argument I have seen argues that superintelligences will be psychopaths “by default”, since most goal-directed agents are psychopaths. That argument is a feeble one. Similarly, the space of all possible buildings is dominated by piles of rubble—and yet the world is filled with skyscrapers. Looking at evolutionary trends—as I proposed—is a better way of forecasting than looking at the space of possible agents.
Your original comment seemed to be in response to this:
I.e. your conclusion seemed to be that the products of instrumental reasoning (conducting science, galactic colonization, building factories, etc) and evolutionary competition would be enough to capture most of the potential value of the future. That would make sense in light of your talk about evolution and convergence. If all you mean is that “I think that the combined probability of humans shaping future machine intelligence to be OK by my idiosyncratic standards, or convergent instrumental/evolutionary pressures doing so is above 0.5”, then far fewer folk will have much of a bone to pick with you.
But it seems that there is sharper disagreement on the character or valuation of the product of instrumental/evolutionary forces. I’ll make some distinctions and raise three of the arguments often made.
Some of the patterns of behavior that we call “moral” seem broadly instrumentally rational: building a reputation for tit-for-tat among agents who are too powerful to simply prey upon, use of negotiation to reduce the deadweight loss of conflict, cultivating positive intentions when others can pierce attempts at deception. We might expect that superintelligence would increase effectiveness in those areas, as in others (offsetting increased potential for cheating). Likewise, on an institutional level, superintelligent beings (particularly ones able to establish reputations for copy-clans, make their code transparent, and make binding self-modifications) seem likely to be able to do better than humans in building institutions to coordinate with each other (where that is beneficial). In these areas I am aware of few who do not expect superhuman performance from machine intelligence in the long term, and there is a clear evolutionary logic to drive improvements in competitive situations, along with the instrumental reasoning of goal-seeking agents.
However, the net effect of these instrumental virtues and institutions depends on the situation and aims of the players. Loyalty and cooperation within the military of Genghis Khan were essential to the death of millions. Instrumental concerns helped to moderate the atrocities (the Khan is said to have originally planned to reduce the settled areas to grassland, but was convinced of the virtues of leaving victims alive to pay tribute again), but also enabled them. When we are interested in the question of how future agents will spend their resources (as opposed to their game-theoretic interactions with powerful potential allies and rivals), or how they use their “slack”, instrumental cooperative skill need not be enough. And we may “grade on a curve”: creatures that dedicate a much smaller portion of their slack to what we see as valuable, but have more resources simply due to technological advance or space colonization, may be graded poorly by comparison to the good that could have been realized by creatures that used most of their slack for good.
One argument that the evolutionary equilibrium will not be very benevolent in its use of slack is made by Greg Cochran here. He argues that much of our wide-scope altruism, of the sort that leads people to help the helpless (distant poor, animals, etc), is less competitive than a more selective sort. Showing kindness to animals may signal to allies that one will treat them well, but at a cost that could be avoided through reputation systems and source code transparency. Wide-scope altruistic tendencies that may have been selected for in small groups (mostly kin and frequent cooperation partners) are now redirected and cause sacrifice to help distant strangers, and would be outcompeted by more focused altruism.
Robin Hanson claims that much of what Westerners today think of as “moral progress” reflects a move to “forager” ideals in the presence of very high levels of per capita wealth and reduced competition. Since he expects a hypercompetitive Malthusian world following from rapid machine intelligence reproduction, he also expects a collapse of much of what moderns view as moral progress.
Eliezer Yudkowsky’s argument is that (idealized) human-preferred use of “slack” resources would be very different from those of AIs that would be easiest to construct, and attractive initially (e.g. AIXI-style sensory utility functions, which can be coded in relatively directly, rather than using complex concepts that have to be learned and revised, and should deliver instrumental cooperation from weak AIs). That is not the same as talk about a randomly selected AI (although the two are not unrelated). Such an AI might dedicate all distant resources to building factories, improving its technology, and similar pursuits, but only to protect a wireheading original core. In contrast a human civilization would use a much larger share of resources to produce happy beings of a sort we would consider morally valuable for their own sakes.
That’s an interesting and helpful summary comment, Carl. I’ll see if I can make some helpful responses to the specific theories listed above—in this comment’s children:
Regarding Robin Hanson’s proposed hypercompetitive Malthusian world:
Hanson imagines lots of small ems—on the grounds that coordination is hard. I am much more inclined to expect large scale structure and governance—in which case the level of competition between the agents can be configured to be whatever the government decrees.
It is certainly true that there will be rapid reproduction of some heritable elements in the future. Today we have artificial reproducing systems of various kinds. One type is memes. Another type is companies. They are both potentially long-lived, and often not too many people mourn their passing. We will probably be able to set things up so that the things that we care about are not the same things as the ones that must die. Today we are in a dark age in that respect—because dead brains are like burned libraries. In the future, minds will be able to be backed up—so genuinely valuable things are less likely to get lost.
I don’t often agree with you, but you just convinced me we’re on the same side.
Greg is correct that altruism based on adaptation to small groups of kin can be expected to eventually burn out. However, the large scale of modern virtue signalling and reputations massively compensates for that—those mechanisms can even create cooperation between total strangers on distant continents. What we are gaining massively exceeds what we are losing.
It’s true that machines with simple value systems will be easier to build. However, machines will only sell to the extent that they do useful work, respect their owners and obey the law. So there will be a big effort to build machines that respect human values starting long before machines get very smart. You can see this today in the form of car air bags, blender safety features, privacy controls—and so on.
I don’t think that it is likely that civilisation will “drop the baton” and suffer a monumental engineering disaster as the result of an accidental runaway superintelligence—though sure, such a possibility is worth bearing in mind. Most others that I am aware of also give such an outcome a relatively low probability—including—AFAICT—Yudkowsky himself. The case for worrying about it is not that it is especially likely, but that it is not impossible—and could potentially be a large loss.
I didn’t mean to say anything about “instrumental reasoning”.
I do in fact think that universal instrumental values may well be enough to preserve some humans for the sake of the historical record, but that is a different position on a different topic—from my perspective.
My comment was about evolution. Evolution has produced the value in the present and will produce the value in the future. We are part of the process—and not some kind of alternative to it.
Competition represents the evolutionary process known as natural selection. However there’s more to evolution than natural selection—there’s also symbiosis and mutation. Mutations will be more interesting in the future than they have been in the past—what with the involvement of intelligent design, interpolation, extrapolation, etc.
Regarding: cooperation within the military of Genghis Khan: I don’t think that is the bigger picture.
The bigger picture is more like: Robert Wright: How cooperation (eventually) trumps conflict
As it says in Beyond AI, “Intelligence is Good”. The smarter you are, the kinder and more benevolent you tend to be. The idea is supported by game theory, comparisons between animals, comparisons within modern humans, and by moral progress over human history.
We can both see empirically that “Intelligence is Good”, and understand why it is good.
For my own part, I neither find it likely that an arbitrarily selected superintelligence will be “super-moral” given the ordinary connotations of that term, nor that it will be immoral given the ordinary connotations of that term. I do expect it to be amoral by my standards.
That it’s an AI is irrelevant; I conclude much the same thing about arbitrarily selected superintelligent NIs. (Of course, if I artificially limit my selection space to superintelligent humans, my predictions change.)
FWIW, an “arbitrarily selected superintelligence” is not what I meant at all. I was talking about the superintelligences we are likely to see—which will surely not be “arbitrarily selected”.
While thinking about “arbitrarily selected superintelligences” might make superintelligence seem scary, the concept has relatively little to do with reality. It is like discussing arbitrarily selected computer programs. Fun for philosophers—maybe—but not much use for computer scientists or anyone interested in how computer programs actually behave in the real world.
I’ll certainly agree that human-created superintelligences are more likely to be moral in human terms than, say, dolphin-created superintelligences or alien superintelligences.
If I (for example) restrict myself to the class of superintelligences built by computer programmers, it seems reasonable to assume their creators will operate substantively like the computer programmers I’ve worked with (and known at places like MIT’s AI Lab). That assumption leads me to conclude that insofar as they have a morality at all, that morality will be constructed as a kind of test harness around the underlying decision procedure, under the theory that the important problem is making the right decisions given a set of goals. That leads me to expect the morality to be whatever turns out to be easiest to encode and not obviously evil. I’m not sure what the result of that is, but I’d be surprised if I recognized it as moral.
If I instead restrict myself to the class of superintelligences constructed by intelligence augmentation of humans, say, I expect the resulting superintelligence to work out a maximally consistent extension of human moral structures. I expect the result to be recognizably moral as long as we unpack that morality using terms like “systems sufficiently like me” rather than terms like “human beings.” Given how humans treat systems as much unlike us as unaugmented humans are unlike superintelligent humans, I’m not looking forward to that either.
So… I dunno. I’m reluctant to make any especially confident statement about the morality of human-created superintelligences, but I certainly don’t consider “super-moral” some kind of default condition that we’re more likely to end up in than we are to miss.
Meteor strikes aren’t an example of non-monotonic progress in evolution, are they? I mean, in terms of fitness/adaptedness to environment, meteor strikes are just extreme examples of the way “the environment” is a moving target. Most people here, I think, would say morality is a moving target as well, and our current norms only look like progress from where we’re standing (except for the parts that we can afford now better than in the EEA, like welfare and avoiding child labor).
Yes, they are. Living systems are dissipative processes. They maximise entropy production. The biosphere is an optimisation process with a clear direction. Major meteorite strikes are normally large setbacks—since a lot of information about how to dissipate energy gradients is permanently lost—reducing the biosphere’s capabilities relating to maximising entropy increase.
Not stoning, flogging, killing, raping and stealing from each other quite so much is moral progress too. Those were bad way back when as well—but they happened more.
Game theory seems to be quite clear about there being a concrete sense in which some moral systems are “better” than others.
Good example.
I think people can’t disentangle your factual claim from what they perceive to be the implication that we shouldn’t be careful when trying to engineer AGIs. I’m not really sure that they would strongly disagree with the factual claim on its own. It seems clear that something like progress has happened up until the dawn of humans; but I’d argue that it reached its zenith sometime between 100,000 and 500 years ago, and that technology has overall led to a downturn in the morality of the common man. But it might be that I should focus on the heights rather than the averages.
Hmm—no such implication was intended.
The end of slavery and a big downturn in warfare and violence occurred on those timescales. For example, Steven Pinker would not agree with you. In his recent book he says that the pace of moral progress has accelerated in the last few decades. Pinker notes that on issues such as civil rights, the role of women, equality for gays, beating of children and treatment of animals, “the attitudes of conservatives have followed the trajectory of liberals, with the result that today’s conservatives are more liberal than yesterday’s liberals.”
Ugh, Goertzel’s theoretical motivations are okay but his execution is simplistic and post hoc. If people are going to be cranks anyway then they should be instructed on how to do it in the most justifiable and/or glorious manner possible.
“Morphic resonance” is nonsense.
There’s no need to jump to an unsympathetic interpretation in this case: paperclippers could just be unlikely to evolve.
I read this as effectively saying that paperclip maximizers / Mickey Mouse maximizers would not permanently populate the universe because self-copiers would be better at maximizing their goals. Which makes sense: the paperclips Clippy produces don’t produce more paperclips, but the copies the self-copier creates do copy themselves. So it’s quite possibly a difference between polynomial and exponential growth.
So Clippy probably is unrealistic. Not that reproduction-maximizing AIs are any better for humanity.
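Rough numbers behind the polynomial-versus-exponential point (mine, purely illustrative):

    # A fixed factory adding a million paperclips per step grows linearly;
    # anything that copies itself, even starting from a single copy, grows
    # exponentially and eventually dwarfs the factory.
    steps = 60
    factory_total = 1_000_000 * steps   # 60,000,000 paperclips
    replicator_total = 2 ** steps       # roughly 1.15e18 copies
    print(factory_total, replicator_total)

Of course, as the reply below notes, nothing stops a paperclip maximizer from doing the copying first and the paperclipping afterwards.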
There is nothing stopping a paperclip maximizer from simply behaving like a self-copier, if that works better. And then once it “wins,” it can make the paperclips.
So I think the whole notion makes very little sense.
Paperclip maximization doesn’t seem like a stable goal, though I could be wrong about that. Let’s say Clippy reproduces to create a bunch of clippys trying to maximize total paperclips (let’s call this collective ClippyBorg). If one of ClippyBorg’s subClippys had some variety of mutation that changed its goal set to one more suited for reproduction, it would outcompete the other clippys. Now ClippyBorg could destroy cancerClippy, but whether it would successfully do so every time is an open question.
One additional confounding factor is that if ClippyBorg’s subClippys are identical, they will not occupy every available niche optimally and could well be outcompeted by dumber but more adaptable agents (much like humans don’t completely dominate bacteria, despite vastly greater intelligence, due to lower adaptability).
A self-copying clippy would have the handicap of having to retain its desire to maximize paperclips, something other self-copiers wouldn’t have to do. I think the notion of Clippys not dominating does make sense, even if it’s not necessarily right. (My personal intuition is that whichever replicating optimizer with a stable goal set begins expansion first will dominate.)
A paperclip maximizer can create self-reproducing paperclip makers.
It’s quite imaginable that somewhere in the universe there are organisms which either resemble paperclips (maybe an intelligent gastropod with a paperclip-shaped shell) or which have a fundamental use for paperclip-like artefacts (they lay their eggs in a hardened tunnel dug in a paperclip shape). So while it is outlandish to imagine that the first AGI made by human beings will end up fetishizing an object which in our context is a useful but minor artefact, what we would call a “paperclip maximizer” might have a much higher probability of arising from that species, as a degenerated expression of some of its basic impulses.
The real question is, how likely is that, or indeed, how likely is any scenario in which superintelligence is employed to convert as much of the universe as possible to “X”—remembering that “interstellar civilizations populated by beings experiencing growth, choice, and joy” is also a possible value of X.
It would seem that universe-converting X-maximizers are a somewhat likely, but not an inevitable, outcome of a naturally intelligent species experiencing a technological singularity. But we don’t know how likely that is, and we don’t know what possible Xs are likely.