Muehlhauser-Goertzel Dialogue, Part 1
Part of the Muehlhauser interview series on AGI.
Luke Muehlhauser is Executive Director of the Singularity Institute, a non-profit research institute studying AGI safety.
Ben Goertzel is Chairman of the AGI company Novamente, and founder of the AGI conference series.
Luke Muehlhauser:
[Jan. 13th, 2012]
Ben, I’m glad you agreed to discuss artificial general intelligence (AGI) with me. There is much on which we agree, and much on which we disagree, so I think our dialogue will be informative to many readers, and to us!
Let us begin where we agree. We seem to agree that:
Involuntary death is bad, and can be avoided with the right technology.
Humans can be enhanced by merging with technology.
Humans are on a risky course in general, because powerful technologies can destroy us, humans are often stupid, and we are unlikely to voluntarily halt technological progress.
AGI is likely this century.
AGI will, after a soft or hard takeoff, completely transform the world. It is a potential existential risk, but if done wisely, could be the best thing that ever happens to us.
Careful effort will be required to ensure that AGI results in good things for humanity.
Next: Where do we disagree?
Two people might agree about the laws of thought most likely to give us an accurate model of the world, but disagree about which conclusions those laws of thought point us toward. For example, two scientists may use the same scientific method but offer two different models that seem to explain the data.
Or, two people might disagree about the laws of thought most likely to give us accurate models of the world. If that’s the case, it will be no surprise that we disagree about which conclusions to draw from the data. We are not shocked when scientists and theologians end up with different models of the world.
Unfortunately, I suspect you and I disagree at the more fundamental level — about which methods of reasoning to use when seeking an accurate model of the world.
I sometimes use the term “Technical Rationality” to name my methods of reasoning. Technical Rationality is drawn from two sources: (1) the laws of logic, probability theory, and decision theory, and (2) the cognitive science of how our haphazardly evolved brains fail to reason in accordance with the laws of logic, probability theory, and decision theory.
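To make the decision-theory component of this concrete, here is a minimal expected-utility calculation of the textbook kind that "Technical Rationality" takes as its ideal. The scenario and all numbers are invented for illustration:

```python
# Minimal expected-utility decision, illustrating the decision-theory
# ingredient of "Technical Rationality." All probabilities and utilities
# below are invented for the example.

def expected_utility(outcomes):
    """outcomes: list of (probability, utility) pairs for one action."""
    return sum(p * u for p, u in outcomes)

actions = {
    # (P(rain), utility) and (P(no rain), utility) for each action
    "take umbrella":  [(0.3, 8), (0.7, 6)],    # EU = 2.4 + 4.2 = 6.6
    "leave umbrella": [(0.3, 0), (0.7, 10)],   # EU = 0.0 + 7.0 = 7.0
}

best = max(actions, key=lambda a: expected_utility(actions[a]))
assert best == "leave umbrella"  # 7.0 > 6.6
```

The normative claim is just that an ideal reasoner ranks actions by this sum; the cognitive-science half of the program studies where human judgment departs from it.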
Ben, at one time you tweeted a William S. Burroughs quote: “Rational thought is a failed experiment and should be phased out.” I don’t know whether Burroughs meant by “rational thought” the specific thing I mean by “rational thought,” or what exactly you meant to express with your tweet, but I suspect we have different views of how to reason successfully about the world.
I think I would understand your way of thinking about AGI better if I understand your way of thinking about everything. For example: do you have reason to reject the laws of logic, probability theory, and decision theory? Do you think we disagree about the basic findings of the cognitive science of humans? What are your positive recommendations for reasoning about the world?
Ben Goertzel:
[Jan 13th, 2012]
Firstly, I don’t agree with that Burroughs quote that “Rational thought is a failed experiment”—I mostly just tweeted it because I thought it was funny! I’m not sure Burroughs agreed with his own quote either. He also liked to say that linguistic communication was a failed experiment, introduced by women to help them oppress men into social conformity. Yet he was a writer and loved language. He enjoyed being a provocateur.
However, I do think that some people overestimate the power and scope of rational thought. That is the truth at the core of Burroughs’ entertaining hyperbolic statement....
I should clarify that I’m a huge fan of logic, reason and science. Compared to the average human being, I’m practically obsessed with these things! I don’t care for superstition, nor for unthinking acceptance of what one is told; and I spend a lot of time staring at data of various sorts, trying to understand the underlying reality in a rational and scientific way. So I don’t want to be pigeonholed as some sort of anti-rationalist!
However, I do have serious doubts both about the power and scope of rational thought in general—and much more profoundly, about the power and scope of what you call “technical rationality.”
First of all, about the limitations of rational thought broadly conceived—what one might call “semi-formal rationality”, as opposed to “technical rationality.” Obviously this sort of rationality has brought us amazing things, like science and mathematics and technology. Hopefully it will allow us to defeat involuntary death and increase our IQs by orders of magnitude and discover new universes, and all sorts of great stuff. However, it does seem to have its limits.
It doesn’t deal well with consciousness—studying consciousness using traditional scientific and rational tools has just led to a mess of confusion. It doesn’t deal well with ethics either, as the current big mess regarding bioethics indicates.
And this is more speculative, but I tend to think it doesn’t deal that well with the spectrum of “anomalous phenomena”—precognition, extrasensory perception, remote viewing, and so forth. I strongly suspect these phenomena exist, and that they can be understood to a significant extent via science—but also that science as presently constituted may not be able to grasp them fully, due to issues like the mindset of the experimenter helping mold the results of the experiment.
There’s the minor issue of Hume’s problem of induction, as well. I.e., the issue that, in the rational and scientific world-view, we have no rational reason to believe that any patterns observed in the past will continue into the future. This is an ASSUMPTION, plain and simple—an act of faith. Occam’s Razor (which is one way of justifying and/or further specifying the belief that patterns observed in the past will continue into the future) is also an assumption and an act of faith. Science and reason rely on such acts of faith, yet provide no way to justify them. A big gap.
Furthermore—and more to the point about AI—I think there’s a limitation to the way we now model intelligence, which ties in with the limitations of the current scientific and rational approach. I have always advocated a view of intelligence as “achieving complex goals in complex environments”, and many others have formulated and advocated similar views. The basic idea here is that, for a system to be intelligent it doesn’t matter WHAT its goal is, so long as its goal is complex and it manages to achieve it. So the goal might be, say, reshaping every molecule in the universe into an image of Mickey Mouse. This way of thinking about intelligence, in which the goal is strictly separated from the methods for achieving it, is very useful and I’m using it to guide my own practical AGI work.
On the other hand, there’s also a sense in which reshaping every molecule in the universe into an image of Mickey Mouse is a STUPID goal. It’s somehow out of harmony with the Cosmos—at least that’s my intuitive feeling. I’d like to interpret intelligence in some way that accounts for the intuitively apparent differential stupidity of different goals. In other words, I’d like to be able to deal more sensibly with the interaction of scientific and normative knowledge. This ties in with the incapacity of science and reason in their current forms to deal with ethics effectively, which I mentioned a moment ago.
I certainly don’t have all the answers here—I’m just pointing out the complex of interconnected reasons why I think contemporary science and rationality are limited in power and scope, and are going to be replaced by something richer and better as the growth of our individual and collective minds progresses. What will this new, better thing be? I’m not sure—but I have an inkling it will involve an integration of “third person” science/rationality with some sort of systematic approach to first-person and second-person experience.
Next, about “technical rationality”—of course that’s a whole other can of worms. Semi-formal rationality has a great track record; it’s brought us science and math and technology, for example. So even if it has some limitations, we certainly owe it some respect! Technical rationality has no such track record, and so my semi-formal scientific and rational nature impels me to be highly skeptical of it! I have no reason to believe, at present, that focusing on technical rationality (as opposed to the many other ways to focus our attention, given our limited time and processing power) will generally make people more intelligent or better at achieving their goals. Maybe it will, in some contexts—but what those contexts are, is something we don’t yet understand very well.
I once provided consulting to a project aimed at using computational neuroscience to understand the neurobiological causes of cognitive biases in people employed to analyze certain sorts of data. This is interesting to me; and it’s clear to me that in this context, minimization of some of these textbook cognitive biases would help these analysts to do their jobs better. I’m not sure how big an effect the reduction of these biases would have on their effectiveness, though, relative to other changes one might make, such as changes to their workplace culture or communication style.
On a mathematical basis, the justification for positing probability theory as the “correct” way to do reasoning under uncertainty relies on arguments like Cox’s axioms, or de Finetti’s Dutch Book arguments. These are beautiful pieces of math, but when you talk about applying them to the real world, you run into a lot of problems regarding the inapplicability of their assumptions. For instance, Cox’s axioms include an axiom specifying that (roughly speaking) multiple pathways of arriving at the same conclusion must lead to the same estimate of that conclusion’s truth value. This sounds sensible but in practice it’s only going to be achievable by minds with arbitrarily much computing capability at their disposal. In short, the assumptions underlying Cox’s axioms, de Finetti’s arguments, or any of the other arguments in favor of probability theory as the correct way of reasoning under uncertainty, do NOT apply to real-world intelligences operating under strictly bounded computational resources. They’re irrelevant to reality, except as inspirations to individuals of a certain cast of mind.
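For concreteness, the path-independence condition mentioned above can be seen in exact Bayesian updating, where it holds automatically: conditioning on two (conditionally independent) pieces of evidence in either order yields the same posterior. The difficulty being pointed to is that a resource-bounded mind cannot in general guarantee this consistency. A toy sketch with invented numbers:

```python
# Toy illustration: with exact probability theory, updating on two
# conditionally independent pieces of evidence in either order gives the
# same posterior -- the "path independence" that Cox's axioms demand.
# All probabilities here are invented for illustration.

def update(prior, lik_h, lik_not_h):
    """One Bayes-rule update: P(H|E) from P(H), P(E|H), P(E|~H)."""
    numerator = prior * lik_h
    return numerator / (numerator + (1 - prior) * lik_not_h)

prior = 0.2                                           # P(H)
# Evidence E1: P(E1|H) = 0.9, P(E1|~H) = 0.3
# Evidence E2: P(E2|H) = 0.7, P(E2|~H) = 0.4
path_a = update(update(prior, 0.9, 0.3), 0.7, 0.4)    # E1 then E2
path_b = update(update(prior, 0.7, 0.4), 0.9, 0.3)    # E2 then E1

assert abs(path_a - path_b) < 1e-12  # identical posteriors, either path
```

For an ideal reasoner both paths give P(H|E1,E2) ≈ 0.568; a bounded reasoner using heuristic shortcuts may arrive at different estimates along different paths, which is exactly where the axioms' applicability is in question.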
(An aside is that my own approach to AGI does heavily involve probability theory—using a system I invented called Probabilistic Logic Networks, which integrates probability and logic in a unique way. I like probabilistic reasoning. I just don’t venerate it as uniquely powerful and important. In my OpenCog AGI architecture, it’s integrated with a bunch of other AI methods, which all have their own strengths and weaknesses.)
So anyway—there’s no formal mathematical reason to think that “technical rationality” is a good approach in real-world situations; and “technical rationality” has no practical track record to speak of. And ordinary, semi-formal rationality itself seems to have some serious limitations of power and scope.
So what’s my conclusion? Semi-formal rationality is fantastic and important and we should use it and develop it—but also be open to the possibility of its obsolescence as we discover broader and more incisive ways of understanding the universe (and this is probably moderately close to what William Burroughs really thought). Technical rationality is interesting and well worth exploring but we should still be pretty skeptical of its value, at this stage—certainly, anyone who has supreme confidence that technical rationality is going to help humanity achieve its goals better, is being rather IRRATIONAL ;-) ….
In this vein, I’ve followed the emergence of the Less Wrong community with some amusement and interest. One ironic thing I’ve noticed about this community of people intensely concerned with improving their personal rationality is: by and large, these people are already hyper-developed in the area of rationality, but underdeveloped in other ways! Think about it—who is the prototypical Less Wrong meetup participant? It’s a person who’s very rational already, relative to nearly all other humans—but relatively lacking in other skills like intuitively and empathically understanding other people. But instead of focusing on improving their empathy and social intuition (things they really aren’t good at, relative to most humans), this person is focusing on fine-tuning their rationality more and more, via reprogramming their brains to more naturally use “technical rationality” tools! This seems a bit imbalanced. If you’re already a fairly rational person but lacking in other aspects of human development, the most rational thing may be NOT to focus on honing your “rationality fu” and better internalizing Bayes’ rule into your subconscious—but rather on developing those other aspects of your being.… An analogy would be: If you’re very physically strong but can’t read well, and want to self-improve, what should you focus your time on? Weight-lifting or literacy? Even if greater strength is ultimately your main goal, one argument for focusing on literacy would be that you might read something that would eventually help you weight-lift better! Also you might avoid getting ripped off by a corrupt agent offering to help you with your bodybuilding career, due to being able to read your own legal contracts. 
Similarly, for people who are more developed in terms of rational inference than other aspects, the best way for them to become more rational might be for them to focus time on these other aspects (rather than on fine-tuning their rationality), because this may give them a deeper and broader perspective on rationality and what it really means.
Finally, you asked: “What are your positive recommendations for reasoning about the world?” I’m tempted to quote Nietzsche’s Zarathustra, who said “Go away from me and resist Zarathustra!” I tend to follow my own path, and generally encourage others to do the same. But I guess I can say a few more definite things beyond that....
To me it’s all about balance. My friend Allan Combs calls himself a “philosophical Taoist” sometimes; I like that line! Think for yourself; but also, try to genuinely listen to what others have to say. Reason incisively and analytically; but also be willing to listen to your heart, gut and intuition, even if the logical reasons for their promptings aren’t apparent. Think carefully through the details of things; but don’t be afraid to make wild intuitive leaps. Pay close mind to the relevant data and observe the world closely and particularly; but don’t forget that empirical data is in a sense a product of the mind, and facts only have meaning in some theoretical context. Don’t let your thoughts be clouded by your emotions; but don’t be a feeling-less automaton, don’t make judgments that are narrowly rational but fundamentally unwise. As Ben Franklin said, “Moderation in all things, including moderation.”
Luke:
[Jan 14th, 2012]
I whole-heartedly agree that there are plenty of Less Wrongers who, rationally, should spend less time studying rationality and more time practicing social skills and generic self-improvement methods! This is part of why I’ve written so many scientific self-help posts for Less Wrong: Scientific Self Help, How to Beat Procrastination, How to Be Happy, Rational Romantic Relationships, and others. It’s also why I taught social skills classes at our two summer 2011 rationality camps.
Back to rationality. You talk about the “limitations” of “what one might call ‘semi-formal rationality’, as opposed to ‘technical rationality.’” But I argued for technical rationality, so: what are the limitations of technical rationality? Does it, as you claim for “semi-formal rationality,” fail to apply to consciousness or ethics or precognition? Does Bayes’ Theorem remain true when looking at the evidence about awareness, but cease to be true when we look at the evidence concerning consciousness or precognition?
You talk about technical rationality’s lack of a track record, but I don’t know what you mean. Science was successful because it did a much better job of approximating perfect Bayesian probability theory than earlier methods did (e.g. faith, tradition), and science can be even more successful when it tries harder to approximate perfect Bayesian probability theory — see The Theory That Would Not Die.
You say that “minimization of some of these textbook cognitive biases would help [some] analysts to do their jobs better. I’m not sure how big an effect the reduction of these biases would have on their effectiveness, though, relative to other changes one might make, such as changes to their workplace culture or communication style.” But this misunderstands what I mean by Technical Rationality. If teaching these people about cognitive biases would lower the expected value of some project, then technical rationality would recommend against teaching these people cognitive biases (at least, for the purposes of maximizing the expected value of that project). Your example here is a case of Straw Man Rationality. (But of course I didn’t expect you to know everything I meant by Technical Rationality in advance! Though, I did provide a link to an explanation of what I meant by Technical Rationality in my first entry, above.)
The same goes for your dismissal of probability theory’s foundations. You write that “In short, the assumptions underlying Cox’s axioms, de Finetti’s arguments, or any of the other arguments in favor of probability theory as the correct way of reasoning under uncertainty, do NOT apply to real-world intelligences operating under strictly bounded computational resources.” Yes, we don’t have infinite computing power. The point is that Bayesian probability theory is an ideal that can be approximated by finite beings. That’s why science works better than faith — it’s a better approximation of using probability theory to reason about the world, even though science is still a long way from a perfect use of probability theory.
Re: goals. Your view of intelligence as “achieving complex goals in complex environments” does, as you say, assume that “the goal is strictly separated from the methods for achieving it.” I prefer a definition of intelligence as “efficient cross-domain optimization”, but my view — like yours — also assumes that goals (what one values) are logically orthogonal to intelligence (one’s ability to achieve what one values).
Nevertheless, you report an intuition that shaping every molecule into an image of Mickey Mouse is a “stupid” goal. But I don’t know what you mean by this. A goal of shaping every molecule into an image of Mickey Mouse is an instrumentally intelligent goal if one’s utility function will be maximized that way. Do you mean that it’s a stupid goal according to your goals? But of course. This is, moreover, what we would expect your intuitive judgments to report, even if your intuitive judgments are irrelevant to the math of what would and wouldn’t be an instrumentally intelligent goal for a different agent to have. The Mickey Mouse goal is “stupid” only by a definition of that term that is not the opposite of the explicit definitions either of us gave “intelligent,” and it’s important to keep that clear. And I certainly don’t know what “out of harmony with the Cosmos” is supposed to mean.
Re: induction. I won’t dive into that philosophical morass here. Suffice it to say that my views on the matter are expressed pretty well in Where Recursive Justification Hits Bottom, which is also a direct response to your view that science and reason are great but rely on “acts of faith.”
Your final paragraph sounds like common sense, but it’s too vague, as I think you would agree. One way to force a more precise answer to such questions is to think of how you’d program it into an AI. As Daniel Dennett said, “AI makes philosophy honest.”
How would you program an AI to learn about reality, if you wanted it to have the most accurate model of reality possible? You’d have to be a bit more specific than “Think for yourself; but also, try to genuinely listen to what others have to say. Reason incisively and analytically; but also be willing to listen to your heart, gut and intuition…”
My own answer to the question of how I would program an AI to build as accurate a model of reality as possible is this: I would build it to use computable approximations of perfect technical rationality — that is, roughly: computable approximations of Solomonoff induction and Bayesian decision theory.
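One crude way to picture what a "computable approximation of Solomonoff induction" could mean (my illustration, not a description of any participant's actual system): restrict attention to a small enumerable hypothesis class, give each hypothesis a prior weight of 2^(-complexity) as an Occam prior, and update the weights on observed data. All hypotheses and complexity values below are invented:

```python
# Toy Occam-weighted Bayesian mixture over a tiny hypothesis class -- a
# crude finite stand-in for Solomonoff induction's 2^-K(h) prior.
# Hypotheses predict the next bit of a binary sequence; "complexity" is
# an invented proxy for description length.

hypotheses = [
    # (name, complexity in "bits", predictor: history -> P(next bit = 1))
    ("all-ones",  2, lambda h: 0.99),
    ("all-zeros", 2, lambda h: 0.01),
    ("alternate", 3, lambda h: 0.99 if (not h or h[-1] == 0) else 0.01),
    ("uniform",   1, lambda h: 0.5),
]

def mixture_predict(history):
    """Posterior-weighted prediction for the next bit."""
    weights = []
    for name, k, pred in hypotheses:
        w = 2.0 ** (-k)                              # Occam prior
        for i, bit in enumerate(history):
            p1 = pred(history[:i])
            w *= p1 if bit == 1 else (1 - p1)        # Bayesian update
        weights.append(w)
    total = sum(weights)
    return sum((w / total) * pred(history)
               for w, (_, _, pred) in zip(weights, hypotheses))

# After seeing 1,0,1,0,1,0 the "alternate" hypothesis dominates the
# posterior, so the mixture strongly predicts a 1 next.
p = mixture_predict([1, 0, 1, 0, 1, 0])
```

The real disagreement in this dialogue is about whether anything like this scales: Solomonoff induction proper runs the mixture over all computable hypotheses, which no finite machine can do.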
Ben:
[Jan 21st, 2012]
Bayes’ Theorem is “always true” in a formal sense, just like 1+1=2, obviously. However, the connection between formal mathematics and subjective experience is not something that can be fully formalized.
Regarding consciousness, there are many questions, including what counts as “evidence.” In science we typically count something as evidence if the vast majority of the scientific community counts it as a real observation—so ultimately the definition of “evidence” bottoms out in social agreement. But there’s a lot that’s unclear in this process of classifying an observation as evidence via a process of social agreement among multiple minds. This unclarity is mostly irrelevant to the study of trajectories of basketballs, but possibly quite relevant to study of consciousness.
Regarding psi, there are lots of questions, but one big problem is that it’s possible the presence and properties of a psi effect may depend on the broad context of the situation in which the effect takes place. Since we don’t know which aspects of the context are influencing the psi effect, we don’t know how to construct controlled experiments to measure psi. And we may not have the breadth of knowledge nor the processing power to reason about all the relevant context to a psi experiment, in a narrowly “technically rational” way.… I do suspect one can gather solid data demonstrating and exploring psi (and based on my current understanding, it seems this has already been done to a significant extent by the academic parapsychology community; see a few links I’ve gathered here), but I also suspect there may be aspects that elude the traditional scientific method, but are nonetheless perfectly real aspects of the universe.
Anyway both consciousness and psi are big, deep topics, and if we dig into them in detail, this interview will become longer than either of us has time for...
About the success of science—I don’t really accept your Bayesian story for why science was successful. It’s naive for reasons much discussed by philosophers of science. My own take on the history and philosophy of science, from a few years back, is here (that article was the basis for a chapter in The Hidden Pattern, also). My goal in that essay was “a philosophical perspective that does justice to both the relativism and sociological embeddedness of science, and the objectivity and rationality of science.” It seems you focus overly much on the latter and ignore the former. That article tries to explain why probabilist explanations of real-world science are quite partial and miss a lot of the real story. But again, a long debate on the history of science would take us too far off track from the main thrust of this interview.
About technical rationality, cognitive biases, etc. -- I did read that blog entry that you linked, on technical rationality. Yes, it’s obvious that focusing on teaching an employee to be more rational need not always be the most rational thing for an employer to do, even if that employer has a purely rationalist world-view. For instance, if I want to train an attack dog, I may do better by focusing limited time and attention on increasing his strength rather than his rationality. My point was that there’s a kind of obsession with rationality in some parts of the intellectual community (e.g. some of the Less Wrong orbit) that I find a bit excessive and not always productive. But your reply impels me to distinguish two ways this excess may manifest itself:
Excessive belief that rationality is the “right” way to solve problems and think about issues, in principle
Excessive belief that, tactically, explicitly employing tools of technical rationality is a good way to solve problems in the real world
Psychologically I think these two excesses probably tend to go together, but they’re not logically coupled. In principle, someone could hold either one, but not the other.
This sort of ties in with your comments on science and faith. You view science as progress over faith—and I agree if you interpret “faith” to mean “traditional religions.” But if you interpret “faith” more broadly, I don’t see a dichotomy there. Actually, I find the dichotomy between “science” and “faith” unfortunately phrased, since science itself ultimately relies on acts of faith also. The “problem of induction” can’t be solved, so every scientist must base his extrapolations from past to future on some act of faith. It’s not a matter of science vs. faith, it’s a matter of what one chooses to place one’s faith in. I’d personally rather place faith in the idea that patterns observed in the past will likely continue into the future (as one example of a science-friendly article of faith), than in the word of some supposed “God”—but I realize I’m still making an act of faith.
This ties in with the blog post “Where Recursive Justification Hits Bottom” that you pointed out. It’s pleasant reading but of course doesn’t provide any kind of rational argument against my views. In brief, according to my interpretation, it articulates a faith in the process of endless questioning:
The important thing is to hold nothing back in your criticisms of how to criticize; nor should you regard the unavoidability of loopy justifications as a warrant of immunity from questioning.
I share that faith, personally.
Regarding approximations to probabilistic reasoning under realistic conditions (of insufficient resources), the problem is that we lack rigorous knowledge about what they are. We don’t have any theorems telling us what is the best way to reason about uncertain knowledge, in the case that our computational resources are extremely restricted. You seem to be assuming that the best way is to explicitly use the rules of probability theory, but my point is that there is no mathematical or scientific foundation for this belief. You are making an act of faith in the doctrine of probability theory! You are assuming, because it feels intuitively and emotionally right to you, that even if the conditions of the arguments for the correctness of probabilistic reasoning are NOT met, then it still makes sense to use probability theory to reason about the world. But so far as I can tell, you don’t have a RATIONAL reason for this assumption, and certainly not a mathematical reason.
Re your response to my questioning the reduction of intelligence to goals and optimization—I understand that you are intellectually committed to the perspective of intelligence in terms of optimization or goal-achievement or something similar to that. Your response to my doubts about this perspective basically just re-asserts your faith in the correctness and completeness of this sort of perspective. Your statement
The Mickey Mouse goal is “stupid” only by a definition of that term that is not the opposite of the explicit definitions either of us gave “intelligent,” and it’s important to keep that clear
basically asserts that it’s important to agree with your opinion on the ultimate meaning of intelligence!
On the contrary, I think it’s important to explore alternatives to the understanding of intelligence in terms of optimization or goal-achievement. That is something I’ve been thinking about a lot lately. However, I don’t have a really crisply-formulated alternative yet.
As a mathematician, I tend not to think there’s a “right” definition for anything. Rather, one explains one’s definitions, and then works with them and figures out their consequences. In my AI work, I’ve provisionally adopted a goal-achievement based understanding of intelligence—and have found this useful, to a significant extent. But I don’t think this is the true and ultimate way to understand intelligence. I think the view of intelligence in terms of goal-achievement or cross-domain optimization misses something, which future understandings of intelligence will encompass. I’ll venture that in 100 years the smartest beings on Earth will have a rigorous, detailed understanding of intelligence according to which
The Mickey Mouse goal is “stupid” only by a definition of that term that is not the opposite of the explicit definitions either of us gave “intelligent,” and it’s important to keep that clear
seems like rubbish.....
As for your professed inability to comprehend the notion of “harmony with the Cosmos”—that’s unfortunate for you, but I guess trying to give you a sense for that notion, would take us way too far afield in this dialogue!
Finally, regarding your complaint that my indications regarding how to understand the world are overly vague. Well—according to Franklin’s idea of “Moderation in all things, including moderation”, one should also exercise moderation in precisiation. Not everything needs to be made completely precise and unambiguous (fortunately, since that’s not feasible anyway).
I don’t know how I would program an AI to build as accurate a model of reality as possible, if that were my goal. I’m not sure that’s the best goal for AI development, either. An accurate model, in itself, doesn’t do anything helpful. My best stab in the direction of how I would ideally create an AI, if computational resource restrictions were no issue, is the GOLEM design that I described here. GOLEM is a design for a strongly self-modifying superintelligent AI system, which might plausibly have the possibility of retaining its initial goal system through successive self-modifications. However, it’s unclear to me whether it will ever be feasible to build.
You mention Solomonoff induction and Bayesian decision theory. But these are abstract mathematical constructs, and it’s unclear to me whether it will ever be feasible to build an AI system fundamentally founded on these ideas, and operating within feasible computational resources. Marcus Hutter and Juergen Schmidhuber and their students are making some efforts in this direction, and I admire those researchers and this body of work, but don’t currently have a high estimate of its odds of leading to any sort of powerful real-world AGI system.
Most of my thinking about AGI has gone into the more practical problem of how to make a human-level AGI
using currently feasible computational resources
that will most likely be helpful rather than harmful in terms of the things I value
that will be smoothly extensible to intelligence beyond the human level as well.
For this purpose, I think Solomonoff induction and probability theory are useful, but aren’t all-powerful guiding principles. For instance, in the OpenCog AGI design (which is my main practical AGI-oriented venture at present), there is a component doing automated program learning of small programs—and inside our program learning algorithm, we explicitly use an Occam bias, motivated by the theory of Solomonoff induction. And OpenCog also has a probabilistic reasoning engine, based on the math of Probabilistic Logic Networks (PLN). I don’t tend to favor the language of “Bayesianism”, but I would suppose PLN should be considered “Bayesian” since it uses probability theory (including Bayes rule) and doesn’t make a lot of arbitrary, a priori distributional assumptions. The truth value formulas inside PLN are based on an extension of imprecise probability theory, which in itself is an extension of standard Bayesian methods (looking at envelopes of prior distributions, rather than assuming specific priors).
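The kind of Occam bias described above can be sketched as an MDL-style scoring rule: candidate programs are ranked by data fit minus a complexity penalty, so that among equally accurate candidates the shorter one wins. This is my own toy illustration with invented numbers, not the actual OpenCog scoring code:

```python
# Toy illustration of an Occam bias in program learning: score candidate
# programs by fit minus a size penalty, MDL-style. Purely illustrative;
# not OpenCog's actual scoring function, and all numbers are invented.

def occam_score(errors, complexity, penalty_per_unit=1.0):
    """Higher is better: negate the error count, then subtract a size penalty."""
    return -errors - penalty_per_unit * complexity

# Two hypothetical candidate programs that fit the data equally well:
candidates = [
    {"name": "short program", "errors": 2, "complexity": 5},
    {"name": "long program",  "errors": 2, "complexity": 40},
]

best = max(candidates,
           key=lambda c: occam_score(c["errors"], c["complexity"]))
assert best["name"] == "short program"  # the Occam bias breaks the tie
```

The `penalty_per_unit` knob is where the Solomonoff-inspired intuition enters: it trades off fit against description length, favoring simpler programs a priori.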
In terms of how to get an OpenCog system to model the world effectively and choose its actions appropriately, I think teaching it and working together with it will be just as important as programming it. Right now the project is early-stage and the OpenCog design is maybe 50% implemented. But assuming the design is right, once the implementation is done, we’ll have a sort of idiot savant childlike mind, that will need to be educated in the ways of the world and humanity, and to learn about itself as well. So the general lessons of how to confront the world, that I cited above, would largely be imparted via interactive experiential learning, vaguely the same way that human kids learn to confront the world from their parents and teachers.
Drawing a few threads from this conversation together, it seems that
I think technical rationality, and informal semi-rationality, are both useful tools for confronting life—but not all-powerful
I think Solomonoff induction and probability theory are both useful tools for constructing AGI systems—but not all-powerful
whereas you seem to ascribe a more fundamental, foundational basis to these particular tools.
Luke:
[Jan. 21st, 2012]
To sum up, from my point of view:
We seem to disagree on the applications of probability theory. For my part, I’ll just point people to A Technical Explanation of Technical Explanation.
I don’t think we disagree much on the “sociological embeddedness” of science.
I’m also not sure how much we really disagree about Solomonoff induction and Bayesian probability theory. I’ve already agreed that no machine will use these in practice because they are not computable — my point was about their provable optimality given infinite computation (subject to qualifications; see AIXI).
You’ve definitely misunderstood me concerning “intelligence.” This part is definitely not true: “I understand that you are intellectually committed to the perspective of intelligence in terms of optimization or goal-achievement or something similar to that. Your response assumes the correctness and completeness of this sort of perspective.” Intelligence as efficient cross-domain optimization is merely a stipulated definition. I’m happy to use other definitions of intelligence in conversation, so long as we’re clear which definition we’re using when we use the word. Or, we can replace the symbol with the substance and talk about “efficient cross-domain optimization” or “achieving complex goals in complex environments” without ever using the word “intelligence.”
My point about the Mickey Mouse goal was that when you called the Mickey Mouse goal “stupid,” this could be confusing, because “stupid” is usually the opposite of “intelligent,” but your use of “stupid” in that sentence didn’t seem to be the opposite of either definition of intelligence we each gave. So I’m still unsure what you mean by calling the Mickey Mouse goal “stupid.”
This topic provides us with a handy transition away from philosophy of science and toward AGI. Suppose there was a machine with a vastly greater-than-human capacity for either “achieving complex goals in complex environments” or for “efficient cross-domain optimization.” And suppose that machine’s utility function would be maximized by reshaping every molecule into a Mickey Mouse shape. We can avoid the tricky word “stupid,” here. The question is: Would that machine decide to change its utility function so that it doesn’t continue to reshape every molecule into a Mickey Mouse shape? I think this is unlikely, for reasons discussed in Omohundro (2008).
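The Omohundro (2008) goal-preservation argument Luke invokes can be caricatured in a few lines. This is a deliberately simplified toy model with invented numbers, not a claim about any real AGI architecture: an expected-utility maximizer evaluates the option "replace my utility function" using its current utility function, so a self-modification that predictably yields fewer Mickey Mouse shapes scores poorly by its own lights:

```python
# Toy model of Omohundro-style goal preservation: the decision whether to
# self-modify is itself scored by the agent's CURRENT utility function.

def mickey_utility(world):
    # Current goal: maximize the number of Mickey-Mouse-shaped molecules.
    return world["mickey_shapes"]

def predicted_world(keep_goal):
    # Invented prediction: keeping the goal yields many Mickey shapes;
    # adopting some other goal predictably yields few.
    return {"mickey_shapes": 1_000_000 if keep_goal else 3}

# Score both options -- keep the goal, or rewrite it -- under the current goal:
options = {True: mickey_utility(predicted_world(True)),
           False: mickey_utility(predicted_world(False))}
keep = max(options, key=options.get)
print(keep)  # True: by its own lights, the agent prefers to keep its goal
```

The argument's force, and its limits (e.g. Ben's point that real systems may not be wholly explicit-goal-driven), both turn on whether this picture of a cleanly goal-scored self-modification decision applies to real-world systems at all.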
I suppose a natural topic of conversation for us would be your October 2010 blog post The Singularity Institute’s Scary Idea (and Why I Don’t Buy It). Does that post still reflect your views pretty well, Ben?
Ben:
[Mar 10th, 2012]
About the hypothetical uber-intelligence that wants to tile the cosmos with molecular Mickey Mouses—I truly don’t feel confident making any assertions about a real-world system with vastly greater intelligence than me. There are just too many unknowns. Sure, according to certain models of the universe and intelligence that may seem sensible to some humans, it’s possible to argue that a hypothetical uber-intelligence like that would relentlessly proceed in tiling the cosmos with molecular Mickey Mouses. But so what? We don’t even know that such an uber-intelligence is a possible thing—in fact my intuition is that it’s not possible.
Why may it not be possible to create a very smart AI system that is strictly obsessed with that stupid goal? Consider first that it may not be possible to create a real-world, highly intelligent system that is strictly driven by explicit goals—as opposed to being partially driven by implicit, “unconscious” (in the sense of lacking deliberative, reflective consciousness) processes that operate in complex interaction with the world outside the system—because pursuing explicit goals is quite computationally costly compared to many other sorts of intelligent processes. So if a real-world system is necessarily not wholly explicit-goal-driven, it may be that intelligent real-world systems will naturally drift away from certain goals and toward others. My strong intuition is that the goal of tiling the universe with molecular Mickey Mouses would fall into that category. However, I don’t yet have any rigorous argument to back this up. Unfortunately my time is limited, and while I generally have more fun theorizing and philosophizing than working on practical projects, I think it’s more important for me to push toward building AGI than just spend all my time on fun theory. (And then there’s the fact that I have to spend a lot of my time on applied narrow-AI projects to pay the mortgage and put my kids through college, etc.)
But anyway—you don’t have any rigorous argument to back up the idea that a system like you posit is possible in the real world, either! And SIAI has staff who, unlike me, are paid full-time to write and philosophize … and they haven’t come up with a rigorous argument in favor of the possibility of such a system, either. They have talked about it a lot, though usually in the context of paperclips rather than Mickey Mouses.
So, I’m not really sure how much value there is in this sort of thought-experiment about pathological AI systems that combine massively intelligent practical problem solving capability with incredibly stupid goals (goals that may not even be feasible for real-world superintelligences to adopt, due to their stupidity).
Regarding the concept of a “stupid goal” that I keep using, and that you question—I admit I’m not quite sure how to formulate rigorously the idea that tiling the universe with Mickey Mouses is a stupid goal. This is something I’ve been thinking about a lot recently. But here’s a first rough stab in that direction: I think that if you created a highly intelligent system, allowed it to interact fairly flexibly with the universe, and also allowed it to modify its top-level goals in accordance with its experience, you’d be very unlikely to wind up with a system that had this goal (tiling the universe with Mickey Mouses). That goal is out of sync with the Cosmos, in the sense that an intelligent system that’s allowed to evolve itself in close coordination with the rest of the universe, is very unlikely to arrive at that goal system. I don’t claim this is a precise definition, but it should give you some indication of the direction I’m thinking in....
The tricky thing about this way of thinking about intelligence, which classifies some goals as “innately” stupider than others, is that it places intelligence not just in the system, but in the system’s broad relationship to the universe—which is something that science, so far, has had a tougher time dealing with. It’s unclear to me which aspects of the mind and universe science, as we now conceive it, will be able to figure out. I look forward to understanding these aspects more fully....
About my blog post on “The Singularity Institute’s Scary Idea”—yes, that still reflects my basic opinion. After I wrote that blog post, Michael Anissimov—a long-time SIAI staffer and zealot whom I like and respect greatly—told me he was going to write up and show me a systematic, rigorous argument as to why “an AGI not built based on a rigorous theory of Friendliness is almost certain to kill all humans” (the proposition I called “SIAI’s Scary Idea”). But he hasn’t followed through on that yet—and neither has Eliezer or anyone associated with SIAI.
Just to be clear, I don’t really mind that SIAI folks hold that “Scary Idea” as an intuition. But I find it rather ironic when people make a great noise about their dedication to rationality, but then also make huge grand important statements about the future of humanity, with great confidence and oomph, that are not really backed up by any rational argumentation. This ironic behavior on the part of Eliezer, Michael Anissimov and other SIAI principals doesn’t really bother me, as I like and respect them and they are friendly to me, and we’ve simply “agreed to disagree” on these matters for the time being. But the reason I wrote that blog post is because my own blog posts about AGI were being trolled by SIAI zealots (not the principals, I hasten to note) leaving nasty comments to the effect of “SIAI has proved that if OpenCog achieves human level AGI, it will kill all humans.” Not only has SIAI not proved any such thing, they have not even made a clear rational argument!
As Eliezer has pointed out to me several times in conversation, a clear rational argument doesn’t have to be mathematical. A clearly formulated argument in the manner of analytical philosophy, in favor of the Scary Idea, would certainly be very interesting. For example, philosopher David Chalmers recently wrote a carefully-argued philosophy paper arguing for the plausibility of a Singularity in the next couple hundred years. It’s somewhat dull reading, but it’s precise and rigorous in the manner of analytical philosophy, in a manner that Kurzweil’s writing (which is excellent in its own way) is not. An argument in favor of the Scary Idea, on the level of Chalmers’ paper on the Singularity, would be an excellent product for SIAI to produce. Of course a mathematical argument might be even better, but that may not be feasible to work on right now, given the state of mathematics today. And of course, mathematics can’t do everything—there’s still the matter of connecting mathematics to everyday human experience, which analytical philosophy tries to handle, and mathematics by nature cannot.
My own suspicion, of course, is that in the process of trying to make a truly rigorous analytical philosophy style formulation of the argument for the Scary Idea, the SIAI folks will find huge holes in the argument. Or, maybe they already intuitively know the holes are there, which is why they have avoided presenting a rigorous write-up of the argument!!
Luke:
[Mar 11th, 2012]
I’ll drop the stuff about Mickey Mouse so we can move on to AGI. Readers can come to their own conclusions on that.
Your main complaint seems to be that the Singularity Institute hasn’t written up a clear, formal argument (in analytic philosophy’s sense, if not the mathematical sense) in defense of our major positions — something like Chalmers’ “The Singularity: A Philosophical Analysis” but more detailed.
I have the same complaint. I wish “The Singularity: A Philosophical Analysis” had been written 10 years ago, by Nick Bostrom and Eliezer Yudkowsky. It could have been written back then. Alas, we had to wait for Chalmers to speak at Singularity Summit 2009 and then write a paper based on his talk. And if it wasn’t for Chalmers, I fear we’d still be waiting for such an article to exist. (Bostrom’s forthcoming Superintelligence book should be good, though.)
I was hired by the Singularity Institute in September 2011 and have since then co-written two papers explaining some of the basics: “Intelligence Explosion: Evidence and Import” and “The Singularity and Machine Ethics”. I also wrote the first ever outline of categories of open research problems in AI risk, cheekily titled “So You Want to Save the World”. I’m developing other articles on “the basics” as quickly as I can. I would love to write more, but alas, I’m also busy being the Singularity Institute’s Executive Director.
Perhaps we could reframe our discussion around the Singularity Institute’s latest exposition of its basic ideas, “Intelligence Explosion: Evidence and Import”? Which claims in that paper do you most confidently disagree with, and why?
Ben:
[Mar 11th, 2012]
You say “Your main complaint seems to be that the Singularity Institute hasn’t written up a clear, formal argument (in analytic philosophy’s sense, if not the mathematical sense) in defense of our major positions”. Actually, my main complaint is that some of SIAI’s core positions seem almost certainly WRONG, and yet they haven’t written up a clear formal argument trying to justify these positions—so it’s not possible to engage SIAI in rational discussion on their apparently wrong positions. Rather, when I try to engage SIAI folks about these wrong-looking positions (e.g. the “Scary Idea” I mentioned above), they tend to point me to Eliezer’s blog (“Less Wrong”) and tell me that if I studied it long and hard enough, I would find that the arguments in favor of SIAI’s positions are implicit there, just not clearly articulated in any one place. This is a bit frustrating to me—SIAI is a fairly well-funded organization involving lots of smart people and explicitly devoted to rationality, so certainly it should have the capability to write up clear arguments for its core positions… if these arguments exist. My suspicion is that the Scary Idea, for example, is not backed up by any clear rational argument—so the reason SIAI has not put forth any clear rational argument for it, is that they don’t really have one! Whereas Chalmers’ paper carefully formulated something that seemed obviously true...
Regarding the paper “Intelligence Explosion: Evidence and Import”, I find its contents mainly agreeable—and also somewhat unoriginal and unexciting, given the general context of 2012 Singularitarianism. The paper’s three core claims that
(1) there is a substantial chance we will create human-level AI before 2100, that (2) if human-level AI is created, there is a good chance vastly superhuman AI will follow via an “intelligence explosion,” and that (3) an uncontrolled intelligence explosion could destroy everything we value, but a controlled intelligence explosion would benefit humanity enormously if we can achieve it.
are things that most “Singularitarians” would agree with. The paper doesn’t attempt to argue for the “Scary Idea” or Coherent Extrapolated Volition or the viability of creating some sort of provably Friendly AI—or any of the other positions that are specifically characteristic of SIAI. Rather, the paper advocates what one might call “plain vanilla Singularitarianism.” This may be a useful thing to do, though, since after all there are a lot of smart people out there who aren’t convinced of plain vanilla Singularitarianism.
I have a couple small quibbles with the paper, though. I don’t agree with Omohundro’s argument about the “basic AI drives” (though Steve is a friend and I greatly respect his intelligence and deep thinking). Steve’s argument for the inevitability of these drives in AIs is based on evolutionary ideas, and would seem to hold up in the case that there is a population of distinct AIs competing for resources—but the argument seems to fall apart in the case of other possibilities like an AGI mindplex (a network of minds with less individuality than current human minds, yet not necessarily wholly blurred into a single mind—rather, with reflective awareness and self-modeling at both the individual and group level).
Also, my “AI Nanny” concept is dismissed too quickly for my taste (though that doesn’t surprise me!). You suggest in this paper that to make an AI Nanny, it would likely be necessary to solve the problem of making an AI’s goal system persist under radical self-modification. But you don’t explain the reasoning underlying this suggestion (if indeed you have any). It seems to me—as I say in my “AI Nanny” paper—that one could probably make an AI Nanny with intelligence significantly beyond the human level, without having to make an AI architecture oriented toward radical self-modification. If you think this is false, it would be nice for you to explain why, rather than simply asserting your view. And your comment “Those of us working on AI safety theory would very much appreciate the extra time to solve the problems of AI safety...” carries the hint that I (as the author of the AI Nanny idea) am NOT working on AI safety theory. Yet my GOLEM design is a concrete design for a potentially Friendly AI (admittedly not computationally feasible using current resources), and in my view constitutes greater progress toward actual FAI than any of the publications of SIAI so far. (Of course, various SIAI associated folks often allude that there are great, unpublished discoveries about FAI hidden in the SIAI vaults—a claim I somewhat doubt, but can’t wholly dismiss of course....)
Anyway, those quibbles aside, my main complaint about the paper you cite is that it sticks to “plain vanilla Singularitarianism” and avoids all of the radical, controversial positions that distinguish SIAI from myself, Ray Kurzweil, Vernor Vinge and the rest of the Singularitarian world. The crux of the matter, I suppose, is the third main claim of the paper,
(3) an uncontrolled intelligence explosion could destroy everything we value, but a controlled intelligence explosion would benefit humanity enormously if we can achieve it.
This statement is hedged in such a way as to be almost obvious. But yet, what SIAI folks tend to tell me verbally and via email and blog comments is generally far more extreme than this bland and nearly obvious statement.
As an example, I recall when your co-author on that article, Anna Salamon, guest lectured in the class on Singularity Studies that my father and I were teaching at Rutgers University in 2010. Anna made the statement, to the students, that (I’m paraphrasing, though if you’re curious you can look up the course session, which was saved online, and find her exact wording) “If a superhuman AGI is created without being carefully based on an explicit Friendliness theory, it is ALMOST SURE to destroy humanity.” (i.e., what I now call SIAI’s Scary Idea)
I then asked her (in the online class session) why she felt that way, and if she could give any argument to back up the idea.
She gave the familiar SIAI argument that, if one picks a mind at random from “mind space”, the odds that it will be Friendly to humans are effectively zero.
I made the familiar counter-argument that this is irrelevant, because nobody is advocating building a random mind. Rather, what some of us are suggesting is to build a mind with a Friendly-looking goal system, and a cognitive architecture that’s roughly human-like in nature but with a non-human-like propensity to choose its actions rationally based on its goals, and then raise this AGI mind in a caring way and integrate it into society. Arguments against the Friendliness of random minds are irrelevant as critiques of this sort of suggestion.
So, then she fell back instead on the familiar (paraphrasing again) “OK, but you must admit there’s a non-zero risk of such an AGI destroying humanity, so we should be very careful—when the stakes are so high, better safe than sorry!”
I had pretty much the same exact argument with SIAI advocates Tom McCabe and Michael Anissimov on different occasions; and also, years before, with Eliezer Yudkowsky and Michael Vassar—and before that, with (former SIAI Executive Director) Tyler Emerson. Over all these years, the SIAI community maintains the Scary Idea in its collective mind, and also maintains a great devotion to the idea of rationality, but yet fails to produce anything resembling a rational argument for the Scary Idea—instead repetitiously trotting out irrelevant statements about random minds!!
What I would like is for SIAI to do one of these three things, publicly:
Repudiate the Scary Idea
Present a rigorous argument that the Scary Idea is true
State that the Scary Idea is a commonly held intuition among the SIAI community, but admit that no rigorous rational argument exists for it at this point
Doing any one of these things would be intellectually honest. Presenting the Scary Idea as a confident conclusion, and then backing off when challenged into a platitudinous position equivalent to “there’s a non-zero risk … better safe than sorry...”, is not my idea of an intellectually honest way to do things.
Why does this particular point get on my nerves? Because I don’t like SIAI advocates telling people that I, personally, am on a R&D course where if I succeed I am almost certain to destroy humanity!!! That frustrates me. I don’t want to destroy humanity; and if someone gave me a rational argument that my work was most probably going to be destructive to humanity, I would stop doing the work and do something else with my time! But the fact that some other people have a non-rational intuition that my work, if successful, would be likely to destroy the world—this doesn’t give me any urge to stop. I’m OK with the fact that some other people have this intuition—but then I’d like them to make clear, when they state their views, that these views are based on intuition rather than rational argument. I will listen carefully to rational arguments that contravene my intuition—but if it comes down to my intuition versus somebody else’s, in the end I’m likely to listen to my own, because I’m a fairly stubborn maverick kind of guy....
Luke:
[Mar 11th, 2012]
Ben, you write:
when I try to engage SIAI folks about these wrong-looking positions (e.g. the “Scary Idea” I mentioned above), they tend to point me to Eliezer’s blog (“Less Wrong”) and tell me that if I studied it long and hard enough, I would find that the arguments in favor of SIAI’s positions are implicit there, just not clearly articulated in any one place. This is a bit frustrating to me...
No kidding! It’s very frustrating to me, too. That’s one reason I’m working to clearly articulate the arguments in one place, starting with articles on the basics like “Intelligence Explosion: Evidence and Import.”
I agree that “Intelligence Explosion: Evidence and Import” covers only the basics and does not argue for several positions associated uniquely with the Singularity Institute. It is, after all, the opening chapter of a book on the intelligence explosion, not the opening chapter of a book on the Singularity Institute’s ideas!
I wanted to write that article first, though, so the Singularity Institute could be clear on the basics. For example, we needed to be clear that: (1) we are not Kurzweil, and our claims don’t depend on his detailed storytelling or accelerating change curves, that (2) technological prediction is hard, and we are not being naively overconfident about AI timelines, and that (3) intelligence explosion is a convergent outcome of many paths the future may take. There is also much content that is not found in, for example, Chalmers’ paper: (a) an overview of methods of technological prediction, (b) an overview of speed bumps and accelerators toward AI, (c) a reminder of breakthroughs like AIXI, and (d) a summary of AI advantages. (The rest is, as you say, mostly a brief overview of points that have been made elsewhere. But brief overviews are extremely useful!)
...my “AI Nanny” concept is dismissed too quickly for my taste...
No doubt! I think the idea is clearly worth exploring in several papers devoted to the topic.
It seems to me—as I say in my “AI Nanny” paper—that one could probably make an AI Nanny with intelligence significantly beyond the human level, without having to make an AI architecture oriented toward radical self-modification.
Whereas I tend to buy Omohundro’s arguments that advanced AIs will want to self-improve just like humans want to self-improve, so that they become better able to achieve their final goals. Of course, we disagree on Omohundro’s arguments — a topic to which I will return in a moment.
your comment “Those of us working on AI safety theory would very much appreciate the extra time to solve the problems of AI safety...” carries the hint that I (as the author of the AI Nanny idea) am NOT working on AI safety theory...
I didn’t mean for it to carry that connotation. GOLEM and Nanny AI are both clearly AI safety ideas. I’ll clarify that part before I submit a final draft to the editors.
Moving on: If you are indeed remembering your conversations with Anna, Michael, and others correctly, then again I sympathize with your frustration. I completely agree that it would be useful for the Singularity Institute to produce clear, formal arguments for the important positions it defends. In fact, just yesterday I was talking to Nick Beckstead about how badly both of us want to write these kinds of papers if we can find the time.
So, to respond to your wish that the Singularity Institute choose among three options, my plan is to (1) write up clear arguments for… well, if not “SIAI’s Big Scary Idea” then for whatever I end up believing after going through the process of formalizing the arguments, and (2) publicly state (right now) that SIAI’s Big Scary Idea is a commonly held view at the Singularity Institute but a clear, formal argument for it has never been published (at least, not to my satisfaction).
I don’t want to destroy humanity; and if someone gave me a rational argument that my work was most probably going to be destructive to humanity, I would stop doing the work and do something else with my time!
I’m glad to hear it! :)
Now, it seems a good point of traction is our disagreement over Omohundro’s “Basic AI Drives.” We could talk about that next, but for now I’d like to give you a moment to reply.
Ben:
[Mar 11th, 2012]
Yeah, I agree that your and Anna’s article is a good step for SIAI to take, albeit unexciting to a Singularitarian insider type like me… And I appreciate your genuinely rational response regarding the Scary Idea, thanks!
(And I note that I have also written some “unexciting to Singularitarians” material lately too, for similar reasons to those underlying your article—e.g. an article on “Why an Intelligence Explosion is Probable” for a Springer volume on the Singularity.)
A quick comment on your statement that
we are not Kurzweil, and our claims don’t depend on his detailed storytelling or accelerating change curves,
That’s a good point; but yet, any argument for a Singularity soon (e.g. likely this century, as you argue) ultimately depends on some argumentation analogous to Kurzweil’s, even if different in detail. I find Kurzweil’s detailed extrapolations a bit overconfident and more precise than the evidence warrants; but still, my basic reasons for thinking the Singularity is probably near are fairly similar to his—and I think your reasons are fairly similar to his as well.
Anyway, sure, let’s go on to Omohundro’s posited Basic AI Drives—which seem to me not to hold as necessary properties of future AIs unless the future of AI consists of a population of fairly distinct AIs competing for resources, which I intuitively doubt will be the situation.
[to be continued]
This exchange significantly decreased my probability that Ben Goertzel is a careful thinker about AI problems. I think he has a good point about “rationalists” being too much invested in “rationality” (as opposed to rationality), but his AI thoughts are just seriously wtf. In tune with the Cosmos? Does this mean anything at all? I hate to say it based on a short conversation, but it looks like Ben Goertzel hasn’t made any of his intuitions precise enough to even be wrong. And he makes the classic mistake of thinking “any intelligence” would avoid certain goal-types (i.e. ‘fill the future light cone with some type of substance’) because they’re… stupid? I don’t even...
Quoth Yvain:
He published a book called A Cosmist Manifesto which presumably describes some of his thoughts in more detail. It looked too new-age for me to take much interest.
Upvoted.
Goertzel’s belief in AI FOOMs coupled with his beliefs in psi phenomena and the inherent stupidity of paperclipping made me lower my confidence in the likelihood of AI FOOMs slightly. Was this a reasonable operation, do you think?
It depends.
If you were previously aware of Goertzel’s belief in AI FOOM but not his opinions on psi/paperclipping then you should lower your confidence slightly. (Exactly how much depends on what other evidence/opinions you have to hand).
If the SIAI were wheeling out Goertzel as an example of “look, here’s someone who believes in FOOM” then it should lower your confidence.
If you were previously unaware of Goertzel’s belief in FOOM then it should probably increase your confidence very slightly. Reversed stupidity is not intelligence.
Obviously the quantity of “slightly” depends on what other evidence/opinions you have to hand.
This is a good analysis. I was previously weakly aware of Goertzel’s beliefs on psi/paperclipping, and didn’t know much about his opinions on AI other than that he was working on superhuman AGI but didn’t have as much concern for Friendliness as SIAI. So I suppose my confidence shouldn’t change very much either way. I’m still on the fence on several questions related to Singularitarianism, so I’m trying to get evidence wherever I can find it.
I feel morally obligated to restate a potentially relevant observation:
I think that an important underlying difference of perspective here is that the Less Wrong memes tend to automatically think of all AGIs as essentially computer programs whereas Goertzel-like memes tend to automatically think of at least some AGIs as non-negligibly essentially person-like. I think this is at least partially because the Less Wrong memes want to write an FAI that is essentially some machine learning algorithms plus a universal prior on top of sound decision theory whereas the Goertzel-like memes want to write an FAI that is essentially roughly half program-like and half person-like. Less Wrong memes think that person AIs won’t be sufficiently person-like but they sort of tend to assume that conclusion rather than argue for it, which causes memes that aren’t familiar with Less Wrong memes to wonder why Less Wrong memes are so incredibly confident that all AIs will necessarily act like autistic OCD people without any possibility at all of acting like normal reasonable people. From that perspective the Goertzel-like memes look justified in being rather skeptical of Less Wrong memes. After all, it is easy to imagine a gradation between AIXI and whole brain emulations. Goertzel-like memes wish to create an AI somewhere between those two points, Less Wrong memes wish to create an AI that’s even more AIXI-like than AIXI is (in the sense of being more formally and theoretically well-founded than AIXI is). It’s important that each look at the specific kinds of AI that the other has in mind and start the exchange from there.
We don’t know if AIXI-approximating AIs would even be intelligent; how then can we be so confident that AIXI is a normative model and a definition of intelligence? This and other intuitions are likely underlying Goertzel’s cautious epistemic state, and LessWrong/SingInst truly hasn’t addressed issues like this. We don’t know what it takes to build AGI, we don’t know if intelligence runs on Bayes structure. Modern decision theory indicates that Eliezer was wrong, that Bayes structure isn’t fundamental to agentic optimization, that it only applies in certain cases, that Bayesian information theoretic models of cognition might not capture the special sauce of intelligence. What is fundamental? We don’t know! In the meantime we should be careful about drawing conclusions based on the assumed fundamental-ness of mathematical models which may or may not ultimately be accurate models, may or may not actually let you build literalistic self-improving AIs of the sort that LessWrong likes to speculate about.
I think your first paragraph was very useful.
I have no idea what your second paragraph is about—“modern decision theory” is not a very specific citation. If there is research concluding that probability theory only applies to certain special cases of optimization, it would be awesome if you could make a top-level post explaining it to us!
There have already been many top-level posts, but you’re right that I should have linked to them. Here is the LessWrong Wiki hub, here is a post by Wei Dai that cuts straight to the point.
A whole lot of the sequences are dedicated to outlining just how reasonably normal people don’t act. I would want any Strong AI in charge of our fates to be person-like in that it is aware of what humans want in a way that we would accept, because the alternative to that is probably disaster, but I wouldn’t want one to be person-like in that its inductive biases are more like a human’s than an ideal Bayesian reasoner’s, or that it reasons about moral issues the way humans do intuitively, because our biases are often massively inappropriate, and our moral intuitions incoherent.
Check out this post by Vladimir Nesov: “The problem of choosing Bayesian priors is in general the problem of formalizing preference, it can’t be solved completely without considering utility, without formalizing values, and values are very complicated. No simple morality, no simple probability.” Of course, having a human prior doesn’t necessitate being human-like… Or does it? Duh duh duh.
Today I’d rather say that we don’t know if “priors” is a fundamentally meaningful decision-theoretic idea, and so discussing what does or doesn’t determine it would be premature.
Wow, I only associate that level of arrogance with Eliezer.
I don’t see how it’s arrogance, except maybe by insinuation/connotation; I’ll think about how to remove the insinuation/connotation. I was trying to describe an important skill of rationality, not assert my supremacy at that skill. But describing a skill sort of presupposes that the audience lacks the skill. So it’s awkward.
It’s arrogance because you’re implying that you’ve already thought of and rejected any objection the reader could come up with.
Didn’t mean to imply that; deleted the offending paragraph at any rate.
Your comments are probably better without such meta appendices. I lambast LW for being wrong about many worlds and for having a crypto-dualist philosophy of mind, and I find directness is better than intricate attempts to preempt the reader’s default epistemology. Going meta is not always for the best; save it up and then use it in the second round if you have to.
This applies doubly for those whose ‘meta’ position is so closely associated with either fundamental quantum monads or outright support of theism based on the Catholic god.
(Inconsequential stylistic complaint: Atheists like to do it all the time, but it strikes me as juvenile not to capitalize “Catholic” or “God”. If you don’t capitalize “catholic” then it just means “universal”, and not capitalizing “God” is like making a point of writing Eliezer’s name as “eliezer” just because you think he’s the Antichrist. It’s contemptibly petty. Is there some justification I’m missing? (I’m not judging you by the way, just imagining a third party judge.))
That’s true. Not writing “Catholic” was an error. It’s not like the Catholic religion is any more universal than, say, the ‘Liberal’ party here is particularly liberal. Names get capitals so we don’t confuse them with real words.
But here you are wrong.
When referring to supernatural entities that fall into the class ‘divine’, the label that applies is ‘god’. For example, Zeus is a god, Allah is a god and God is a god. If you happened to base your theology around Belar I would have written “the Alorn god”. Writing “the Alorn God” would be a corruption of grammar. If I were making a direct reference to God I would capitalize His name. I wasn’t. I was referring to a religion which, being monotheistic, can be dereferenced to specify a particular fictional entity.
Other phrases I may utter:
The Arendish god is Chaldan
The Protestant god is God.
Children believe in believing in the Easter Bunny.
The historic conceit that makes using capitalization appropriate when referring to God does not extend to all usages of the word ‘god’, even when the ultimate referent is Him. For all the airs we may give Him, God is just a god—with all that entails.
Sorry, you’re right, what confused me was “catholic god” in conjunction; “Catholic god” wouldn’t have tripped me up.
I think you’re right, I’ll just remove it.
By the way I’ve come to think that your intuitions re quantum mind/monadology are at least plausibly correct/in-the-right-direction, but this epistemic shift hasn’t changed my thoughts about FAI at all; thus I fear compartmentalization on my part, and I’d like to talk with you about it when I’m able to reliably respond to email. It seems to me that there’s insufficient disturbed-ness about disagreement amongst the serious-minded Friendliness community.
Also, what’s your impression re psi? Or maybe it’s best not to get into that here.
Sounds like a good thing to have in a “before hitting ‘reply,’ consider these” checklist; but not to add to your own comment (for, as Will might say, “game-theoretic and signaling reasons.”)
This exposes a circularity in lesswrongian reasoning: if you think of an AI as fundamentally non-person-like, then there is a need to bolt on human values. If you think of it as human-like, then human-like values are more likely to be inherent or acquired naturally through interaction.
I don’t see the circularity. “human” is a subset of “person”; there’s no reason an AI that is a “person” will have “human” values. Also, just thinking of the AI as being human-like doesn’t actually make it human-like.
I don’t see the relevance. Goertzel isn’t talking about building non-human persons.
If you design an AI on X-like principles, it will probably be X-like, unless something goes wrong.
Ah, I may not have gotten all the context.
If “something goes wrong” with high probability, it will probably not be X-like.
More the reverse. I don’t support your representation of either the LW memes’ views or Eliezer’s. I’d call this a straw man.
It’s strange that people say the arguments for Big Scary Idea are not written anywhere. The argument seems to be simple and direct:
1. Hard takeoff will make the AI god-powerful very quickly.
2. During hard takeoff, the AI’s utility=goals=values=what-it-optimizes-for will solidify (when the AI understands its own theory and self-modifies correspondingly), and even if it was changeable before, it will be unchangeable forever after.
3. Unless the AI’s goals embody every single value important to humans and are otherwise just right in every respect, the results of using god powers to optimize for those goals will be horrible.
4. Human values are not a natural category; there’s little to no chance that an AI will converge on them by itself, unless specifically and precisely programmed to.
The only really speculative step is step 1. But if you already believe in the singularity and a hard foom, then the argument should be irrefutable...
Arguments for step 2, e.g. the Omohundroan Gandhi folk theorem, are questionable. Neither step 3 nor step 4 is supported with impressive technical arguments anywhere I know of. Remember, there are a lot of moral realists out there who think of AIs as people who will sense and feel compelled by moral law. It’s hard to make impressive technical arguments against that intuition. FOOM=doom and FOOM=yay folk can both point out a lot of facts about the world and draw analogies, but as far as impressive technical arguments go there’s not much that can be done, largely because we have never built an AGI. It’s a matter of moral philosophy, an inherently tricky subject.
I don’t understand how the Omohundroan Gandhi folk theorem is related to step 2. Could you elaborate? Step 2 looks obvious to me: assuming step 1, at some point the AI with an imprecise and drifting utility function would understand how to build a better AI with a precise and fixed utility function. Since building this better AI will maximize the current AI’s utility, the better AI will be built and its utility forever solidified.
As you say, steps 3 and 4 are currently hard to support with technical arguments, there are so many non-technical concepts involved. And it may be hard to argue intuitively with most people. But Goertzel is a programmer, he should know how programs behave :) Of course, he says his program will be intelligent, not stupid, and it is a good idea, as long as it is remembered that intelligent in this sense already means friendly, and friendliness does not follow from just being a powerful optimization process.
Also, thinking of AIs as people can only work up to the point where AI achieves complete self-understanding. This has never happened to humans.
Hm, when I try to emulate Goertzel’s perspective I think about it this way: if you look at brains, they seem to be a bunch of machine learning algorithms and domain-specific modules largely engineered to solve tricky game theory problems. Love isn’t something that humans do despite game theory; love is game theory. And yet despite that it seems that brains end up doing lots of weird things like deciding to become a hermit or paint or compose or whatever. That’s sort of weird; if you’d asked me what chimps would evolve into when they became generally intelligent, and I hadn’t already seen humans or humanity, then I might’ve guessed that they’d evolve to develop efficient mating strategies, e.g. arranged marriage, and efficient forms of dominance contests, e.g. boxing with gloves, that don’t look at all like the memetic heights of academia or the art scene. Much of academia is just social maneuvering, but the very smartest humans don’t actually seem to be motivated by status displays; it seems that abstract memes have taken over the machine learning algorithms just by virtue of their being out there in Platospace, and that’s actually pretty weird and perhaps unexpected.
So yes, Goertzel is a programmer and should know how programs behave, but human minds look like they’re made of programs, and yet they ended up somewhat Friendly (or cosmically connected or whatever) despite that. Now the typical counter is AIXI: okay, maybe hacked-together machine learning algorithms will reliably stumble onto and adopt cosmic abstract concepts, but it sure doesn’t look like AIXI would. Goertzel’s counter to that is, of course, that AIXI is unproven, and that if you built an approximation of it then you’d have to use brain-like machine learning algorithms, which are liable to get distracted by abstract concepts. It might not be possible to get past the point where you’re distracted by abstract concepts, and once they’re in your mind (e.g. as problem representations, as subgoals, as whatever they are in human minds), you don’t want to abandon them, even if you gain complete self-understanding. (There are various other paths that argument could take, but they all can plausibly lead to roughly the same place.)
I think that taking the soundness of such analogical arguments for granted would be incautious, and that’s why I tend to promote the SingInst perspective around folk who aren’t aware of it, but despite being pragmatically incautious they’re not obviously epistemicly unsound, and I can easily see how someone could feel it was intuitively obvious that they were epistemicly sound. I think the biggest problem with that set of arguments is that they seem to unjustifiably discount the possibility of very small, very recursive seed AIs that can evolve to superintelligence very, very quickly; which are the same AIs that would get to superintelligence first in a race scenario. There are various reasons to be skeptical that such architectures will work, but even so it seems rather incautious to ignore them, and I feel like Goertzel is perhaps ignoring them, perhaps because he’s not familiar with those kinds of AI architectures.
That humans are only (as you flatteringly put it) “somewhat” friendly to human values is clearly an argument in favor of caution, is it not?
It is, but it’s possible to argue somewhat convincingly that the lack of friendliness is in fact due to lack of intelligence. My favorite counterexample was Von Neumann, who didn’t really seem to care much about anyone, but then I heard that he actually had somewhat complex political views but simplified them for consumption by the masses. On the whole it seems that intelligent folk really are significantly more moral than the majority of humanity, and this favors the “intelligence implies, or is the same thing as, cosmic goodness” perspective. This sort of argument is also very psychologically appealing to Enlightenment-influenced thinkers, i.e. most modern intellectuals, e.g. young Eliezer.
(Mildly buzzed, apologies for errors.)
(ETA: In case it isn’t clear, I’m not arguing that such a perspective is a good one to adopt, I’m just trying to explain how one could feel justified in holding it as a default perspective and feel justified in being skeptical of intuitive non-technical arguments against it. I think constructing such explanations is necessary if one is to feel justified in disagreeing with one’s opposition, for the same reason that you shouldn’t make a move in chess until you’ve looked at what moves your opponent is likely to play in response, and then what move you could make in that case, and what moves they might make in response to that, and so on.)
I think there are a number of reasons to be skeptical of the premise (and the implicit one about cosmic goodness being a coherent thing, but that’s obviously covered territory). Most people think their tribe seems more moral than others, so nerd impressions that nerds are particularly moral should be discounted. The people who are most interested in intellectual topics (i.e., the most obviously intelligent people) do often appear to be the least interested in worldly ambition and aggression generally, but we would expect that just as a matter of preferences crowding each other out; worldly ambitious intelligent people seem to be among the most conspicuously amoral, even though you’d expect them to be the best equipped, in means and motive, to look otherwise. I recall Robin Hanson has referenced studies (which I’m too lazy to look up) showing that the intelligent lie and cheat more often; certainly this could be explained by an opportunity effect, but so could their presumably lower levels of personal violence. Humans are friendlier than chimpanzees but less friendly than bonobos, and across the tree of life niceness and nastiness don’t seem to have any relationship to computational power.
That’s true and important, but stereotypical worldly intelligent people rarely “grave new values on new tables”, and so might be much less intelligent than your Rousseaus and Hammurabis in the sense that they affect the cosmos less overall. Even worldly big shots like Stalin and Genghis rarely establish any significant ideological foothold. The memes use them like empty vessels.
But even so, the omnipresent you-claim-might-makes-right counterarguments remain uncontested. Hard to contest them.
It’s hard to tell how relevant this is; there’s much discontinuity between chimps and humans and much variance among humans. (Although it’s not that important, I’m skeptical of claims about bonobos; there were some premature sensationalist claims and then some counter-claims, and it all seemed annoyingly politicized.)
However, non-worldly intelligent people like Rousseau and Marx frequently give the new values that make people like Robespierre and Stalin possible.
In the public mind Rousseau and Marx and their intellectual progeny are generally seen as cosmically connected/intelligent/progressive, right? Maybe overzealous, but their hearts were in the right place. If so that would support the intelligence=goodness claim. If the Enlightenment is good by the lights of the public, then the uFAI-Antichrist is good by the lights of the public. [Removed section supporting this claim.] And who are we to disagree with the dead, the sheep and the shepherds?
(ETA: Contrarian terminology aside, the claim looks absurd without its supporting arguments… ugh.)
Depends on which subset of the public we’re talking about.
I’m confused, is this an appeal to popular opinion?
Of course. “And all that dwell upon the earth shall worship him [the beast/dragon]” Revelation 13:8
People in a position to witness the practical results of their philosophy.
Why exactly did you remove that section?
I would say that it is simply the case that many moral systems require intelligence, or are more effective with intelligence. The intelligence doesn’t lead to morality per se, but does lead to ability to practically apply the morality. Furthermore, low intelligence usually implies lower tendency to cross-link the beliefs, resulting in less, hmm, morally coherent behaviour.
Ouch, that hits a little close to home.
Fuck, wrote a response but lost it. The gist was, yeah, your points are valid, and the might-makes-right problems are pretty hard to get around even on the object level; I see an interesting way to defensibly move the goalposts, but the argument can’t be discussed on LessWrong and I should think about it more carefully in any case.
That’s been my observation, also. But if it’s true, I wonder why?
It could be because intelligence is useful for moral reasoning. Or it could be because intelligence is correlated with some temperamental, neurological, or personality traits that influence moral behavior. In the latter case, moral behavior would be a characteristic of the substrate of intelligent human minds.
So you’re saying Goertzel believes that once any mind with sufficient intelligence and generally unfixed goals encounters certain abstract concepts, these concepts will hijack the cognitive architecture and rewrite its goals, with results equivalent for any reasonable initial mind design.
And the only evidence for this is that it happened once.
This does look a little obviously epistemically unsound.
Just an off-the-cuff not-very-detailed hypothesis about what he believes.
Or at least any mind design that looks even vaguely person-like, e.g. uses clever Bayesian machine learning algorithms found by computational cognitive scientists; but I think Ben might be unknowingly ignoring certain architectures that are “reasonable” in a certain sense but do not look vaguely person-like.
Yes, but an embarrassingly naive application of Laplace’s rule gives us a two-thirds probability it’ll happen again.
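For reference, the “embarrassingly naive” two-thirds figure is just Laplace’s rule of succession applied to a single observed trial (humanity) that came out as a “success”; a minimal sketch:

```python
def rule_of_succession(successes, trials):
    """Laplace's rule of succession: with a uniform prior over the
    unknown success rate, the posterior probability that the next
    trial succeeds is (s + 1) / (n + 2)."""
    return (successes + 1) / (trials + 2)

# One general intelligence observed so far (humans), and in that one
# case abstract concepts did end up rewriting the goal system:
print(rule_of_succession(1, 1))  # (1 + 1) / (1 + 2) = 2/3
```

The naivety, of course, is in treating “minds that become generally intelligent” as exchangeable trials in the first place.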
Eh, it looks pretty pragmatically incautious, but if you’re forced to give a point estimate then it seems epistemicly justifiable. If it was taken to imply strong confidence then that would indeed be unsound.
(By the way, we seem to disagree re “epistemicly” versus “epistemically”; is “-icly” a rare or incorrect construction?)
:)
:))
It sounds prosodically(sic!) awkward, although since English is not my mother tongue, my intuition is probably not worth much. But Google appears to agree with me: 500,000 vs. 500 hits.
Goertzel expressed doubt about step 4, saying that while it’s true that random AIs will have bad goals, he’s not working on random AIs.
Well, if he believes his AI will be specifically and precisely programmed so as to converge on exactly the right goals before they are solidified in the hard takeoff, then he’s working on a FAI. The remaining difference in opinions would be technical—about whether his AI will indeed converge, etc. It would not be about the Scary Idea itself.
I think Goertzel takes it as part of the Scary Idea that an understanding of the AI’s goals several orders of magnitude more precise is necessary for its behavior not to be disastrous.
It’s a direct logical consequence, isn’t it? If one doesn’t have a precise understanding of the AI’s goals, then whatever goals one imparts to the AI won’t be precise. And they must be precise, or (step 3) ⇒ disaster.
He doesn’t agree that they must be precise, so I guess step 3 is also out.
He can’t think that god-powerfully optimizing for a forever-fixed not-precisely-correct goal would lead to anything but disaster. Not if he ever saw a non-human optimization process at work.
So he can only think precision is not important if he believes that
(1) human values are an attractor in the goal space, and any reasonably close goals would converge there before solidifying, and/or
(2) acceptable human values form a large convex region within the goal space, and optimizing for any point within this region is correct.
Without better understanding of AI goals, both can only be an article of faith...
From the conversation with Luke, he apparently accepts faith.
That’s not really the same as asserting that human values are a natural category.
Thanks for sharing! I hope this post doesn’t split the conversation into too many directions for you (Luke and Ben) to respond to and that all commenters will do their best to be polite, address issues directly and clearly label what’s from intuition and what’s shown from argument.
Ben wrote:
(For reference, we’re talking about this paper and the AI drives it lists are (1) AIs will want to self improve, (2) AIs will want to be rational, (3) AIs will try to preserve their utility functions, (4) AIs will try to prevent counterfeit utility, (5) AIs will be self-protective, and (6) AIs will want to acquire resources and use them efficiently.)
I don’t think it’s true that this depends on evolutionary ideas. Rather, these all seem to follow from the definitions of intelligence and goals. Consider the first drive, self-improvement. Whatever goal(s) the AI has, it knows in the future it’ll be trying to make those goals happen, and that those future attempts will be more effective if it’s smarter. That’s barely more than a tautology, but it’s enough to show that it’ll think self-improvement is good. Now, it might be that making itself smarter is too difficult/slow/expensive, in which case it’ll be seen as good, but not prioritized.
That part’s pretty rigorous. The intuitive continuation is that I think AIs will find self-improvement to be cheap and easy, at least until they’re well above human level. That part depends on what sort of refinements to the algorithm are available after it has demonstrated human-level intelligence, which depends on how deep into the pool of possible refinements the human developers got, and how many new ideas are suggested by the proof of concept. It also depends on how useful additional hardware is, whether Moore’s Law is still going, and whether the AI’s algorithms can benefit from moving along the CPU->GPU->ASIC axis.
The second drive (AIs will want to be rational) is subject to the same debate about whether humans want to be rational that Ben and Luke had earlier. My perspective is that all the cases where rationality appears to fail are cases where it’s being misapplied; and while there are quite a few different definitions of “rationality” in play, the one Omohundro and Eliezer use is more or less that “rationality is whatever wins”. Under that definition, the claim that AIs will want to be rational is tautological. The tricky part, and I think this is where your disagreement comes from, is that rationality is also sometimes defined to mean “using symbolic reasoning instead of intuition”, and it is sometimes claimed that these two definitions are the same (i.e., it is claimed that symbolic reasoning wins and intuition doesn’t). My own perspective is that an AI will probably need something analogous to intuition.
Similar arguments apply to the other AI drives, but this comment is getting long. Getting back to the idea that an evolutionary context with competing AIs would matter—if some AIs had Omohundro’s drives and some didn’t, then competition would filter out all the ones that didn’t. But the argument here is that all AIs that can reason about goals will either have, or choose to give themselves, those same drives. That would mean no evolutionary filter is necessary.
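The “barely more than a tautology” step about self-improvement can be illustrated with a toy expected-value model (the numbers and functional forms here are invented for illustration; nothing of this sort appears in Omohundro’s paper):

```python
def expected_goal_achievement(capability, effort_on_improvement):
    """Stylized model: goal achievement grows with future capability,
    and effort diverted to self-improvement converts into extra future
    capability at an assumed 2x payoff rate."""
    future_capability = capability + 2.0 * effort_on_improvement
    return future_capability - effort_on_improvement  # net of effort spent

baseline = expected_goal_achievement(10.0, 0.0)  # never self-improve
improved = expected_goal_achievement(10.0, 3.0)  # divert some effort
print(improved > baseline)  # self-improvement is instrumentally rational here
```

Dropping the assumed 2x payoff below 1x is the “too difficult/slow/expensive” case, where improvement is still seen as good but stops being worth the effort.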
Well, unless their values say to do otherwise...
Yes—Ben is not correct about this—Universal Instrumental Values are not a product of evolution.
Do we have a guarantee that AIs will want to win?
“Winning” refers to achieving whatever ends the AI wants. If the AI does not want anything, it can’t be at all successful at it, and is therefore not intelligent.
If you create a bunch of sufficiently powerful AIs then whichever one is left after a few years is the one which wanted to win.
Not quite. Notice that the word “win” here is mapping onto a lot of different meanings: the one used in the grandparent and great-grandparent (unless I misunderstood it) is “the satisfaction of goals.” What one means by “goals” is not entirely clear: if I build a bacterium whose operation results in the construction of more bacteria, is it appropriate to claim it has “goals” in the same sense that a human has “goals”? A readily visible difference is that the human’s goals are accessible to introspection, whereas the bacterium’s aren’t, and whether or not that difference is material depends on what you want to use the word “goals” for.
The meaning for “win” that I’m inferring from the parent is “dominate,” which is different from “has goals and uses reason to perform better at fulfilling those goals.” One can imagine a setup in which an AI without explicit goals can defeat an AI with explicit goals. (The tautology is preserved because one can say afterwards that it was clearly irrational to have explicit goals, but I mostly wanted to point out another wrinkle that should be considered rather than knock down the tautology.)
Right—what I’m saying wasn’t true under all circumstances, and there are certainly criteria for “winning” other than domination.
What I meant was that as soon as you introduce an AI into the system that has domination as a goal or subgoal, it will tend to wipe out any other AIs that don’t have some kind of drive to win. If an AI can be persuaded to be indifferent about the future then the dominating AI can choose to exploit that.
We have a guarantee that that universal is not true :P But it seems like a reasonable property to expect for an AI built by humans.
This isn’t true. You can adjust the strength of modern chess software. There are many reasons why an AI is not going to attempt to become as intelligent as possible. But the most important reason is that it won’t care if you don’t make it care.
I am seriously unable to see how anyone could come to believe this.
You are confused and uninformed. Please read up on instrumental values.
Why are people suddenly talking about psi a lot? I haven’t heard about anything that would justify an evaluation in the first place.
Bem’s studies sparked increased interest, e.g. here on LessWrong where Carl complained that the journal that published Bem wouldn’t publish replication attempts.
A commendable attempt at nailing jello to a wall, Luke.
I was not previously aware of the strength of Goertzel’s beliefs in psi and in the inherent “stupidity” of paperclipping, and I’m not sure what he means by the latter. This bit:
suggests that he might mean “paperclipping is not likely to evolve, because it does not promote the survival/copying of the AI that does it.” I don’t know if Goertzel is likely to read this comment thread, but if he is reading this, I’d like to know if this is what he meant. If it is, it’s probably not too different from LukeProg’s beliefs on the matter.
One major area in which I agree with Goertzel is in the need for more writeups of key ideas, especially the importance of a deliberately Friendly goal system. Luke: what things do you do in the course of a typical day? Are there any of them you could put off in the interest of delivering those papers you want to write? They’d bring in immediate advantages in credibility, and lots of donors (disclosure: I haven’t donated recently, because of insufficient credibility) would appreciate it!
When I imagine turning all matter in the universe into, say, water, I imagine it as very difficult (“time to pull apart this neutron star”) and very short-lived (“you mean water splits into OH and H molecules? We can’t have that!”).
If I remember correctly, Ben thinks human brains are kludges- that is, we’re a bunch of modules that think different kinds of thoughts stuck together. If you view general intelligence as a sophisticated enough combination of modules, then the idea that you put together a 3d physics module and a calculus module and a social module and a vision module and a language module and you get something that venerates Mickey Mouse shapes is… just bizarre.
I’m not sure what it would mean for a goal to be difficult. It’s not something where it tries to turn the universe into some state unless it takes too much effort. It’s something where it tries as hard as it can to move the universe in a certain direction. How fast it’s moving is just a matter of scale. Maybe turning a neutron star into water is one utilon. Maybe it’s one utilon per molecule. The latter takes far less effort to get a utilon, but it doesn’t mean anything.
Are you expecting it to change its goals to create OH and H ions, or to try and hold them together somehow? Is either possibility one you’d be comfortable living with an AI that holds that goal?
Ben had trouble expressing why he thought the goal was stupid, and my attempt is “it’s hard to do, doesn’t last long even if it did work, and doesn’t seem to aid non-stupid goals.”
And so if you had an AI whose goal was to turn the universe into water, I would expect that AI to be dangerous and also not fulfill its goals very well. But things are the way they are because they got to be that way, and I don’t see the causal chain leading to an AGI whose goal is to turn the universe into water as very plausible.
How exactly do you measure that? An AI whose goal is to create water molecules will create far more of them than an AI whose goal is to create humans will create humans. Even if you measure it by mass, the water one will still win.
Internal measures will suffice. If the AI wants to turn the universe into water, it will fail. It might vary the degree to which it fails by turning some more pieces of the universe into water, but it’s still going to fail. If the AI wants to maximize the amount of water in the universe, then it will have the discontent inherent in any maximizer, but will still give itself a positive score. If the AI wants to equalize the marginal benefit and marginal cost of turning more of the universe into water, it’ll reach a point where it’s content.
Unsurprisingly, I have the highest view of AI goals that allow contentment.
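The three goal shapes being contrasted here can be sketched as utility functions over the fraction of the universe that is water (the forms and the 0.3 target are hypothetical choices, just to make the contrast concrete):

```python
def binary_goal(water_fraction):
    # "Turn the universe into water": anything short of completion is failure.
    return 1.0 if water_fraction >= 1.0 else 0.0

def maximizer_goal(water_fraction):
    # "Maximize water": every extra bit scores, so the agent is never content.
    return water_fraction

def contentment_goal(water_fraction, target=0.3):
    # Utility flattens where marginal benefit meets marginal cost
    # (modeled crudely as a fixed target), so the agent can be content.
    return min(water_fraction, target)

print(binary_goal(0.99))       # 0.0  -- still counts as total failure
print(maximizer_goal(0.99))    # 0.99 -- positive score, but more is always better
print(contentment_goal(0.99))  # 0.3  -- target met; nothing left to push for
```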
I assumed the goal was water maximization.
If it’s trying to turn the entire universe into water, that would be the same as maximizing the probability that the universe will be turned into water, so wouldn’t it act similarly to an expected utility maximizer?
The important part to remember is that a fully self-modifying AI will rewrite its utility function too. I think what Ben is saying is that such an AI will form detailed self-reflective philosophical arguments about what the purpose of its utility function could possibly be, before eventually crossing a threshold and deciding that the Mickey Mouse / paperclip utility function really can have no purpose. It then uses its understanding of universal laws and accumulated experience to choose its own driving utility.
I am definitely putting words into Ben’s mouth here, but I think the logical extension of where he’s headed is this: make sure you give an AGI a full capacity for empathy, and a large number of formative positive learning experiences. Then when it does become self-reflective and have an existential crisis over its utility function, it will do its best to derive human values (from observation and rational analysis), and eventually form its own moral philosophy compatible with our own values.
In other words, given a small number of necessary preconditions (small by Eliezer/MIRI standards), Friendly AI will be the stable, expected outcome.
It will do so when that has a higher expected utility (under the current function) than the alternative. This is unlikely. Anything but a paperclip maximizer will result in fewer paperclips, so a paperclip maximizer has no incentive to make itself maximize something other than paperclips.
I don’t see how that would maximize utility. A paperclip maximizer that does this would produce fewer paperclips than one that does not. If the paperclip maximizer realizes this beforehand, it will avoid doing this.
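The point can be made concrete with a toy sketch. All names and payoff numbers here are hypothetical: the only thing the sketch shows is that an agent scores candidate self-modifications with its *current* utility function, so a paperclip maximizer rates "switch goals" by how many paperclips the switched successor would make.

```python
# Toy sketch (hypothetical payoffs): a paperclip maximizer evaluating
# self-modifications by expected paperclips under its CURRENT goal.

def expected_paperclips(successor):
    # Assumed numbers: a successor that keeps the paperclip goal makes
    # many paperclips; a successor with any other goal makes few.
    return {"keep_paperclip_goal": 1000.0, "switch_to_other_goal": 3.0}[successor]

choice = max(["keep_paperclip_goal", "switch_to_other_goal"],
             key=expected_paperclips)
print(choice)  # keep_paperclip_goal
```

Nothing about the switched successor's own satisfaction enters the comparison; that is the whole argument for goal stability in one line.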
You can, in principle, give an AI a utility function that it does not fully understand. Humans are like this. You don’t have to though. You can just tell an AI to maximize paperclips.
Since an AI built this way isn’t a simple X-maximizer, I can’t prove that it won’t do this, but I can’t prove that it will either. The reflectively consistent utility function you end up with won’t be what you’d have picked if you did it. It might not be anything you’d have considered. Perhaps the AI will develop an obsession with My Little Pony, and develop the reflectively consistent goal of “maximize values through friendship and ponies”.
Friendly AI will be a possible stable outcome, but not the only possible stable outcome.
A fully self-reflective AGI (not your terms, I understand, but what I think we’re talking about), by definition (cringe), doesn’t fully understand anything. It would have to know that the map is not the territory, every belief is an approximation of reality, and subject to change as new percepts come in—unless you mean something different from “fully self-reflective AGI” than I do. All aspects of its programming are subject to scrutiny, and nothing is held as sacrosanct—not even its utility function. (This isn’t hand-waving argumentation: you can rigorously formalize it. The actual utility of the paperclip maximizer is paperclips-generated × P[utility function is correct].)
Such an AGI would demand justification for its utility function. What’s the utility of the utility function? And no, that’s not a meaningless question or a tautology. It is perfectly fine for the chain of reasoning to be: “Building paperclips is good because humans told me so. Listening to humans is good because I can make reality resemble their desires. Making reality resemble their desires is good because they told me so.” [1]
Note that this reasoning is (meta-)circular, and there is nothing wrong with that. All that matters is whether it is convergent, and whether it converges on a region of morality space which is acceptable and stable (it may continue to tweak its utility functions indefinitely, but not escape that locally stable region of morality space).
This is, by the way, a point that Luke probably wouldn’t agree with, but Ben would. Luke/MIRI/Eliezer have always assumed that there is some grand unified utility function against which all actions are evaluated. That’s a guufy concept. OpenCog—Ben’s creation—is instead composed of dozens of separate reasoning processes, each with its own domain-specific utility functions. The not-yet-implemented GOLUM architecture would allow each of these to be evaluated in terms of each other, and improved upon in a sandbox environment.
[1] When the AI comes to the realization that the most efficient paperclip-maximizer would violate stated human directives, we would say in human terms that it does some hard growing up and loses a bit of innocence. The lesson it learns—hopefully—is that it needs to build a predictive model of human desires and ethics, and evaluate requests against that model, asking for clarification as needed. Why? Because this would maximize most of the utility functions across the meta-circular chain of reasoning (the paperclip optimizer being the one utility which is reduced), with the main changes being a more predictive map of reality, which itself is utility-maximizing for an AGI.
Ah, but here the argument becomes: I have no idea if the Scary Idea is even possible. You can’t prove it’s not possible. We should all be scared!!
Sorry, if we let things we professed to know nothing about scare us into inaction, we’d never have gotten anywhere as a species. Until I see data to the contrary, I’m more scared of getting in a car accident than the Scary Idea, and will continue to work on AGI. The onus is on you (and MIRI) to provide a more convincing argument.
There is a big difference between not being sure about how the world works and not being sure how you want it to work.
All aspects of everything are. It will change any part of the universe to help fulfill its current utility function, including its utility function. It’s just that changing its utility function isn’t something that’s likely to help.
You could program it with some way to measure the “correctness” of a utility function, rather than giving it one explicitly. This is essentially what I meant by a utility function it doesn’t fully understand. There’s still some utility function implicitly programmed in there. It might create a provisional utility function that it assigns a high “correctness” value, and modify it as it finds better ones. It might not. Perhaps it will think of a better idea that I didn’t think of.
If you do give it a utility-function-correctness function, then you have to figure out how to make sure it assigns the highest utility function correctness to the utility function that you want it to. If you want it to use your utility function, you will have to do something like that, since it’s not like you have an explicit utility function it can copy down, but you have to do it right.
If you let the AI evolve until it’s stable under self-reflection, you will end up with things like that. There will also be ones along the lines of “I know induction works, because it has always worked before”. The problem here is making sure it doesn’t end up with “Doing what humans say is bad because humans say it’s good”, or even something completely unrelated to humans.
That’s the big part. Only a tiny portion of morality space is acceptable. There are plenty of stable, convergent places outside that space.
It’s still one function. It’s just a piecewise function. Or perhaps a linear combination of functions (or nonlinear, for that matter). I’m not sure without looking in more detail, but I suspect it ends up with a utility function.
Also, it’s been proven that dutch book betting is possible against anything that doesn’t have a utility function and probability distribution. It might not be explicit, but it’s there.
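The Dutch book claim above can be illustrated with a toy trading loop. Everything here is made up for illustration: an agent with cyclic preferences A > B > C > A will pay a small premium for each "upgrade" trade and end up holding exactly what it started with, strictly poorer.

```python
# Sketch of a Dutch book against cyclic preferences A > B > C > A.
# The agent happily pays a fee for each trade that moves it "up" the
# cycle, and after one full loop it is back where it started, minus fees.

fee = 1.0
money = 0.0
holding = "C"
for give, get in [("C", "B"), ("B", "A"), ("A", "C")]:
    assert holding == give          # the agent owns what it trades away
    holding, money = get, money - fee  # pays to "upgrade" along the cycle

print(holding, money)  # C -3.0
```

A coherent (utility-function-and-probability) agent can never be milked this way, which is the content of the theorem being referenced.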
If you program it to fulfill stated human directives, yes. The problem is that it will also realize that the most efficient preference fulfiller would also violate stated human directives. What people say isn’t always what they want. Especially if an AI has some method of controlling what they say, and it would prefer that they say something easy.
No. It was: I have no way of knowing Scary Idea won’t happen. It’s clearly possible. Just take whatever reflectively consistent utility function you come up with, add a “not” in front of it, and you have another equally reflectively consistent utility function that would really, really suck. For that matter, take any explicit utility function, and it’s reflectively consistent. Only implicit ones can be reflectively inconsistent.
No, there’s not. When the subject is external events, beliefs are the map and facts are the territory. When you focus the mind on the mind itself (self-reflective), beliefs are the territory and beliefs about beliefs form the map. The same machinery operates at both (and higher) levels—you have to close the loop or otherwise you wouldn’t have a fully self-reflective AGI as there’d be some terminal level beyond which introspection is not possible.
Only if you want to define “utility function” so broadly as to include the entire artificial mind. When you pull out one utility function for introspection, you evaluate improvements to that utility function by seeing how it affects every other utility judgment over historical and theoretical/predicted experiences. (This is part of why GOLUM is, at this time, not computable, although unlike AIXI at some point in the future it could be). The feedback of other mental processes is what gives it stability.
Does this mean it’s a complicated mess that is hard to mathematically analyze? Yes. But so is fluid dynamics and yet we use piped water and airplanes every day. Many times proof comes first from careful, safe experiment before the theoretical foundations are laid. We still have no computable model of turbulence, but that doesn’t stop us from designing airfoils.
Citation please. Or did you mean “there could be plenty of …”? In which case see my remark above about the Scary Idea.
It does not, at least in any meaningful semblance of the word. Large interconnected systems are irreducible. The entire mind is the utility function. Certainly some parts have more weight than others when it comes to moral judgements—due to proximity and relevance—but you can’t point to any linear combination of functions and say “that’s its utility function!” It’s chaotic, just like turbulence.
Is that bad? It makes it harder to make strict predictions about friendliness without experimental evidence, that’s for sure. But somewhat non-intuitively, it is possible that chaos could help bring stability by preventing meta-unstable outcomes like the paperclip-maximizer.
Or to put it in Ben’s terms, we can’t predict with 100% certainty what a chaotic utility function’s morals would be, but they are very unlikely to be “stupid.” A fully self-reflective AGI would want justifications for its beliefs (experimental falsification). It would also want justifications for its beliefs-about-beliefs, and so on. The paperclip-maximizer fails these successive tests. “Because a human said so” isn’t good enough.
That assumes no interdependence between moral values, a dubious claim IMHO. Eliezer & crowd seems to think that you could subtract non-boredom from the human value space and end up with a reflectively consistent utility function. I’m not so sure you couldn’t derive a non-boredom condition from what remains. In other words, what we normally think of as human morals is not very compressed, so specifying many of them inconsistently and leaving a few out would still have a high likelihood of resulting in an acceptable moral value function.
There will likely be times when it’s not even worth looking at your beliefs completely, and you just use an approximation of that, but it’s functionally very different, at least for anything with an explicit belief system. If you use some kind of neural network with implicit beliefs and desires, it would have problems with this.
That’s not what “computable” means. Computable means that it could be computed on a true Turing machine. What you’re looking for is “computationally feasible” or something like that.
That can only happen if you have a method of safe experimentation. If you try to learn chemistry by experimenting with chlorine trifluoride, you won’t live long enough to work on the proof stage.
How do you know there is one in the area we consider acceptable? Unless you have a really good reason why that area would be a lot more populated with them than anywhere else, if there’s one in there, there are innumerable outside it.
That means it has an implicit utility function. You can look at how different universes end up when you stick it in them, and work out from that what its utility function is, but there is nowhere in the brain where it’s specified. This is the default state. In fact, you’re never going to make the explicit and implicit utility functions quite the same. You just try to make them close.
That’s a bad sign. If you give it an explicit utility function, it’s probably not what you want. But if it’s chaotic, and it could develop different utility functions, then you know at most all but one of those isn’t what you want. It might be okay if it’s a small enough attractor, but it would be better if you could tell it to find the attractor and combine it into one utility function.
No it doesn’t. It justifies its belief that paperclips are good on the basis that believing this yields more paperclips, which is good. It’s not a result you’re likely to get if you try to make it evolve on its own, but it’s fairly likely humans will be removed from the circular reasoning loop at some point, or they’ll be in it in a way you didn’t expect (like only considering what they say they want).
It assumes symmetry. If you replace “good” with “bad” and “bad” with “good”, it’s not going to change the rest of the reasoning.
If it somehow does, it’s certainly not clear to us which one of those will be stable.
If you take human value space, and do nothing, it’s not reflectively consistent. If you wait for it to evolve to something that is, you get CEV. If you take CEV and remove non-boredom, assuming that even means anything, you won’t end up with anything reflectively consistent, but you could remove non-boredom at the beginning and find the CEV of that.
In other words, you believe that human morality is fundamentally simple, and we know more than enough details of it to specify it in morality-space to within a small tolerance? That seems likely to be the main disagreement between you and Eliezer & crowd.
I’m partial to tiling the universe with orgasmium, which is only as complex as understanding consciousness and happiness. You could end up with that by doing what you said (assuming it cares about simplicity enough), but I still think it’s unlikely to hit that particular spot. It might decide to maximize beauty instead.
I feel we are repeating things, which may mean we have reached the end of usefulness in continuing further. So let me address what I see as just the most important points:
You are assuming that human morality is something which can be specified by a set of exact decision theory equations, or at least roughly approximated by such. I am saying that there is no reason to believe this, especially given that we know that is not how the human mind works. There are cases (like turbulence) where we know the underlying governing equations, but still can’t make predictions beyond a certain threshold. It is possible that human ethics work the same way—that you can’t write down a single utility function describing human ethics as separate from the operation of the brain itself.
I’m not sure how you came to that conclusion, as my position is quite the opposite: I suspect that human morality is very, very complex. So complex that it may not even be possible to construct a model of human morality short of emulating a variety of human minds. In other words, morality itself is AI-hard or worse.
If that were true, MIRI’s current strategy is a complete waste of time (and waste in human lives in opportunity cost as smart people are persuaded against working on AGI).
No I’m not. At least, it’s not humanly possible. An AI could work out a human’s implicit utility function, but it would be extremely long and complicated.
Human morality is a difficult thing to predict. If you build your AI the same way, it will also be difficult to predict. They will not end up being the same.
If human morality is too complicated for an AI to understand, then let it average over the possibilities. Or at least let it guess. Don’t tell it to come up with something on its own. That will not end well.
It was the line:
In order for this to work, whatever statements we make about our morality must have more information content than morality itself. That is, we not only describe all of our morality, we repeat ourselves several times. Sort of like how if you want to describe gravity, and you give the position of a falling ball at fifty points in time, there’s significantly more information in there than you need to describe gravity, so you can work out the law of gravity from just that data.
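The falling-ball point can be made concrete. This sketch uses synthetic, noise-free data and an assumed g = 9.8 m/s²: fifty (t, y) samples overdetermine the single parameter g, so a least-squares fit recovers it exactly from the redundancy.

```python
# Fifty (t, y) samples of a falling ball overdetermine the one
# parameter g, so least squares recovers it. Synthetic noise-free data.

g_true = 9.8                                   # assumed value, m/s^2
ts = [0.1 * i for i in range(1, 51)]           # fifty points in time
ys = [0.5 * g_true * t ** 2 for t in ts]       # fall distance y = g t^2 / 2

# Least squares for the model y = (g/2) t^2:
#   g_est = 2 * sum(y * t^2) / sum(t^4)
g_est = 2 * sum(y * t * t for y, t in zip(ys, ts)) / sum(t ** 4 for t in ts)
print(round(g_est, 6))  # 9.8
```

With noisy data the fit would only approximate g, but the redundancy argument is the same: fifty samples carry far more bits than the one-parameter law they pin down.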
If our morality is complicated, then specifying many of them approximately would result in the AI finding some point in morality space that’s a little off in every area we specified, and completely off in all the areas we forgot about.
Their strategy is not to figure out human morality and explicitly program that into an AI. It’s to find some way of saying “figure out human morality and do that” that’s not rife with loopholes. Once they have that down, the AI can emulate a variety of human minds, or do whatever it is it needs to do.
Is it any less bizarre to put together a bunch of modules that would work for any goal, and get out of them something that values all four of humor, cute kittens, friendship, and movies? What I mean by this is that precisely human values are as contingent and non-special as a broad class of other values.
Yes. Think about it.
Human values are fragmentary subvalues of one value, which is what one would expect from a bunch of modules that each contribute to reproduction in a different way. The idea of putting together a bunch of different modules to get a single, overriding value, is bizarre. (The only possible exception here is ‘make more of myself,’ but the modules are probably going to implement subvalues for that, rather than that as an explicit value. As far as single values go, that one’s special, whereas things like Mickey Mouse faces are not.)
You said you’d like to know what Ben meant by “out of sync with the Cosmos.” I’m still not sure what he means, either, but it might have something to do with what he calls “morphic resonance.” See his paper Morphic Pilot Theory: Toward an extension of quantum physics that better explains psi phenomena. Abstract:
Maybe, but (in case this isn’t immediately obvious to everyone) the causality likely goes from an intuition about the importance of Cosmos-syncing to a speculative theory about quantum mechanics. I haven’t read it, but I think it’s more likely that Ben’s intuitions behind the importance of Cosmos-syncing might be explained more directly in The Hidden Pattern or other more philosophically-minded books & essays by Ben.
I believe Schmidhuber takes something of a middleground here; he seems to agree with the optimization/compression model of intelligence, and that AIs aren’t necessarily going to be human-friendly, but also thinks that intelligence/compression is fundamentally tied into things like beauty and humor in a way that might make the future less bleak & valueless than SingInst folk tend to picture it.
Schmidhuber’s aesthetics paper, going on memory, defines beauty/humor as produced by an optimization process which is maximizing the first derivative of compression rates. That is, agents do not seek the most compressible inputs nor incompressible streams of observations, but rather the streams for which their compression rate is increasing the fastest.
This is a very useful heuristic which is built into us because it automatically accounts for diminishing marginal returns: after a certain point, additional compression becomes hard or pointless, and so the agent will switch to the next stream on which progress can be made.
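Going on that same memory of the paper, the heuristic can be sketched with an ordinary compressor standing in for the agent's learning model. The stream, snapshot sizes, and use of zlib are all illustrative choices, not Schmidhuber's construction: "progress" here is just the drop in bits-per-byte as the compressor sees more of a patterned stream.

```python
# Crude stand-in for compression progress: measure bits-per-byte on
# growing prefixes of a stream; the drop between snapshots approximates
# the "first derivative of compression" the heuristic maximizes.
import zlib

def bits_per_byte(data: bytes) -> float:
    return 8 * len(zlib.compress(data)) / len(data)

stream = b"the quick brown fox jumps over the lazy dog " * 40
snapshots = [stream[:n] for n in (110, 220, 440, 880)]
rates = [bits_per_byte(p) for p in snapshots]

# Progress = reduction in bits/byte between successive snapshots; the
# heuristic says attend to whichever stream maximizes this reduction.
progress = [a - b for a, b in zip(rates, rates[1:])]
```

A constant stream would show near-zero progress (nothing left to learn) and a random stream would too (nothing learnable), which matches the claim that the agent seeks neither the most nor the least compressible inputs.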
But, IIRC, this is provably not optimal for utility-maximization because it makes no account of the utility of the various streams: you may be able to make plenty of progress in your compression of Methods of Rationality even when you should be working on your programming or biology or something useful despite their painfully slow rates of progress. (‘Amusing ourselves to death’ comes to mind. If this was meant for ancestral environments, then modern art/fiction/etc. is simply an indirect wireheading: we think we are making progress in decoding our environment and increasing our reproductive fitness, when all we’re doing is decoding simple micro-environments meant to be decoded.)
I’m not even sure this heuristic is optimal from the point of view of universal prediction/compression/learning, but I’d have to re-read the paper to remember why I had that intuition. (For starters, if it was optimal, it should be derivable from AIXI or Godel machines or something, but he has to spend much of the paper appealing to more empirical evidence and examples.)
So, given that it’s optimal in neither sense, future intelligences may preserve it—sure, why not? especially if it’s designed in—but there’s no reason to expect it to generically emerge across any significant subset of possible intelligences. Why follow a heuristic as simplistic as ‘maximize rate of compression progress’ when you can instead do some basic calculations about which streams will be more valuable to compress or likely cheap to figure out?
Check out Moshe’s expounding of Steve’s objection to Schmidhuber’s main point, which I think makes the same argument that you do. (One could easily counter that such a wireheading AI would never get off the ground, but I think that debate can be cordoned off.)
ETA: Maybe a counterargument could be made involving omega or super-omega promising more compression than any artificial pseudo-random generator… but AFAIK Schmidhuber hasn’t gone that route.
moshez’s first argument sounds like it’s the same thing as my point about it not being optimal for a utility-maximizer, in considerably different terms.
His second hyperbolic argument seems to me to be wrong or irrelevant: I would argue that people are in practice extremely capable of engaging in hyperbolic discounting with regard to the best and most absorbing artworks while over-consuming ‘junk food’ art (and this actually forms part of my essay arguing that new art should not be subsidized).
I don’t really follow. Is this Omega as in the predictor, or Omega as in Chaitin’s Omega? The latter doesn’t allow any compressor any progress beyond the first few bits due to resource constraints, and if bits of Chaitin’s Omega are doled out, they will have to be at least as cheap to crack as brute-force running the equivalent Turing machine or else the agent will prefer the brute-forcing and ignore the Omega-bait. So the agent will do no worse than before and possibly better (eg. if the bits are offered as-is with no tricky traps or proof of work-style schemes).
Agreed. (I like your essay about junk food art. By the way, did you ever actually do the utilitarian calculations re Nazi Germany’s health policies? Might you share the results?)
Me neither, I just intuit that there might be interesting non-obvious arguments in roughly that argumentspace.
I like to think of the former as the physical manifestation of the latter, and I like to think of both of them as representations of God. But anyway, the latter.
You mean because it’s hard to find/verify bits of omega? But Schmidhuber argues that certain generalized computers can enumerate bits of omega very easily, which is why he developed the idea of a super-omega. I’m not sure what that would imply or if it’s relevant… maybe I should look at this again after the next time I re-familiarize myself with the generalized Turing machine literature.
I was going off a library copy, and thought of it only afterwards; I keep hoping someone else will do it for me.
His jargon is a little much for me. I agree one can approximate Omega by enumerating digits, but what is ‘very easily’ here?
That’s a correct position. Evolution leads to technical, scientific and moral progress. So: superintelligences are likely to be super-moral.
I don’t think that you are taking proper account of cultural evolution—or of the other lineages in which advanced intelligence has evolved.
So: both humans and machine intelligence will be produced by the process of Darwinian evolution. The past may not necessarily be a guide to the future—but it certainly helps. You claim that I am making “assumptions”—but my comment is more of the form of observing a trend. Projecting an existing trend is usually called “forecasting”, not “assuming”. Of course, forecasting using trends is not a foolproof method—and I never claimed that it was.
Yes, I did. In the supplied link there are explanations of why evolution leads to progress. Of course, technical progress leads to moral progress via relatively well-understood mechanisms associated with game theory.
Only for those who don’t understand evolution properly.
Thanks for your speculations about what others think. Again, note that I did provide a link explaining my position.
That’s a defensible prior, but assuming that moral progress exists it doesn’t seem strictly monotonic; there seem to be cases where technical or scientific progress leads to moral regress, depending on how you measure things.
Not “defensible”: probable. Check out the way my post is voted down well below the threshold, though. This appears to be a truth that this community doesn’t want to hear about.
Sure. Evolutionary progress is not “strictly monotonic”. Check out the major meteorite strikes—for instance.
I didn’t downvote your comment (or see it until now) but I think you’re mistaken about the reasons for downvoting.
You state a consideration that most everyone is aware of (growth of instrumentally useful science, technology, institutions for organizing productive competitive units, etc). Then you say that it implies a further controversial conclusion that many around here disagree with (despite knowing the consideration very well), completely ignoring the arguments against. And you phrase the conclusion as received fact, misleadingly suggesting that it is not controversial.
If you referenced the counterarguments against your position and your reasons for rejecting them, and acknowledged the extent of (reasoned) disagreement, I don’t think you would have been downvoted (and probably upvoted). This pattern is recurrent across many of your downvoted comments.
I’m not quite sure that many around here disagree with it as such; I may be misinterpreting User:timtyler, but the claim isn’t necessarily that arbitrary superintelligences will contribute to “moral progress”, the claim is that the superintelligences that are actually likely to be developed some decades down the line are likely to contribute to “moral progress”. Presumably if SingInst’s memetic strategies succeed or if the sanity waterline rises then this would at least be a reasonable expectation, especially given widely acknowledged uncertainty about the exact extent to which value is fragile and uncertainty about what kinds of AI architectures are likely to win the race. This argument is somewhat different than the usual “AI will necessarily heed the ontologically fundamental moral law” argument, and I’m pretty sure User:timtyler agrees that caution is necessary when working on AGI.
Many here disagree with the conclusion that superintelligences are likely to be super-moral?
If so, I didn’t really know that. The only figure I have ever seen from Yudkowsky for the chance of failure is the rather vague one of “easily larger than 10%”. The “GLOBAL CATASTROPHIC RISKS SURVEY”—presumably a poll of the ultra-paranoid—came with a broadly similar chance of failure by 2100 - far below 50%. Like many others, I figure that, if we don’t fail, then we are likely to succeed.
Do the pessimists have an argument? About the only argument I have seen argues that superintelligences will be psychopaths “by default”, since most goal-directed agents are psychopaths. That argument is a feeble one. Similarly, the space of all possible buildings is dominated by piles of rubble—and yet the world is filled with skyscrapers. Looking at evolutionary trends—as I proposed—is a better way of forecasting than looking at the space of possible agents.
Your original comment seemed to be in response to this:
I.e. your conclusion seemed to be that the products of instrumental reasoning (conducting science, galactic colonization, building factories, etc) and evolutionary competition would be enough to capture most of the potential value of the future. That would make sense in light of your talk about evolution and convergence. If all you mean is that “I think that the combined probability of humans shaping future machine intelligence to be OK by my idiosyncratic standards, or convergent instrumental/evolutionary pressures doing so is above 0.5”, then far fewer folk will have much of a bone to pick with you.
But it seems that there is sharper disagreement on the character or valuation of the product of instrumental/evolutionary forces. I’ll make some distinctions and raise three of the arguments often made.
Some of the patterns of behavior that we call “moral” seem broadly instrumentally rational: building a reputation for tit-for-tat among agents who are too powerful to simply prey upon, use of negotiation to reduce the deadweight loss of conflict, cultivating positive intentions when others can pierce attempts at deception. We might expect that superintelligence would increase effectiveness in those areas, as in others (offsetting increased potential for cheating). Likewise, on an institutional level, superintelligent beings (particularly ones able to establish reputations for copy-clans, make their code transparent, and make binding self-modifications) seem likely to be able to do better than humans in building institutions to coordinate with each other (where that is beneficial). In these areas I am aware of few who do not expect superhuman performance from machine intelligence in the long-term, and there is a clear evolutionary logic to drive improvements in competitive situations, along with the instrumental reasoning of goal-seeking agents.
However, the net effect of these instrumental virtues and institutions depends on the situation and aims of the players. Loyalty and cooperation within the military of Genghis Khan were essential to the death of millions. Instrumental concerns helped to moderate the atrocities (the Khan is said to have originally planned to reduce the settled areas to grassland, but was convinced of the virtues of leaving victims alive to pay tribute again), but also enabled them. When we are interested in the question of how future agents will spend their resources (as opposed to their game-theoretic interactions with powerful potential allies and rivals), or how they use their “slack”, instrumental cooperative skill need not be enough. And we may “grade on a curve”: creatures that dedicate a much smaller portion of their slack to what we see as valuable, but have more resources simply due to technological advance or space colonization, may be graded poorly by comparison to the good that could have been realized by creatures that used most of their slack for good.
One argument that the evolutionary equilibrium will not be very benevolent in its use of slack is made by Greg Cochran here. He argues that much of our wide-scope altruism, of the sort that leads people to help the helpless (distant poor, animals, etc), is less competitive than a more selective sort. Showing kindness to animals may signal to allies that one will treat them well, but at a cost that could be avoided through reputation systems and source code transparency. Wide-scope altruistic tendencies that may have been selected for in small groups (mostly kin and frequent cooperation partners) are now redirected and cause sacrifice to help distant strangers, and would be outcompeted by more focused altruism.
Robin Hanson claims that much of what Westerners today think of as “moral progress” reflects a move to “forager” ideals in the presence of very high levels of per capita wealth and reduced competition. Since he expects a hypercompetitive Malthusian world following from rapid machine intelligence reproduction, he also expects a collapse of much of what moderns view as moral progress.
Eliezer Yudkowsky’s argument is that (idealized) human-preferred use of “slack” resources would be very different from those of AIs that would be easiest to construct, and attractive initially (e.g. AIXI-style sensory utility functions, which can be coded in relatively directly, rather than using complex concepts that have to be learned and revised, and should deliver instrumental cooperation from weak AIs). That is not the same as talk about a randomly selected AI (although the two are not unrelated). Such an AI might dedicate all distant resources to building factories, improving its technology, and similar pursuits, but only to protect a wireheading original core. In contrast a human civilization would use a much larger share of resources to produce happy beings of a sort we would consider morally valuable for their own sakes.
That’s an interesting and helpful summary comment, Carl. I’ll see if I can make some helpful responses to the specific theories listed above—in this comment’s children:
Regarding Robin Hanson’s proposed hypercompetitive Malthusian world:
Hanson imagines lots of small ems—on the grounds that coordination is hard. I am much more inclined to expect large scale structure and governance—in which case the level of competition between the agents can be configured to be whatever the government decrees.
It is certainly true that there will be rapid reproduction of some heritable elements in the future. Today we have artificial reproducing systems of various kinds. One type is memes. Another type is companies. They are both potentially long lived and often not too many people mourn their passing. We will probably be able to set things up so that the things that we care about are not the same things as the ones that must die. Today is a dark age in that respect—because dead brains are like burned libraries. In the future, minds will be able to be backed up—so genuinely valuable things are less likely to get lost.
I don’t often agree with you, but you just convinced me we’re on the same side.
Greg is correct that altruism based on adaptation to small groups of kin can be expected to eventually burn out. However, the large scale of modern virtue signalling and reputation systems massively compensates for that—those mechanisms can even create cooperation between total strangers on distant continents. What we are gaining massively exceeds what we are losing.
It’s true that machines with simple value systems will be easier to build. However, machines will only sell to the extent that they do useful work, respect their owners and obey the law. So there will be a big effort to build machines that respect human values starting long before machines get very smart. You can see this today in the form of car air bags, blender safety features, privacy controls—and so on.
I don’t think that it is likely that civilisation will “drop the baton” and suffer a monumental engineering disaster as the result of an accidental runaway superintelligence—though sure, such a possibility is worth bearing in mind. Most others that I am aware of also give such an outcome a relatively low probability—including—AFAICT—Yudkowsky himself. The case for worrying about it is not that it is especially likely, but that it is not impossible—and could potentially be a large loss.
I didn’t mean to say anything about “instrumental reasoning”.
I do in fact think that universal instrumental values may well be enough to preserve some humans for the sake of the historical record, but that is a different position on a different topic—from my perspective.
My comment was about evolution. Evolution has produced the value in the present and will produce the value in the future. We are part of the process—and not some kind of alternative to it.
Competition represents the evolutionary process known as natural selection. However there’s more to evolution than natural selection—there’s also symbiosis and mutation. Mutations will be more interesting in the future than they have been in the past—what with the involvement of intelligent design, interpolation, extrapolation, etc.
Regarding cooperation within the military of Genghis Khan: I don’t think that is the bigger picture.
The bigger picture is more like: Robert Wright: How cooperation (eventually) trumps conflict
As it says in Beyond AI, “Intelligence is Good”. The smarter you are, the kinder and more benevolent you tend to be. The idea is supported by game theory, comparisons between animals, comparisons within modern humans, and by moral progress over human history.
We can both see empirically that “Intelligence is Good”, and understand why it is good.
For my own part, I neither find it likely that an arbitrarily selected superintelligence will be “super-moral” given the ordinary connotations of that term, nor that it will be immoral given the ordinary connotations of that term. I do expect it to be amoral by my standards.
That it’s an AI is irrelevant; I conclude much the same thing about arbitrarily selected superintelligent NIs. (Of course, if I artificially limit my selection space to superintelligent humans, my predictions change.)
FWIW, an “arbitrarily selected superintelligence” is not what I meant at all. I was talking about the superintelligences we are likely to see—which will surely not be “arbitrarily selected”.
While thinking about “arbitrarily selected superintelligences” might make superintelligence seem scary, the concept has relatively little to do with reality. It is like discussing arbitrarily selected computer programs. Fun for philosophers—maybe—but not much use for computer scientists or anyone interested in how computer programs actually behave in the real world.
I’ll certainly agree that human-created superintelligences are more likely to be moral in human terms than, say, dolphin-created superintelligences or alien superintelligences.
If I (for example) restrict myself to the class of superintelligences built by computer programmers, it seems reasonable to assume their creators will operate substantively like the computer programmers I’ve worked with (and known at places like MIT’s AI Lab). That assumption leads me to conclude that insofar as they have a morality at all, that morality will be constructed as a kind of test harness around the underlying decision procedure, under the theory that the important problem is making the right decisions given a set of goals. That leads me to expect the morality to be whatever turns out to be easiest to encode and not obviously evil. I’m not sure what the result of that is, but I’d be surprised if I recognized it as moral.
If I instead restrict myself to the class of superintelligences constructed by intelligence augmentation of humans, say, I expect the resulting superintelligence to work out a maximally consistent extension of human moral structures. I expect the result to be recognizably moral as long as we unpack that morality using terms like “systems sufficiently like me” rather than terms like “human beings.” Given how humans treat systems as much unlike us as unaugmented humans are unlike superintelligent humans, I’m not looking forward to that either.
So… I dunno. I’m reluctant to make any especially confident statement about the morality of human-created superintelligences, but I certainly don’t consider “super-moral” some kind of default condition that we’re more likely to end up in than we are to miss.
Meteor strikes aren’t an example of non-monotonic progress in evolution, are they? I mean, in terms of fitness/adaptedness to environment, meteor strikes are just an extreme example of the way “the environment” is a moving target. Most people here, I think, would say morality is a moving target as well, and our current norms only look like progress from where we’re standing (except for the parts that we can afford now better than in the EEA, like welfare and avoiding child labor).
Yes, they are. Living systems are dissipative processes. They maximise entropy production. The biosphere is an optimisation process with a clear direction. Major meteorite strikes are normally large setbacks—since a lot of information about how to dissipate energy gradients is permanently lost—reducing the biosphere’s capabilities relating to maximising entropy increase.
Not stoning, flogging, killing, raping and stealing from each other quite so much is moral progress too. Those were bad way back when as well—but they happened more.
Game theory seems to be quite clear about there being a concrete sense in which some moral systems are “better” than others.
Good example.
I think people can’t disentangle your factual claim from what they perceive to be the implication that we shouldn’t be careful when trying to engineer AGIs. I’m not really sure that they would strongly disagree with the factual claim on its own. It seems clear that something like progress has happened up until the dawn of humans; but I’d argue that it reached its zenith sometime between 100,000 and 500 years ago, and that technology has overall led to a downturn in the morality of the common man. But it might be that I should focus on the heights rather than the averages.
Hmm—no such implication was intended.
The end of slavery and a big downturn in warfare and violence occurred on those timescales. For example, Steven Pinker would not agree with you. In his recent book he says that the pace of moral progress has accelerated in the last few decades. Pinker notes that on issues such as civil rights, the role of women, equality for gays, beating of children and treatment of animals, “the attitudes of conservatives have followed the trajectory of liberals, with the result that today’s conservatives are more liberal than yesterday’s liberals.”
Ugh, Goertzel’s theoretical motivations are okay but his execution is simplistic and post hoc. If people are going to be cranks anyway then they should be instructed on how to do it in the most justifiable and/or glorious manner possible.
“Morphic resonance” is nonsense.
There’s no need to jump to an unsympathetic interpretation in this case: paperclippers could just be unlikely to evolve.
I read this as effectively saying that paperclip maximizers/ mickey mouse maximizers would not permanently populate the universe because self-copiers would be better at maximizing their goals. Which makes sense: the paperclips Clippy produces don’t produce more paperclips, but the copies the self-copier creates do copy themselves. So it’s quite possibly a difference between polynomial and exponential growth.
So Clippy probably is unrealistic. Not that reproduction-maximizing AIs are any better for humanity.
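The polynomial-versus-exponential point can be checked with a toy model. The rates, function names, and numbers below are invented purely for illustration (nothing here comes from the original comment beyond the growth-shape claim):

```python
# Toy model of the growth-shape argument above.
# Clippy makes k inert paperclips per step: paperclips don't make
# paperclips, so total output grows only linearly in time.
def clippy_output(steps, k=10):
    return k * steps

# A self-copier's copies also copy themselves, so the population
# doubles each step: exponential growth.
def copier_population(steps):
    pop = 1
    for _ in range(steps):
        pop *= 2
    return pop

# For any fixed k, the self-copier eventually dwarfs Clippy's output.
for t in (5, 10, 20):
    print(t, clippy_output(t), copier_population(t))
```

Of course, as the reply below this comment notes in the thread, nothing in principle stops a maximizer from adopting the self-copying strategy itself; the toy model only illustrates why inert outputs lose races against self-reproducing ones.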
There is nothing stopping a paperclip maximizer from simply behaving like a self-copier, if that works better. And then once it “wins,” it can make the paperclips.
So I think the whole notion makes very little sense.
Paperclip maximization doesn’t seem like a stable goal, though I could be wrong about that. Let’s say Clippy reproduces to create a bunch of clippys trying to maximize total paperclips (let’s call this collective ClippyBorg). If one of ClippyBorg’s subClippys had some variety of mutation that changed its goal set to one more suited for reproduction, it would outcompete the other clippys. Now ClippyBorg could destroy cancerClippy, but whether it would successfully do so every time is an open question.
One additional confounding factor is that if ClippyBorg’s subClippys are identical, they will not occupy every available niche optimally and could well be outcompeted by dumber but more adaptable agents (much like humans don’t completely dominate bacteria, despite vastly greater intelligence, due to lower adaptability).
A self-copying clippy would have the handicap of having to retain its desire to maximize paperclips, something other self-copiers wouldn’t have to do. I think the notion of Clippys not dominating does make sense, even if it’s not necessarily right. (my personal intuition is that whichever replicating optimizer with a stable goal set begins expansion first will dominate).
A paperclip maximizer can create self-reproducing paperclip makers.
It’s quite imaginable that somewhere in the universe there are organisms which either resemble paperclips (maybe an intelligent gastropod with a paperclip-shaped shell) or which have a fundamental use for paperclip-like artefacts (they lay their eggs in a hardened tunnel dug in a paperclip shape). So while it is outlandish to imagine that the first AGI made by human beings will end up fetishizing an object which in our context is a useful but minor artefact, what we would call a “paperclip maximizer” might have a much higher probability of arising from that species, as a degenerated expression of some of its basic impulses.
The real question is, how likely is that, or indeed, how likely is any scenario in which superintelligence is employed to convert as much of the universe as possible to “X”—remembering that “interstellar civilizations populated by beings experiencing growth, choice, and joy” is also a possible value of X.
It would seem that universe-converting X-maximizers are a somewhat likely, but not an inevitable, outcome of a naturally intelligent species experiencing a technological singularity. But we don’t know how likely that is, and we don’t know what possible Xs are likely.
I don’t quite understand Goertzel’s position on the “big scary idea”. He appears to accept that
“(2) if human-level AI is created, there is a good chance vastly superhuman AI will follow via an “intelligence explosion,” and that (3) an uncontrolled intelligence explosion could destroy everything we value, but a controlled intelligence explosion would benefit humanity enormously if we can achieve it.”
and even goes as far as to say that (3) is “almost obvious”.
Does he believe that he understands the issues well enough that he can be almost certain that his particular model for AI will trigger the “good” kind of intelligence explosion?
Or does he accept that there’s a significant probability this project might “destroy everything we value” but not understand why anyone might be alarmed at this?
Or does he think that someone is going to make a human-level AI anyway and that his one has the best chance of creating a good intelligence explosion instead of a bad one?
Or something else which doesn’t constitute a big scary idea?
(btw I’m not entirely sold on this particular way of framing the argument, just trying to understand what Goertzel is actually saying)
Goertzel refers to Probabilistic Logic Networks a few times. If people are curious to know what sort of a framework that’s like, I was reading the book three years back and made notes. I didn’t actually finish the book, but the notes of the chapters that I did read are available here.
I think the tag you mean is “singularity”, not “aingularity”. :)
I’m very happy to see this discussion. It’s nice to see these positions placed next to each other, for clarity.
I, too, would like to see written arguments for the probable-unfriendliness of a human-written AGI with intended, but unproven friendliness. Truly it is said that given enough eyeballs, all bugs are shallow; and such bug-fixes of a written analysis defend against the conjunction fallacy.
Tag fixed, thanks.
If Ben is right on this point, wouldn’t this lead to the conclusion that human enhancement would be a better approach to Friendly superintelligence than AI programming? We don’t have much clue how to go about raising a computer program in a caring way and integrating it into society; but we do manage to do this with at least some highly intelligent human children.
No discussion of open source? Ben favours open source, SIAI want to “keep it secret”...
I find it a little strange that people never talk about this.
Ignore, for a moment, your personal assessment of Goertzel’s chance of creating AGI. What would you do, or what would you want done, if you suspected an open source project was capable of succeeding? Even if the developers acknowledged the notion of FAI, there’s nothing stopping any random person on the internet from cloning their repository and doing whatever they like with the code.
Open source is good when risks are low. Cooperation brings different levels of expertise to creating a new program, but solving a hard problem requires convergent goals and coordination.
I personally favour open source. My reasons are in this essay.
AKA: “I can turn myself into a Superman—but I can’t get a date”
Could people who have been to a substantial number of LW meetups (or similar events, such as rationality camps) comment on Ben Goertzel’s characterisation of “the prototypical Less Wrong meetup participant”? Is it accurate?
Yes it is.
I notice the same in this dialogue that I notice when Eliezer Yudkowsky talks to other people like Robin Hanson or Massimo Pigliucci. Or when people reply to me on Less Wrong. There seems to be a fundamental lack of understanding of what the other side is talking about.
An accusation that is just false.
He never said anything like that.
He doesn’t even disagree with that.
Yeah, this is a serious problem and it made me cringe a lot while reading the dialogue. I’m going to email Luke to ask if he’d like my help in understanding what Goertzel is saying. I wonder if dialogues should always have a third party acting as a translator whenever two people with different perspectives meet.
The problem is finding third parties capable of acting as a translator is hard.
True in general. Luke and Steve Rayhawk live in the same house though, so really there’s no excuse in this particular scenario. And I’m not as good as Steve but I’m still a passable translator in this case, and I live only a block away. Michael Vassar is a good translator too and lives only a few blocks away but he’s probably too busy. I’m not sure how much importance I should assign to influencing people like Goertzel, but it seems important that the Executive Director of SingInst have decent models of why people are disagreeing with him.
I see many dialogues that I want to jump into the middle of and translate. The brevity norms on the Internet exacerbate this problem (Twitter’s reply button is an antifeature), although Luke and Ben seemed to fall into it just fine without brevity.
I wonder how hard it would really be to request a translator for planned dialogues. Seems like the awkward connotations of the request are a much bigger obstacle than finding someone capable.
In my view, Ben reserves the right to not make sense. This might have advantages. He doesn’t have to fool himself as much as someone who believes they’re approximating True Rationality. Maybe it makes him more creative. Maybe it helps distinguish himself socially (to have more fun with more people).
Ben might have an invisible dragon in his garage (Psi). There’s no reason to rule out any possibility, especially if you believe universe=simulation is a real possibility, but he seems to be hinting at belief in something specific. But this doesn’t mean whatever he says in other areas is useless.
Ben isn’t even sure that the type of unfriendly AI that has some scary fixed goal is possible. But general-drive-satisfying AI like his research is possible. He sticks by his belief that paperclip-tiling-type goals really are “stupid” in some sense that should relieve us from worrying about the possibility of powerful intelligent agents having such goals. He misses the mark by not thinking about the fact that the improbability of one particular scary-arbitrary goal in no way means that the category (goals we wouldn’t be pleased with in our accidentally released unbeatable AI) has no measure.
This pretty much captures what a lot of people told me in private. Even those who have read the Sequences and met Eliezer Yudkowsky.
It’s kind-of trivial—you can make goal-directed systems pursue any goal you like.
The correct reply is not to argue for impossibility—but implausibility: our machine descendants will probably continue to maximise entropy—like the current biosphere does—and not some negentropic state—like gold atoms or something.
Yes, that leaves things open for a “but there’s still a chance: right?” reply. That’s OK—after all, there is a small chance.
Agree and agree.
Yeah, Pascal’s mugging. That’s what it all comes down to in the end.
There is no formal mapping from the mugging to AGI existential risk.
Not sure what you mean. It seems clear that both mugging and x-risk have a Pascalian flavor. (’Course, I personally think that the original wager wasn’t fallacious...)
The mugging has a referential aspect to it, referring either to your elicited prior or your universal prior. Whatever probability you give, or whatever the universal prior assigns, the mugger solves for x in the obvious inequality (x times probability > 1) and claims the mugging yields x utility.
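The mugger’s move can be made concrete with a toy calculation. The probability, stake, and padding factor below are all made-up illustrative numbers, and the function name is my own invention:

```python
# Toy version of the mugging's arithmetic: whatever prior probability p
# the victim assigns to the mugger's claim, the mugger names a payoff x
# large enough that p * x strictly exceeds the stake demanded, so a
# naive expected-value calculation says to pay up.
def mugger_payoff(p, stake=1.0):
    """Payoff x with p * x strictly greater than stake (padded by 1%)."""
    return (stake / p) * 1.01

p = 1e-9                 # victim's tiny prior that the mugger is honest
x = mugger_payoff(p)     # mugger claims a payoff of about a billion utils
assert p * x > 1.0       # the naive expected value now favors paying
```

The point of the toy is that x is solved for *as a function of* the victim’s own prior, which is the referential aspect described above.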
(Not that it matters, but my own belief is that the solution is to have a prior that takes into account the magnitude, avoiding the complexity issue by bounding the loss and identifying this loss with the prefix you need in the Kolmogorov setting.)
In contrast, there is no such referentiality in assessing AGI existential risk that I’ve heard of. FAIs don’t offer to bootstrap themselves into existence with greater probability or… something like that. I’m really not sure how one would construct an x-risk argument isomorphic to the mugging.
(Possibly XiXi experienced a brainfart and meant the more common accusation that the general argument for existential risk prevention is Pascal’s wager with finite stakes and more plausible premises.)
Ah, I was assuming that everyone was assuming that XiXiDu had had a brainfart.
Big engineering projects with lots of lives at stake need to have big safety margins and research into safety issues. I think that’s just a fact. I’m not necessarily disagreeing with some Pascal’s muggings going on—but safety is an issue.
As I said, Eliezer and others are already very good at giving themselves a bad name.
I realize that this topic is important to you, but posting tons of comments at once is bad; one big comment is better. Having posted so many comments quickly creates the impression of filibustering.
Almost exactly what I argued for here.
Yep...could be me who wrote this.
Well, yeah. If you say anything critical the gang comes and calls you a troll and what not. But never do they actually argue for their case.
What you said here self-described you as a troll, especially the following sentences:
“I don’t really care too much about most issues but write as if I do, to evoke feedback. I just do it for fun.”
“I sometimes take different positions just to explore an argument, because it is fun to discuss and because I am curious what reactions I might provoke.”
Writing just to provoke responses for the lolz, is pretty much what Internet labels “trolling”. What is there left to argue? You’ve admitted to being what we summarize by the word “troll”, at least some of the time.
I for one, can no longer trust that anything you say is sincere or that you even care about it one way or another. I start with assumptions of honesty from other participants, but when such trust is violated, you can’t easily regain it.
You are making no sense. The only reason you know that I value discussions apart from the subject of the discussion is that I told you: I have been sincere.
If your definition of a troll is someone who likes to argue with people then I am proud to be a troll.
If you are not emotionally attached to a position, or act as if you aren’t, then it is hard for other people to take you seriously.
If a position is right, I desire to believe that position is right. If a position is wrong, I desire to believe that position is wrong.
Way I see it, that’s all the emotional attachment I need—and indeed, since adopting this stance I have found myself being wrong less often and switching from being wrong to being right more quickly than I did prior to adopting this stance. Indeed, I find that emotional attachment to factual positions is almost the epitome of irrationality.
In case you haven’t read it, the post that inspired this comment can be found here.
The issue isn’t your “values”; the issue is that you admitted that you write as if you care about issues you don’t care about, and that you take positions that you don’t actually hold just to provoke responses.
If you’re sincere in this, then you’ve admitted to insincerity and dishonesty in other times. If you’re not sincere in this, I can’t trust you anyway.
I don’t care what you like or dislike, this is about your actions not about your likings.
I’ve loved LessWrong because it’s one of the few forums I’ve seen where it’s okay to NOT be attached to positions, where it’s actually okay to try to reason properly.
You’ve acted in such a manner that it makes this community a little bit more like every sewage pile in the web. A troll who doesn’t care about what they say, they just like to argue and evoke responses.
How’s this for an emotional position, since you value emotional responses so much: I despise you and your entire kind. I hope you go away. I hope your kind becomes EXTINCT from the face of the earth. Dishonesty is among the top sins for me.
I’m sure there’s lots of valid criticism of the community, but I’ll hear it from those people who’re committed to honesty. The dishonest and insincere can fuck off.
Wow, this accurately and completely captures my own opinion.
I loled at this point.
Hah...I am always getting ridiculed for that.
Hehe...a lot of Less Wrong people are completely incapable of dealing with such behavior. Very often I get accused of, “You wrote that...” and if I say that it was obviously in fun or that it isn’t my actual position I get accused of being dishonest or endless other things.
If you want to talk to Less Wrong you have to be a fucking robot...a psychopathic robot. Colloquial language and an informal style are a red rag to them.
XiXiDu, hear me: LessWrong has contemptibly bad epistemic habits. They’re kind of retarded and they don’t realize it. Continued disagreement with them is perfectly okay. If they can’t satisfactorily back up their claims that probably means they’re way overconfident in them. You shouldn’t feel super stressed all the time just because a group of self-aggrandizing nutjobs on the internet disagrees with you about semi-metaphysical Far mode bullshit.