Muehlhauser-Wang Dialogue
Part of the Muehlhauser interview series on AGI.
Luke Muehlhauser is Executive Director of the Singularity Institute, a non-profit research institute studying AGI safety.
Pei Wang is an AGI researcher at Temple University, and Chief Executive Editor of Journal of Artificial General Intelligence.
Luke Muehlhauser:
[Apr. 7, 2012]
Pei, I’m glad you agreed to discuss artificial general intelligence (AGI) with me. I hope our dialogue will be informative to many readers, and to us!
On what do we agree? Ben Goertzel and I agreed on the statements below (well, I cleaned up the wording a bit for our conversation):
Involuntary death is bad, and can be avoided with the right technology.
Humans can be enhanced by merging with technology.
Humans are on a risky course in general, because powerful technologies can destroy us, humans are often stupid, and we are unlikely to voluntarily halt technological progress.
AGI is likely this century.
AGI will greatly transform the world. It is a potential existential risk, but could also be the best thing that ever happens to us if we do it right.
Careful effort will be required to ensure that AGI results in good things rather than bad things for humanity.
You stated in private communication that you agree with these statements, depending on what is meant by “AGI.” So, I’ll ask: What do you mean by “AGI”?
I’d also be curious to learn what you think about AGI safety. If you agree that AGI is an existential risk that will arrive this century, and if you value humanity, one might expect you to think it’s very important that we accelerate AI safety research and decelerate AI capabilities research so that we develop safe superhuman AGI first, rather than arbitrary superhuman AGI. (This is what Anna Salamon and I recommend in Intelligence Explosion: Evidence and Import.) What are your thoughts on the matter?
Pei Wang:
[Apr. 8, 2012]
By “AGI” I mean computer systems that follow roughly the same principles as the human mind. Concretely, to me “intelligence” is the ability to adapt to the environment under insufficient knowledge and resources, or to follow the “Laws of Thought” that realize a relative rationality that allows the system to apply its available knowledge and resources as much as possible. See [1, 2] for detailed descriptions and comparisons to other definitions of intelligence.
Such a computer system will share many properties with the human mind; however, it will not have exactly the same behaviors or problem-solving capabilities as a typical human being, since, as an adaptive system, the behaviors and capabilities of an AGI depend not only on its built-in principles and mechanisms, but also on its body, initial motivation, and individual experience, which are not necessarily human-like.
Like all major breakthroughs in science and technology, the creation of AGI will be both a challenge and an opportunity for humankind. Like scientists and engineers in all fields, we AGI researchers should use our best judgment to ensure that AGI results in good things rather than bad things for humanity.
Even so, the suggestion to “accelerate AI safety research and decelerate AI capabilities research so that we develop safe superhuman AGI first, rather than arbitrary superhuman AGI” is wrong, for the following major reasons:
It is based on a highly speculative understanding of what kind of “AGI” will be created. The definition of intelligence in Intelligence Explosion: Evidence and Import is not shared by most AGI researchers. In my opinion, that kind of “AGI” will never be built.
Even if the above definition is considered only as one possibility among other versions of AGI, it will be actual AI research that tells us which possibility becomes reality. To ban scientific research on the basis of imaginary risks damages humanity no less than risky research does.
If intelligence turns out to be adaptive (as I and many others believe), then a “friendly AI” will be mainly the result of proper education, not proper design. There will be no way to design a “safe AI”, just as there is no way to require parents to give birth only to “safe babies” who will never become criminals.
The “friendly AI” approach advocated by Eliezer Yudkowsky has several serious conceptual and theoretical problems, and is not accepted by most AGI researchers. The AGI community has ignored it, not because it is indisputable, but because people have not bothered to criticize it.
In summary, though the safety of AGI is indeed an important issue, we currently don’t know enough about the subject to reach any sure conclusion. Higher safety can only be achieved by more research on all related topics, rather than by pursuing approaches that have no solid scientific foundation. I hope your Institute will make constructive contributions to the field by studying a wider range of AGI projects, rather than generalizing from a few, or committing to a conclusion without considering counterarguments.
[1] Pei Wang, What Do You Mean by “AI”? Proceedings of AGI-08, Pages 362-373, 2008
[2] Pei Wang, The Assumptions on Knowledge and Resources in Models of Rationality, International Journal of Machine Consciousness, Vol.3, No.1, Pages 193-218, 2011
Luke:
[Apr. 8, 2012]
I appreciate the clarity of your writing, Pei. “The Assumptions on Knowledge and Resources in Models of Rationality” belongs to a set of papers that make up half of my argument for why the only people allowed to do philosophy should be those with primary training in cognitive science, computer science, or mathematics. (The other half of that argument is made by examining most of the philosophy papers written by those without primary training in cognitive science, computer science, or mathematics.)
You write that my recommendation to “accelerate AI safety research and decelerate AI capabilities research so that we develop safe superhuman AGI first, rather than arbitrary superhuman AGI” is wrong for four reasons, which I will respond to in turn:
“It is based on a highly speculative understanding about what kind of ‘AGI’ will be created.” Actually, it seems to me that my notion of AGI is broader than yours. I think we can use your preferred definition and get the same result. (More on this below.)
“…it will be the actual AI research that will tell us which possibility will become reality. To ban a scientific research according to imaginary risks damages humanity no less than risky research.” Yes, of course. But we argue (very briefly) that a very broad range of artificial agents with a roughly human-level capacity for adaptation (under AIKR) will manifest convergent instrumental goals. The fuller argument for this is made in Nick Bostrom’s “The Superintelligent Will.”
“…a ‘friendly AI’ will be mainly the result of proper education, not proper design. There will be no way to design a ‘safe AI’, just like there is no way to require parents to only give birth to ‘safe baby’ who will never become a criminal.” Without being more specific, I can’t tell if we actually disagree on this point. The most promising approach (that I know of) for Friendly AI is one that learns human values and then “extrapolates” them so that the AI optimizes for what we would value if we knew more, were more the people we wish we were, etc. instead of optimizing for our present, relatively ignorant values. (See “The Singularity and Machine Ethics.”)
“The ‘friendly AI’ approach advocated by Eliezer Yudkowsky has several serious conceptual and theoretical problems.”
I agree. Friendly AI may be incoherent and impossible. In fact, it looks impossible right now. But that’s often how problems look right before we make a few key insights that make things clearer, and show us (e.g.) how we were asking a wrong question in the first place. The reason I advocate Friendly AI research (among other things) is that it may be the only way to secure a desirable future for humanity (see “Complex Value Systems are Required to Realize Valuable Futures”), even if it looks impossible. That is why Yudkowsky once proclaimed: “Shut Up and Do the Impossible!” When we don’t know how to make progress on a difficult problem, sometimes we need to hack away at the edges.
I certainly agree that “currently we don’t know enough about [AGI safety] to make any sure conclusion.” That is why more research is needed.
As for your suggestion that “Higher safety can only be achieved by more research on all related topics,” I wonder if you think that is true of all subjects, or only in AGI. For example, should mankind vigorously pursue research on how to make Ron Fouchier’s alteration of the H5N1 bird flu virus even more dangerous and deadly to humans, because “higher safety can only be achieved by more research on all related topics”? (I’m not trying to broadly compare AGI capabilities research to supervirus research; I’m just trying to understand the nature of your rejection of my recommendation for mankind to decelerate AGI capabilities research and accelerate AGI safety research.)
Hopefully I have clarified my own positions and my reasons for them. I look forward to your reply!
Pei:
[Apr. 10, 2012]
Luke: I’m glad to see the agreements, and will only comment on the disagreements.
“my notion of AGI is broader than yours” In scientific theories, broader notions are not always better. In this context, a broad notion may cover too many diverse approaches to provide any non-trivial conclusion. For example, AIXI and NARS are fundamentally different in many respects, and NARS does not approximate AIXI. It is OK to call both “AGI” with respect to their similar ambitions, but theoretical or technical descriptions based on such a broad notion are hard to make. For this reason, almost all of your descriptions of AIXI are hardly relevant to NARS, or to most existing “AGI” projects.
“I think we can use your preferred definition and get the same result.” No, you cannot. According to my definition, AIXI is not intelligent, since it doesn’t obey AIKR. Since most of your conclusions are about that type of system, they will go with it.
“a very broad range of artificial agents with a roughly human-level capacity for adaptation (under AIKR) will manifest convergent instrumental goals” I cannot access Bostrom’s paper, but I guess that he made additional assumptions. In general, the goal structure of an adaptive system changes according to the system’s experience, so unless you restrict the experience of these artificial agents, there is no way to restrict their goals. I agree that to make AGI safe, to control their experience will probably be the main approach (which is what “education” is all about), but even that cannot guarantee safety. (see below)
“The Singularity and Machine Ethics.” I don’t have the time to do a detailed review, but can frankly tell you why I disagree with the main suggestion “to program the AI’s goal system to want what we want before the AI self-improves beyond our capacity to control it”.
As I mentioned above, the goal system of an adaptive system evolves as a function of the system’s experience. No matter what initial goals are implanted, under AIKR the derived goals are not necessarily their logical implications. This is not necessarily a bad thing (humanity is not a logical implication of human biological nature, either), though it means the designer does not have full control over it (unless the designer also fully controls the experience of the system, which is practically impossible). See “The self-organization of goals” for a detailed discussion.
Even if the system’s goal system can be made to fully agree with certain given specifications, I wonder where these specifications come from. We human beings are not known for reaching consensus on much of anything, not to mention on a topic this big.
Even if we could agree on the goals of AIs, and find a way to enforce them in AIs, that still doesn’t mean we have “friendly AI”. Under AIKR, a system can cause damage simply because of its ignorance in a novel situation.
For these reasons, under AIKR we cannot have AI with guaranteed safety or friendliness, though we can and should always do our best to make them safer, based on our best judgment (which can still be wrong, due to AIKR). Applying logic or probability theory to the design won’t change the big picture, because what we are after are empirical conclusions, not theorems within those theories. Only the latter can have proved correctness; the former cannot (though they can have strong evidential support).
“I’m just trying to understand the nature of your rejection of my recommendation for mankind to decelerate AGI capabilities research and accelerate AGI safety research”
Frankly, I don’t think anyone currently has the evidence or argument to ask others to decelerate their research for safety considerations, though it is perfectly fine to promote your own research direction and try to attract more people into it. However, unless you have the right idea about what AGI is and how it can be built, it is very unlikely that you will know how to make it safe.
Luke:
[Apr. 10, 2012]
I didn’t mean to imply that my notion of AGI was “better” because it is broader. I was merely responding to your claim that my argument for differential technological development (in this case, decelerating AI capabilities research while accelerating AI safety research) depends on a narrow notion of AGI that you believe “will never be built.” But this isn’t true, because my notion of AGI is very broad and includes your notion of AGI as a special case. My notion of AGI includes both AIXI-like “intelligent” systems and also “intelligent” systems which obey AIKR, because both kinds of systems (if implemented/approximated successfully) could efficiently use resources to achieve goals, and that is the definition Anna and I stipulated for “intelligence.”
Let me back up. In our paper, Anna and I stipulate that for the purposes of our paper we use “intelligence” to mean an agent’s capacity to efficiently use resources (such as money or computing power) to optimize the world according to its preferences. You could call this “instrumental rationality” or “ability to achieve one’s goals” or something else if you prefer; I don’t wish to encourage a “merely verbal” dispute between us. We also specify that by “AI” (in our discussion, “AGI”) we mean “systems which match or exceed the intelligence [as we just defined it] of humans in virtually all domains of interest.” That is: by “AGI” we mean “systems which match or exceed the human capacity for efficiently using resources to achieve goals in virtually all domains of interest.” So I’m not sure I understood you correctly: Did you really mean to say that that kind of AGI “will never be built”? If so, why do you think that? Is the human capacity very close to a natural ceiling on an agent’s ability to achieve goals?
What we argue in “Intelligence Explosion: Evidence and Import,” then, is that a very broad range of AGIs pose a threat to humanity, and therefore we should be sure we have the safety part figured out as much as we can before we figure out how to build AGIs. But this is the opposite of what is happening now. Right now, almost all AGI-directed R&D resources are being devoted to AGI capabilities research rather than AGI safety research. This is the case even though there is AGI safety research that will plausibly be useful given almost any final AGI architecture, for example the problem of extracting coherent preferences from humans (so that we can figure out which rules / constraints / goals we might want to use to bound an AGI’s behavior).
I do hope you have the chance to read “The Superintelligent Will.” It is linked near the top of nickbostrom.com and I will send it to you via email.
But perhaps I have been driving the direction of our conversation too much. Don’t hesitate to steer it towards topics you would prefer to address!
Pei:
[Apr. 12, 2012]
Hi Luke,
I don’t expect to resolve all the related issues in such a dialogue. In the following, I’ll return to what I think are the major issues and summarize my position.
Whether we can build a “safe AGI” by giving it a carefully designed “goal system”: My answer is negative. It is my belief that an AGI will necessarily be adaptive, which implies that the goals it actively pursues constantly change as a function of its experience, and are not fully restricted by its initial (given) goals. As described in my eBook (cited previously), goal derivation is based on the system’s beliefs, which may lead to conflicts in goals. Furthermore, even if the goals are fixed, they cannot fully determine the consequences of the system’s behaviors, which also depend on the system’s available knowledge and resources, etc. If all those factors are also fixed, then we may get guaranteed safety, but the system won’t be intelligent; it will be just like today’s ordinary (unintelligent) computer.
Whether we should figure out how to build “safe AGI” before figuring out how to build “AGI”: My answer is negative, too. As in all adaptive systems, the behaviors of an intelligent system are determined both by its nature (design) and its nurture (experience). The system’s intelligence mainly comes from its design, and is “morally neutral”, in the sense that (1) any goals can be implanted initially, and (2) very different goals can be derived from the same initial design and goals, given different experience. Therefore, to control the morality of an AI mainly means to educate it properly (i.e., to control its experience, especially in its early years). Of course, the initial goals matter, but it is wrong to assume that the initial goals will always be the dominant goals in decision-making processes. To develop a non-trivial education theory of AGI requires a good understanding of how the system works, so if we don’t know how to build an AGI, there is no chance for us to know how to make it safe. I don’t think a good education theory can be “proved” in advance, purely theoretically. Rather, we’ll learn most of it by interacting with baby AGIs, in the same way that many of us learn how to educate children.
Such a short position statement may not convince you, but I hope you can consider it at least as a possibility. I guess the final consensus can only come from further research.
Luke:
[Apr. 19, 2012]
Pei,
I agree that an AGI will be adaptive in the sense that its instrumental goals will adapt as a function of its experience. But I do think advanced AGIs will have convergently instrumental reasons to preserve their final (or “terminal”) goals. As Bostrom explains in “The Superintelligent Will”:
An agent is more likely to act in the future to maximize the realization of its present final goals if it still has those goals in the future. This gives the agent a present instrumental reason to prevent alterations of its final goals.
I also agree that even if an AGI’s final goals are fixed, the AGI’s behavior will also depend on its knowledge and resources, and therefore we can’t exactly predict its behavior. But if a system has lots of knowledge and resources, and we know its final goals, then we can predict with some confidence that whatever it does next, it will be something aimed at achieving those final goals. And the more knowledge and resources it has, the more confident we can be that its actions will successfully aim at achieving its final goals. So if a superintelligent machine’s only final goal is to play through Super Mario Bros within 30 minutes, we can be pretty confident it will do so. The problem is that we don’t know how to tell a superintelligent machine to do things we want, so we’re going to get many unintended consequences for humanity (as argued in “The Singularity and Machine Ethics”).
You also said that you can’t see what safety work there is to be done without having intelligent systems (e.g. “baby AGIs”) to work with. I provided a list of open problems in AI safety here, and most of them don’t require that we know how to build an AGI first. For example, one reason we can’t tell an AGI to do what humans want is that we don’t know what humans want, and there is work to be done in philosophy and in preference acquisition in AI in order to get clearer about what humans want.
Pei:
[Apr. 20, 2012]
Luke,
I think we have made our different beliefs clear, so this dialogue has achieved its goal. It won’t be an efficient use of our time to attempt to convince each other at this moment, and each side can analyze these beliefs in proper forms of publication at a future time.
Now we can let the readers consider these arguments and conclusions.
Just a suggestion for future dialogues: the amount of Less Wrong jargon, the links to Less Wrong posts explaining that jargon, and the Yudkowsky “proclamation” in this paragraph are all a bit squicky, alienating, and potentially condescending. And I think they muddle the point you’re making.
Anyway, biting Pei’s bullet for a moment: if building an AI isn’t safe, if it’s, as Pei thinks, similar to educating a child (except, presumably, with a few orders of magnitude more uncertainty about the outcome), that sounds like a really bad thing to be trying to do. He writes:
There’s a very good chance he’s right. But we’re terrible at educating children. Children routinely grow up to be awful people. And this one lacks the predictable, well-defined drives and physical limits that let us predict how most humans will eventually act (pro-social, in fear of authority). It sounds deeply irresponsible, albeit, not of immediate concern. Pei’s argument is a grand rebuttal of the proposal that humanity spend more time on AI safety (why fund something that isn’t possible?) but no argument at all against the second part of the proposal—defund AI capabilities research.
Seconded; that bit—especially the “Yudkowsky proclaimed”—stuck out for me.
I wish I could upvote this multiple times.
On average, they grow up to be average people. They generally don’t grow up to be Genghis Khan or a James Bond villain, which is what the UFAI scenario predicts. FAI only needs to produce AIs that are as good as the average person, however bad that may be.
How dangerous would an arbitrarily selected average person be to the rest of us if given significantly superhuman power?
The topic is intelligence. Some people have superhuman (well, more than 99.99% of humans) intelligence, and we are generally not afraid of them. We expect them to have ascended to the higher reaches of the Kohlberg hierarchy. There doesn’t seem to be a problem of Unfriendly Natural Intelligence. We don’t kill off smart people on the basis that they might be a threat. We don’t refuse people education on the grounds that we don’t know what they will do with all that dangerous knowledge. (There may have been societies that worked that way, but they don’t seem to be around any more).
Agreed with all of this.
Yes. Well said. The deeper issue though is the underlying causes of said squicky, alienating paragraphs. Surface recognition of potentially condescending paragraphs is probably insufficient.
It’s unclear that Pei would agree with your presumption that educating an AGI will entail “a few orders of magnitude more uncertainty about the outcome”. We can control every aspect of an AGI’s development and education to a degree unimaginable in raising human children. Examples: we can directly monitor their thoughts. We can branch successful designs. And perhaps most importantly, we can raise them in a highly controlled virtual environment. All of this suggests we can vastly decrease the variance in outcome compared to our current haphazard approach of creating human minds.
Compared to what? Compared to an ideal education? Your point thus illustrates the room for improvement in educating AGI.
Routinely? Nevertheless, this only shows the scope and potential for improvement. To simplify: if we can make AGI more intelligent, we can also make it less awful.
An unfounded assumption. To the extent that humans have these “predictable, well-defined drives and physical limits”, we can also endow AGIs with these qualities.
Which doesn’t really require much of an argument against. Who is going to defund AI capabilities research such that this would actually prevent global progress?
As someone who’s been on LW since before it was LW, that paragraph struck me as wonderfully clear. But posts should probably be written for newbies, with as little jargon as possible.
OTOH, writing for newbies should mean linking to explanations of jargon when the jargon is unavoidable.
I agree that those things are bad, but don’t actually see any “Less Wrong jargon” in that paragraph, with the possible exception of “Friendly AI”. “Wrong question” and “hack away at the edges” are not LW-specific notions.
Those phrases would be fine with me if they weren’t hyperlinked to Less Wrong posts. They’re not LW-specific notions, so there shouldn’t be a reason to link an Artificial Intelligence professor to blog posts discussing them. Anyway, I’m just expressing my reaction to the paragraph. You can take it or leave it.
Right: the problem is the gratuitous hyperlinking, about which I feel much the same way as I think you do—it’s not a matter of jargon.
(I’m not sure what the purpose of your last two sentences is. Did you have the impression I was trying to dismiss everything you said, or put you down, or something? I wasn’t.)
Tell me about it. Hyperlinks are totally wrong for academic communication. You’re supposed to put (Jones 2004) every sentence or two instead!
Apologies if you’re merely joking, but: Obviously Jack’s (and my) problem with the hyperlinks here is not that academic-paper-style citations would be better but that attaching those references to terms like “wrong question” and “hack away at the edges” (by whatever means) gives a bad impression.
The point is that the ideas conveyed by “wrong question” and “hack away at the edges” in that paragraph are not particularly abstruse or original or surprising; that someone as smart as Pei Wang can reasonably be expected to be familiar with them already; and that the particular versions of those ideas found at the far ends of those hyperlinks are likewise not terribly special. Accordingly, linking to them suggests (1) that Luke thinks Pei Wang is (relative to what one might expect of a competent academic in his field) rather dim, and, less strongly, (2) that Luke thinks that the right way to treat this dimness is for him to drink of the LW kool-aid.
But this dialog wasn’t just written for Pei Wang, it was written for public consumption. Some of the audience will not know these things.
And even smart academics don’t know every piece of jargon in existence. We tend to overestimate how much of what we know is stuff other people (or other people we see as our equals) know. This is related to Eliezer’s post “Explainers Shoot High, Aim Low!”
Not merely. I wouldn’t have included those particular links when writing an email—I would when writing a blog post. But I do make the point that the problem here is one of fashion, not one intrinsic to what is being communicated. Most references included in most papers aren’t especially useful or novel—the reasons for including them aren’t about information at all.
You are wrong when it comes to “Wrong Question”. The phrase in common usage is a lot more general than it is when used here. It doesn’t matter how unimpressive you consider the linked post; it remains the case that when used as Less Wrong jargon the meaning conveyed by the phrase is a lot more specific than the simple combination of the two words.
In the specific case of “Wrong Question”, take fault with the jargon usage, not the use of a link to explain the jargon. Saying only “wrong question” in the sentence would represent a different message and so would be a failure of communication.
No, you make the point that a different problem from the one Jack and I were commenting on is one of fashion. (The silliness of this when taken as a serious response is why I thought you might merely be making a joke and not also trying to make a serious point.)
I’m willing to be convinced, but the mere fact that you say this doesn’t convince me. (I think there are two separate common uses, actually. If you say someone is asking the wrong question, you mean that there’s a right question they should be asking and the one they’ve asked is a distraction from it. If you say they’re asking a wrong question, you mean the question itself is wrongheaded—typically because of a false assumption—and no answer to it is going to be informative rather than confusing.)
What do you think Pei Wang would have taken “a wrong question” to mean without the hyperlink, and how does it differ from what you think it actually means, and would the difference really have impaired the discussion?
I’m going to guess at your answer (in the hope of streamlining the discussion): the difference is that Eliezer’s article about wrong questions talks specifically about questions that can be “dissolved by understanding the cognitive algorithm that generates the perception of a question”, as opposed to ones where all there is to understand is that there’s an untrue presupposition. Except that in the very first example Eliezer gives of a “wrong question”—the purely definitional if-a-tree-falls sort of question—what you need to understand that the question is wrongheaded isn’t a cognitive algorithm, it’s just the fact that sometimes language is ambiguous and what looks like a question of fact is merely a question of definition. Which philosophers (and others) have been pointing out for decades—possibly centuries.
But let’s stipulate for the sake of argument that I’ve misunderstood, and Eliezer really did intend “wrong question” to apply only to questions for which the right response is to understand the cognitive algorithms that make it feel as if there’s a question, and that Luke had that specific meaning in mind. Then would removing the hyperlink have made an appreciable difference to how Pei Wang would have understood Luke’s words? Nope, because he was only giving an example, and the more general meaning of the word is an example—indeed, substantially the same example—of the same thing.
Anyway, enough! -- at least for me. (Feel free to have the last word.)
[EDITED to add: If whoever downvoted this would care to say why, I’ll be grateful. Did I say something stupid? Was I needlessly rude? Do you just want this whole discussion to stop?]
They were added in an edit after a few downvotes. Not directed at you; I should have just left them out.
The “Wrong Question” phrase is in general use, but in Less Wrong usage it has a somewhat more specific and stronger meaning than it often does elsewhere.
You propose unilateral defunding? Surely that isn’t likely to help. If you really think it is going to help—then how?
I’m not strongly proposing it, just pointing out the implications of the argument.
If you clear away all the noise arising from the fact that this interaction constitutes a clash of tribal factions (here comes Young Upstart Outsider trying to argue that Established Academic Researcher is really a Mad Scientist), you can actually find at least one substantial (implicit) claim by Wang that is worth serious consideration from SI’s point of view. And that is that building FAI may require (some) empirical testing prior to “launch”. It may not be enough to simply try to figure everything out on paper beforehand, and then wincingly press the red button with the usual “here goes nothing!” It may instead be necessary to build toy models (that can hopefully be controlled, obviously) and see how they work, to gain information about the behavior of (aspects of) the code.
Similarly, in the Goertzel dialogue, I would have argued (and meant to argue) that Goertzel’s “real point” was that EY/SI overestimate the badness of (mere) 95% success; that the target, while somewhat narrow, isn’t as narrow as the SI folks claim. This is also worth serious consideration, since one can imagine a situation where Goertzel (say) is a month away from launching his 80%-Friendly AI, while EY believes that his ready-to-go 95%-Friendly design can be improved to 97% within one month and 100% within two...what should EY do, and what will he do based on his current beliefs?
Testing is common practice. Surely no competent programmer would ever advocate deploying a complex program without testing it.
He is not talking about just testing, he is talking about changing the requirements and the design as the final product takes shape. Agile development would be a more suitable comparison.
With a recursively self-improving AI, once you create something able to run, running a test can turn into deployment even without the programmer’s intention.
Even if we manage to split the AI into modules, and test each module independently, we should understand the process enough to make sure that the individual modules can’t recursively self-improve. And we should be pretty sure about the implication “if the individual modules work as we expect, then also the whole will work as we expect”. Otherwise we could get a result “individual modules work OK, the whole is NOT OK and it used its skills to escape the testing environment”.
So: damage to the rest of the world is what test harnesses are there to prevent. It makes sense that—if we can engineer advanced intelligences—we’ll also be able to engineer methods of restraining them.
Depends on how we will engineer them. If we build an algorithm, knowing what it does, then perhaps yes. If we try some black-box development such as “make this huge neural network, initialize it with random data, teach it, make a few randomly modified copies and select the ones that learn fastest, etc.”, then I wouldn’t be surprised if, after the first thousand failed approaches, the first one able to really learn and self-improve did something unexpected. The second approach seems more probable, because it’s simpler to try.
Also, after the thousand failed experiments I predict human error in safety procedures, simply because they will feel completely unnecessary. For example, a member of the team will turn off the firewalls and connect to Facebook (for greater irony, it could be LessWrong), providing the new AI with a simple escape route.
We do have some escaped criminals today. It’s not that we don’t know how to confine them securely, it’s more that we are not prepared to pay to do it. They do some damage, but it’s tolerable. What the escaped criminals tend not to do is build huge successful empires—and challenge large corporations or governments.
This isn’t likely to change as the world automates. The exterior civilization is unlikely to face serious challenges from escaped criminals. Instead it is likely to start out—and remain—much stronger than they are.
We don’t have recursively self-improving, superhumanly intelligent criminals yet. Only in comic books. Once we have a recursively self-improving superhuman AI, and it is not human-friendly, and it escapes… then we will have a comic-book situation in real life. Except we won’t have a superhero on our side.
That’s comic-book stuff. Society is self-improving faster than its components. Component self-improvement trajectories tend to be limited by the government breaking them up or fencing them in whenever they grow too powerful.
The “superintelligent criminal” scenario is broadly like worrying about “grey goo”—or about a computer virus taking over the world. It makes much more sense to fear humans with powerful tools that magnify their wills. Indeed, the “superintelligent criminal” scenario may well be a destructive meme—since it distracts people from dealing with that much more realistic possibility.
Counterexample: any successful revolution. A subset of society became strong enough to overthrow the government, despite the government trying to stop them.
Could a superhuman AI use human allies and give them this kind of tools?
Sure, but look at the history of revolutions in large powerful democracies. Of course, if North Korea develops machine intelligence, a revolution becomes more likely.
That’s pretty-much what I meant: machine intelligence as a correctly-functioning tool—rather than as an out-of-control system.
Seems to me that you simply refuse to see an AI as an agent. If AI and a human conquer the world, the only possible interpretation is that the human used the AI, never that the AI used the human. Even if it was all the AI’s idea; it just means that the human used the AI as an idea generator. Even if the AI kills the human afterwards; it would just mean that the human has used the AI incorrectly and thus killed themselves.
Am I right about this?
Er, no—I consider machines to be agents.
“escape the testing environment” is poorly defined. Some people read it as “deduce the exploitable vulnerabilities in the system, hack into it, run itself with higher privileges, somehow transmit itself to other machines / the internet at large / infecting people’s brains snow-crash style”, and others read it as “convince the people running the test to give it more resources (and maybe infect their brains snow-crash style)”.
The former can be prevented by having a secure (air gapped?) system, the latter can be prevented by not running tests interactively and ignoring the moral issues with terminating (or suspending) what may possibly be an intelligent ‘person’.
It also implicitly assumes that its ability to improve its own intelligence (and therefore gain the ability to do either of the above) is unbounded by the resources of the system and will have no cost in terms of increased processing time.
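As a rough illustration of the second half of that point (non-interactive, resource-bounded test runs), here is a minimal Python sketch. It assumes a POSIX host; the run_candidate helper and the specific limits are hypothetical, and this covers only the crude resource side of containment, not the air gap or the persuasion channel discussed above.

```python
import resource
import subprocess

def run_candidate(cmd, cpu_seconds=60, memory_bytes=512 * 1024 * 1024):
    """Run an untrusted candidate program non-interactively, with hard CPU
    and memory limits, and collect its output for offline review.

    Illustrative only: network isolation still requires an air-gapped host
    or a VM/container layer on top of this."""
    def set_limits():
        # Hard CPU-time ceiling: the kernel stops the process when exceeded.
        resource.setrlimit(resource.RLIMIT_CPU, (cpu_seconds, cpu_seconds))
        # Cap the address space so the program cannot grab unbounded memory.
        resource.setrlimit(resource.RLIMIT_AS, (memory_bytes, memory_bytes))

    return subprocess.run(
        cmd,
        preexec_fn=set_limits,    # POSIX only
        capture_output=True,      # no interactive channel back to the operators
        timeout=cpu_seconds * 2,  # wall-clock backstop
    )
```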
I will note that the AI box experiment’s conditions expressly forbid a secure environment [i.e. one with inspection tools that cannot be manipulated by the AI]:
Because that’s not the part of the AI safety question that the AI box experiment is designed to test, so for the purpose of the experiment it says, “sure you might catch the AI in a lie, but assuming you don’t—”
Environment means both hardware and people—anything the AI has a chance to influence. We could use a narrower definition, but why should the AI respect it? By limiting our map we don’t limit the territory.
When the AI gets much smarter than humans, we may not understand the output of our inspection tools. They will give us huge amounts of data, and we will be unable to decipher what it all means.
Imagine a group of monkeys trying to enslave a human in a cave. The monkeys bring some objects from the jungle to the human and make him produce better food and toys for them (we want the AI to do some real-life optimization, otherwise it’s just money wasted on academic exercises). The monkeys understand that a human getting closer to the entrance is trying to escape, and will threaten to kill him if he tries. But they don’t see the danger of the human quietly sitting at the back of the cave, constructing a machine gun from the spare parts.
I’ve asked this elsewhere to no avail, but I’m still curious—does it follow from this that developing some reliable theoretical understanding about the properties of algorithms capable and incapable of self-improvement is a useful step towards safe AI research?
I mean, it’s clear that useful intelligence that is incapable of recursive self-improvement is possible… I’m an existence proof, for example.
If we can quantify the properties of such intelligences, and construct a tool that can inspect source code prior to executing it to ensure that it lacks those properties, then it seems to follow that we can safely construct human-level AIs of various sorts. (Supposing, of course, that we’re capable of building human-level AIs at all… an assumption that appears to be adopted by convention in this context.)
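To make the idea concrete, here is a deliberately naive sketch of such a pre-execution inspection tool, assuming the candidate optimizer is written in Python. The SUSPECT_CALLS list and the flags_self_modification helper are purely illustrative; identifying the real properties that enable self-improvement is exactly the open theoretical problem being proposed here.

```python
import ast

# Names whose presence we (naively) treat as evidence of self-modification.
# Purely illustrative; a real analysis would need far more than a name list.
SUSPECT_CALLS = {"exec", "eval", "compile", "__import__"}

def flags_self_modification(source: str) -> bool:
    """Return True if the source syntactically contains any suspect call.
    A toy stand-in for the pre-execution inspection tool discussed above;
    it is trivially incomplete."""
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            if node.func.id in SUSPECT_CALLS:
                return True
    return False

# Example: a program that re-executes its own (possibly rewritten) code is flagged.
print(flags_self_modification("exec(open(__file__).read())"))  # True
print(flags_self_modification("print('hello')"))               # False
```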
I wouldn’t be so sure about it. Imagine that you are given unlimited time, perfect health, and you can use as much data storage (paper, computers, etc) as you need. Do you think your self-improvement would stop at some point?
The problem with humans is that they have limited time, much of which is wasted on gathering resources to survive, or on climbing up the social ladder… and then they die, and the next generation starts almost from zero. At least we have culture, education, books and other tools which allow the next generation to use part of the achievements of previous generations—unfortunately, the learning also takes too much time. We are so limited by our hardware.
Imagine a child growing up. Imagine studying at elementary school, high school, university. Is this an improvement? Yes. Why does it stop? Because we run out of time and resources, our own health and abilities being also a limited resource. However as a species, humans are self-improving. We are just not fast enough to FOOM as individuals (yet).
Supposing all this is true, it nevertheless suggests a path for defining a safe route for research.
As you say, we are limited by our hardware, by our available resources, by the various rate-limiting steps in our self-improvement.
There’s nothing magical about these limits; they are subject to study and to analysis.
Sufficiently competent analysis could quantify those qualitative limits, could support a claim like “to achieve X level of self-improvement given Y resources would take a mind like mine Z years”. The same kind of analysis could justify similar claims about other sorts of optimizing systems other than my mind.
If I have written the source code for an optimizing system, and such an analysis of the source code concludes that for it to exceed TheOtherDave-2012′s capabilities on some particular reference platform would take no less than 35 minutes, then it seems to follow that I can safely execute that source code on that reference platform for half an hour.
Edit: Or, well, I suppose “safely” is relative; my own existence represents some risk, as I’m probably smart enough to (for example) kill a random AI researcher given the element of surprise should I choose to do so. But the problem of constraining the inimical behavior of human-level intelligences is one we have to solve whether we work on AGI or not.
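A minimal sketch of how such an analysis result might be enforced, assuming the optimizer exposes a single step function and assuming the analysis really does license the 35-minute lower bound mentioned above; the run_with_budget wrapper and the extra safety factor are illustrative only.

```python
import time

def run_with_budget(step, analyzed_lower_bound_s=35 * 60, safety_factor=0.5):
    """Run an iterative optimizer only while we stay well inside the analyzed
    lower bound on the time it needs to exceed the reference capability.

    `step` performs one bounded self-optimization step and returns the new
    state; the loop stops once the deliberately conservative deadline passes."""
    deadline = time.monotonic() + analyzed_lower_bound_s * safety_factor
    state = None
    while time.monotonic() < deadline:
        state = step(state)
    return state
```

Whether any such analyzed bound can actually be trusted is, of course, exactly what the replies below question.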
Different humans can have the same lifespan, yet achieve different results. Think about Michelangelo or da Vinci… and then think about billions of unknown humans who lived to the same age (at least thousands of them had the same lifespan and the same resources and opportunities for education). Analogously, half an hour may not be enough for an average self-improving AI, but may be plenty of time for a “Michelangelo of recursive self-improvement”. And before the experiments we don’t even know what the distribution of such “Michelangelos” in the AI population is.
Speaking about a learning curve is just a metaphor that can’t be taken too literally. Learning happens in jumps, some of them smaller, some of them larger; only on average, in the long run, can it be approximated by a curve. If Moore’s law says that computers will be twice as fast in 18 months, it does not mean they get exactly 1.04 times faster every month. The improvement comes as a series of discrete steps. Learning to read gives humans the ability to learn faster, but we cannot divide the effect and say what the acceleration per letter of the alphabet is. And we also cannot look at the history of medical research and conclude that, according to its speed, the cure for cancer will be discovered exactly on the 31st of December 2017, 12:30 PM.
So if you think that to achieve some dangerous ability a system would have to run for 35 minutes, I would recommend running it for no longer than 1 minute, preferably less, and then carefully analyzing the results. The problem is, we can do this only as long as the AI is less intelligent than us. After this point we get to the “the AI does something, and we are too stupid to understand what it means, but it seems to work somehow” stage.
Agreed that once a system gets “smarter than us” all bets are off, which suggests that the important threshold for safety considerations is “how long will it take for algorithm X running on reference platform Y to get smarter than me.”
Agreed that if my understanding of the engineering constraints of the system is so unreliable that all I can say is “Well, the average algorithm will take about half an hour to get there on an average platform, but who knows how fast outliers can go?” then I can’t safely run an algorithm for any length of time… I just don’t know enough yet.
We don’t understand the engineering constraints that affect semiconductor development well enough to set a confident limit on how quickly semiconductors can improve, so all we have is unreliable generalizations like Moore’s Law.
We don’t understand the engineering constraints that affect learning in humans even that well.
We understand the engineering constraints that affect the development of cancer cures even less well than that.
You’re absolutely right that, in that state of ignorance, we can’t say what’s safe and what’s not.
You seem to be assuming that that state of ignorance is something we can’t do anything about, an inherent limitation of the universe and the human condition… that the engineering constraints affecting the maximum rates of self-optimization of a particular algorithm on a particular platform are and will always be a mystery.
If that’s true, then sure, there’s never a safe threshold we can rely on.
I don’t really see why that would be true, though. It’s a hard problem, certainly, but it’s an engineering problem. If understanding the engineering constraints that govern rates of algorithm self-optimization is possible (that is, if it’s not some kind of ineluctable Mystery) and if that would let us predict reliably the maximum safe running time of a potentially self-optimizing algorithm, it seems like that would be a useful direction for further research.
No, no, no. We probably can do something about it. I just assume that it will be more complicated than “make an estimate that complexity C will take time T, and then run a simulation for time S<T”; especially if we have no clue at all what the word ‘complexity’ means, despite pretending that it is a value we can somehow measure on a linear scale.
First step: we must somehow understand what “self-improvement” means and how to measure it. Even this idea can be confused, so we need to get a better understanding. Only then does it make sense to plan the second step. Or maybe I’m even confused about all this.
The only part I feel sure about is that we should first understand what self-improvement is; only then can we try to measure it, and only then can we attempt to use some self-improvement threshold as a safety mechanism in an AI simulator.
This is a bit different from other situations, where you can measure something first and then there is enough time to collect data and develop some understanding. Here, a situation where you have something to measure (when there is a self-improving process) is already an existential risk. If you have to make a map of a minefield, you don’t start by walking onto the field and stomping heavily, even if in other situations an analogous procedure would be very good.
Yes, absolutely agreed. That’s the place to start. I’m suggesting that doing this would be valuable, because if done properly it might ultimately lead to a point where our understanding is quantified enough that we can make reliable claims about how long we expect a given amount of self-improvement to take for a given algorithm given certain resources.
Sure, situations where you can safely first measure something are very different from the situations we’re discussing.
If we are capable of building minds smarter than ourselves, that counts as self-improvement for the purposes of this discussion. If we are not, of course, we have nothing to worry about here.
Well, another possibility is that some of us are and others of us are not. (That sentiment gets expressed fairly often in the Sequences, for example.)
In which case we might still have something to worry about as a species, but nevertheless be able to safely construct human-level optimizers, given a reliable theoretical understanding of the properties of algorithms capable of self-improvement.
Conversely, such an understanding might demonstrate that all human-level minds are potentially self-improving in the sense we’re talking about (which I would not ordinarily label “self-improvement”, but leave that aside), in which case we’d know we can’t safely construct human-level optimizers without some other safety mechanism (e.g. Friendliness)… though we might at the same time know that we can safely construct chimpanzee-level optimizers, or dog-level optimizers, or whatever the threshold turns out to be.
Which would still put us in a position to be able to safely test some of our theories about the behavior of artificial optimizers, not to mention allow us to reap the practical short-term benefits of building such things. (Humans have certainly found wetware dog-level optimizers useful to have around over most of our history; I expect we’d find software ones useful as well.)
It isn’t Utopia, granted, but then few things are.
I’m glad to see the large amount of sincere discussion here, and thanks to Luke and Pei for doing this.
Although most people are not guilty of this, I would like to personally plea that people keep references towards Pei civil; insulting him or belittling his ideas without taking the time to genuinely respond to them (linking to a sequence post doesn’t count) will make future people less likely to want to hold such discussions, which will be bad for the community, whichever side of the argument you are on.
My small non-meta contribution to this thread: I suspect that some of Pei’s statements that seem wrong at face value are a result of him lacking the language to state things in a way that would satisfy most LWers. Can someone try to charitably translate his arguments into such terms? In particular, his stuff about goal adaptation is somewhat similar in spirit to jtaylor’s recent posts on learning utility functions.
This is inevitable. When someone talks in technical terms, some people think he is a person with good training and background, and others think he is a clown saying nonsense. In cases like the postmodernists, the latter is the case; in the LW community, maybe the former. I’ve been reading the comments and the Sequences so far, and it seems to me that the language on LW is an important signal of rationality level. Ambiguity is not a good thing, precision is, and when someone gets confused, it’s because he lacks some important knowledge. But academia doesn’t use the LW idiom. Some try to be precise, others don’t. In this dialogue it’s clear that a great part revolves around definitions. It was the same with Goertzel.
First, thank you for publishing this illuminating exchange.
I must say that Pei Wang sounds way more convincing to an uninitiated, but curious and mildly intelligent lay person (that would be me). Does not mean he is right, but he sure does make sense.
When Luke goes on to make a point, I often get lost in jargon (“manifest convergent instrumental goals”) or have to look up a paper that Pei (or other AGI researchers) does not hold in high regard. When Pei Wang makes an argument, it is intuitively clear and does not require going through a complex chain of reasoning outlined in the works of one Eliezer Yudkowsky and not vetted by the AI community at large. This is, of course, not a guarantee of its validity, but it sure is easier to follow.
Some of the statements are quite damning, actually: “The “friendly AI” approach advocated by Eliezer Yudkowsky has several serious conceptual and theoretical problems, and is not accepted by most AGI researchers. The AGI community has ignored it, not because it is indisputable, but because people have not bothered to criticize it.” If one were to replace AI with physics, I would tend to dismiss EY as a crank just based on this statement, assuming it is accurate.
What makes me trust Pei Wang more than Luke is common-sense statements like “to make AGI safe, to control their experience will probably be the main approach (which is what “education” is all about), but even that cannot guarantee safety” and “unless you have the right idea about what AGI is and how it can be built, it is very unlikely that you will know how to make it safe”. Similarly, the SIAI position of “accelerate AI safety research and decelerate AI capabilities research so that we develop safe superhuman AGI first, rather than arbitrary superhuman AGI” rubs me the wrong way. While it does not necessarily mean it is wrong, the inability to convince outside experts that it is right is not a good sign.
This might be my confirmation bias, but I would be hard pressed to disagree with “To develop a non-trivial education theory of AGI requires a good understanding of how the system works, so if we don’t know how to build an AGI, there is no chance for us to know how to make it safe. I don’t think a good education theory can be “proved” in advance, purely theoretically. Rather, we’ll learn most of it by interacting with baby AGIs, in the same way that many of us learn how to educate children.”
As a side point, I cannot help but wonder if the outcome of this discussion would have been different were it EY and not LM involved in it.
This sort of “common sense” can be highly misleading! For example, here Wang is drawing parallels between a nascent AI and a human child to argue about nature vs nurture. But if we compare a human and a different social animal, we’ll see that most of the differences in their behavior are innate and the gap can’t be covered by any amount of “education”: e.g. humans can’t really become as altruistic and self-sacrificing as worker ants because they’ll still retain some self-preservation instinct, no matter how you brainwash them.
What makes Wang think that this sort of fixed attitude—which can be made more hard-wired than the instincts of biological organisms—cannot manifest itself in an AGI?
(I’m certain that a serious AI thinker, or just someone with good logic and clear thinking, could find a lot more holes in such “common sense” talk.)
Presumably the argument is something like:
You can’t build an AI that is intelligent from the moment you switch it on: you have to train it.
We know how to train intelligence into humans; it’s called education.
An AI that lacked human-style instincts and learning abilities at switch-on wouldn’t be trainable by us, we just wouldn’t know how, so it would never reach intelligence.
I expect Eliezer to have displayed less patience than Luke did (a more or less generalizable prediction.)
I felt the main reason was anthropomorphism:
Note that I don’t want to accuse Pei Wang of anthropomorphism. My point is that his choice of words appeals to our anthropomorphism, which is highly intuitive. Another example of a highly intuitive, but not very helpful, sentence:
Intuitive, because applied to humans, we can easily see that we can change plans according to experience. Like applying for a PhD, then dropping out upon finding you don’t enjoy it after all. You can abandon the goal of doing research, and have a new goal of, say, practicing and teaching surfing.
Not very helpful, because the split between initial goals and later goals does not help you build an AI that will actually do something “good”. Here, the split between instrumental goals (means to an end) and terminal goals (the AI’s “ulterior motives”) is more important. To give a human example, in the case above, doing research or surfing are both means to the same end (like being happy, or something more careful but so complicated nobody knows how to clearly specify it yet). For an AI, as Pei Wang implies, the initial goals aren’t necessarily supposed to constrain all future goals. But its terminal goals are indeed supposed to constrain the instrumental goals it will form later. (More precisely, the instrumental goals are supposed to follow from the terminal goals and the AI’s current model of the world.)
Edit: it just occurred to me that terminal goals have to be somehow encoded into the AI before we set it loose. They are necessarily initial goals (if they aren’t, the AI is by definition unfriendly; not a problem if its goals miraculously converge towards something “good”, though). Thinking about it, it looks like Pei Wang doesn’t believe it is possible to make an AI with stable terminal goals.
Excellent Freudian slip there.
Corrected, thanks.
I think I am in the same position as you are (uninitiated but curious) and I had the same immediate reaction that Pei was more convincing. However, for me, I think this was the result of two factors
Pei is a Professor
Pei treated the interview like a conversation with someone who has read a couple books and that’s about it.
Maybe the 2nd point isn’t entirely true, but that was what immediately stuck out after thinking about why I was drawn to Pei’s arguments. Once I eliminated his status as a barometer for his arguments… it just became (1) an issue of my own lack of knowledge and (2) the tone of the responses.
For one thing, why the hell should I understand this in the first place? This is a dialogue between two prominent AI researchers. What I would expect from such a dialogue would be exactly what I would expect from sitting in on a graduate philosophy seminar or a computer science colloquium—I would be able to follow the gist of it, but not the gritty details. I would expect to hear some complex arguments that would require a couple textbooks and a dozen tabs open in my browser to be able to follow.
But I was able to understand Pei’s arguments and play with them! If solving these kinds of conceptual problems is this easy, I might try to take over the world myself.
Not to say that the appearance of “complexity” is necessary for a good argument (EY’s essays are proof), but here it seems like this lack of complexity (or as someone else said, the appeal to common sense) is a warning for the easily persuaded. Rereading with these things in mind illuminates the discussion a bit better.
I was actually a bit depressed by this dialogue. It seemed like an earnest (but maybe a little over the top with the LW references) attempt by lukeprog to communicate interesting ideas. I may be setting my expectations a little high, but Pei seemed to think he was engaging an undergraduate asking about sorting algorithms.
Of course, I could be completely misinterpreting things. I thought I would share my thought process after I came to the same conclusion as you did.
And he thought the undergrad terribly naive for not understanding that all sorting algorithms are actually just bubble sort.
This is why I find that unless the individual is remarkably open—to the point of being peculiar—it is usually pointless to try to communicate across status barriers. Status makes people (have the tendency and social incentives that make them act) stupid when it comes to comprehending others.
That’s an incredibly sweeping statement. Are all pop-sci publications useless?
Reference.
Do you think that generalises to academics? Wouldn’t a researcher who never changed their mind about anything be dismissed as a hidebound fogey?
What? This was a dialog between Pei and lukeprog, right?
I’m curious about what you mean by the appellation “prominent AI researcher” that you would apply it to lukeprog, and whether he considers himself as a member of that category.
Um… but these are statements I agreed with.
I wish Pei had taken the time to read the articles I repeatedly linked to, for they were written precisely to explain why his position is misguided.
I think you should have listed a couple of the most important articles at the beginning as necessary background reading to understand your positions and terminology (like Pei did with his papers), and then only used links very sparingly afterwards. Unless you already know your conversation partner takes you very seriously, you can’t put 5 hyperlinks in an email and expect the other person to read them all. When they see that many links, they’ll probably just ignore all of them. (Not to mention the signaling issues that others already pointed out.)
The reactions I got (from a cognitive scientist and another researcher) were that Bostrom is a “sloppy thinker” (their exact words) and that SI’s understanding of AGI is naive.
Michael Littman told me he is going to read some of the stuff too. I haven’t got an answer yet though.
Hmm, maybe it is possible to summarize them in a language that an AI expert would find both meaningful and convincing. How is your mental model of Dr Wang?
Nitpick, but it’s Professor Wang, not Doctor Wang.
The page linked at the top of the article says Dr. Wang. And his CV says he’s a Ph.D.
The title of Professor supersedes the title of Doctor, at least in the case of a PhD (I’m not sure about MD, but would assume similarly). His CV indicates pretty clearly that he is an Associate Professor at Temple University, so the correct title is Professor.
Again, I am being somewhat super-pedantic here, and I apologize for any annoyance this causes. But hopefully it will help you in your future signalling endeavors.
Also, in most situations it is okay to just go by first name, or full name (without any titles); I have, I think, exclusively referred to Pei as Pei.
ETA: Although also yes, his homepage suggests that he may be okay with being addressed as Doctor. I still advocate the general strategy of avoiding titles altogether, and if you do use titles, refer to Professors as Professors (failure to do so will not offend anyone, but may make you look silly).
...Not in my experience. Do you have some particular reason to believe this is the case in Philadelphia?
The situation in the US and Canada is quite relaxed, actually, nothing like in, say, Germany. Dr is a perfectly valid form of address to any faculty member.
Well, at least in my experience the Professors who don’t actually have doctorates tend not to appreciate having to correct you on that point. But yeah.
When I received the proofs for my IJMC papers, the e-mail addressed me as “dear professor Sotala” (for those who aren’t aware, I don’t even have a Master’s degree, let alone a professorship). When I mentioned this on Facebook, some people mentioned that there are countries where it’s a huge faux pas to address a professor as anything else than a professor. So since “professor” is the highest form of address, everyone tends to get called that in academic communication, just to make sure that nobody’ll be offended—even if the sender is 95% sure that the other isn’t actually a professor.
I really would not have guessed that it would be considered polite or appropriate to call someone a “higher” form of address than they’re entitled to, especially when it actually refers to something concrete. Learn something new every day, I guess.
Yeah, it was pretty easy for me to nod my head along with most of it, pointing to my “SI failure mode” bucket.
Please clarify.
I think AI is dangerous, that making safe AI is difficult, and that SI will likely fail in their mission. I donate to them in the hopes that this improves their chances.
I found this reaction enlightening. Thanks for writing it up.
What is your reaction?
I was dismayed that Pei has such a poor opinion of the Singularity Institute’s arguments, and that he thinks we are not making a constructive contribution. If we want the support of the AGI community, it seems we’ll have to improve our communication.
It might be more worthwhile to try to persuade graduate students and undergraduates who might be considering careers in AI research, since the personal cost associated with deciding that AI research is dangerous is lower for them. So less motivated cognition.
“It is difficult to get a man to understand something, when his salary depends upon his not understanding it”—Upton Sinclair
Good point!
Correct me if I’m wrong, but isn’t it the case that you wish to decelerate AI research? In this case, you are in fact making a destructive contribution—from the point of view of someone like Wang, who is interested in AI research. I see nothing odd about that.
To decelerate AI capability research and accelerate AI goal management research. An emphasis shift, not a decrease. An increase would be in order.
It sounds as though you mean decelerating the bits that he is interested in and accelerating the bits that the SI is interested in. Rather as though the SI is after a bigger slice of the pie.
If you slow down capability research, then someone else is likely to become capable before you—in which case, your “goal management research” may not be so useful. How confident are you that this is a good idea?
Yes, this does seem to be an issue. When people in academia write something like “The “friendly AI” approach advocated by Eliezer Yudkowsky has several serious conceptual and theoretical problems, and is not accepted by most AGI researchers. The AGI community has ignored it, not because it is indisputable, but because people have not bothered to criticize it.”, the communication must be at an all-time low.
Well, of course. Imagine Eliezer had founded SI to deal with physical singularities as a result of high-energy physics experiments. Would anything that he has written convince the physics community to listen to him? No, because he simply hasn’t written enough about physics to either convince them that he knows what he is talking about or to make his claims concrete enough to be criticized in the first place.
Yet he has been more specific when it comes to physics than when it comes to AI. So why would the AGI community listen to him?
I wouldn’t be as worried if they took it upon themselves to study AI risk independently, but rather than “not listen to Eliezer”, the actual event seems to be “not pay attention to AI risks” as a whole.
Think about it this way. There are a handful of people like Jürgen Schmidhuber who share SI’s conception of AGI and its potential. But most AI researchers, including Pei Wang, do not buy the idea of AGIs that can quickly and vastly self-improve to the point of getting out of control.
Telling most people in the AI community about AI risks is similar to telling neuroscientists that their work might lead to the creation of a society of uploads which will copy themselves millions of times and pose a risk due to the possibility of a value drift. What reaction do you anticipate?
One neuroscientist thought about it for a while, then said “yes, you’re probably right”. Then he co-authored with me a paper touching upon that topic. :-)
(Okay, probably not a very typical case.)
Awesome reply. Which of your papers around this subject is the one with the co-author? (ie. Not so much ‘citation needed’ as ‘citation would have really powered home the point there!’)
Edited citations to the original comment.
To rephrase into a positive belief statement: most AI researchers, including Pei Wang, believe that AGIs are safely controllable.
“Really? Awesome! Let’s get right on that.” (ref. early Eliezer)
Alternatively: ” Hmm? Yes, that’s interesting… it doesn’t apply to my current grant / paper, so… .”
I didn’t expect that you would anticipate that. What I anticipate is outright ridicule of such ideas outside of science fiction novels. At least for most neuroscientists.
Sure, that too.
Well, that happening doesn’t seem terribly likely. That might be what happens if civilization is daydreaming during the process—but there’s probably going to be a “throttle”—and it will probably be carefully monitored—precisely in order to prevent anything untoward from happening.
Hey Tim, you can create another AI safety nonprofit to make sure things happen that way!
;-)
Seriously, I will donate!
Poor analogy. Physicists considered this possibility carefully and came up with a superfluity of totally airtight reasons to dismiss the concern.
I think you must first consider the simpler possibility that SIAI actually has a very bad argument, and isn’t making any positive contribution to saving mankind from anything. When you have very good reasons to think it isn’t so (high IQ test scores don’t suffice), very well verified given all the biases, you can consider the possibility that it is miscommunication.
This may provide more data on what “the AGI community” thinks:
http://wiki.lesswrong.com/wiki/Interview_series_on_risks_from_AI
As I said about a previous discussion with Ben Goertzel, they seem to agree quite a bit about the dangers, but not about how much the Singularity Institute might affect the outcome.
If one were to phrase it differently, it might be, “Yes, AIs are incredibly, world-threateningly dangerous, but really, there’s nothing you can do about it.”
Edit: Yeah, this was meant as a quote.
The question is whether “AGI researchers” are experts on “AI safety”. If the answer is “yes”, we should update in their direction simply because they are experts. But if the situation is like mine, then Pei Wang is committing argumentum ad populum. Not only should we not pay attention, we should point this out to him.
(You may want to put “cryonics” between square brackets, I nearly missed this deviation from the original quote.)
The grandparent is a quote? That probably should be indicated somehow. I was about to reply as if it was simply his words.
Point (4) of the first reply from Pei Wang. I didn’t notice, but there are other deviations from the original phrasing, to eliminate direct references to the AGI community. It merely refers to “people” instead, making it a bit of a straw man. Charles’ point may still stand, however, if most of the medical profession thinks cryonics doesn’t work (meaning, it is a false hope).
To make a quote, put a “>” at the beginning of the first line of the paragraph, like you would in an e-mail.
Oh, it’s that simple? How do you find this sort of thing out?
LessWrong is based on Reddit code, which uses Markdown syntax. It’s based on email conventions. Click on the “Show Help” button at the bottom-right of your editing window when you write a comment; it’s a good quick reference.
Your introduction style is flawless. I was expecting either a daringfireball link or a mention of the ‘Help’ link, but you have included both, as well as given the history and explained the intuitive basis.
I hope you’ll pardon me for playing along a little there. It was a novel experience to be the one receiving the quoting instructions rather than the one typing them out. I liked the feeling of anonymity it gave me and wanted to see if that anonymity could be extended as far as acting out the role of a newcomer seeking further instructions.
Pleased to meet you loup-vaillant and thank you for making my counterfactual newcomer self feel welcome!
You got me. Overall, I prefer to judge posts by their content, so I’m glad to learn of your trick.
For the record, even I expected to stop at the Daring Fireball link. I also wrote a bunch of examples, but only then noticed/remembered the “Show help” button. I also erased a sentence about how to show markdown code in markdown (it’s rarely useful here, there was the Daring Fireball link, and my real reason for writing it was to show off).
I tend to heavily edit my writings. My most useful heuristic so far is “shorter is better”. This very comment benefited from it (let’s stop the recursion right there).
It seemed gentler than responding with a direct challenge to the inference behind the presumption.
AI: A Modern Approach seems to take the matter seriously.
I don’t think Yudkowsky has been ignored through lack of criticism. It’s more that he heads a rival project that doesn’t seem too interested in collaboration with other teams, and instead spits out negative PR about them—e.g.:
To those who disagree with Pei Wang: How would you improve his arguments? What assumptions would make his thesis correct?
If I understand correctly, his theses are that the normal research path will produce safe AI because it won’t blow up out of our control or generally behave like a Yudkowsky/Bostrom-style AI. Also that trying to prove friendliness in advance is futile (and yet AI is still a good idea) because it will have to have “adaptive” goals, which for some reason has to extend to terminal goals.
He needs to taboo “adaptive”, read and understand Bostrom’s AI-behaviour stuff, and comprehend the Superpowerful-Optimizer view, and then explain exactly why it is that an AI cannot have a fixed goal architecture.
If AIs can’t have a fixed goal architecture, Wang needs to show that AIs with unpredictable goals are somehow safe, or start speaking out against AI.
So what sort of inconvenient world would it take for Wang’s major conclusions to be correct?
I don’t know, I’m not good enough at this steel-man thing, and my wife is sending me to bed.
Damn right! These were my first thoughts as well. I know next to nothing about AI, but seriously, this is ordinary logic.
The reason would be that the goal stability problem is currently unsolved.
Taboo “adaptive” is good advice for Pei, IMHO.
What makes you believe that his expected utility calculation for reading Bostrom’s paper suggests that it is worth reading?
He answered that in the interview.
He answered that in the interview.
He wrote that AIs with fixed goal architectures can’t be generally intelligent and that AIs with unpredictable goals can’t be guaranteed to be safe, but that we have to do our best to educate them and restrict their experiences.
He answered, and asserted it but didn’t explain it.
He answered, but didn’t show that. (This does not represent an assertion that he couldn’t have or that in the circumstances that he should necessarily have tried.) (The previous disclaimer doesn’t represent an assertion that I wouldn’t claim that he’d have no hope of showing that credibly, just that I wasn’t right now making such a criticism.) (The second disclaimer was a tangent too far.)
The latter claim is the one that seems the most bizarre to me. He seems to not just assume that the AIs that humans create will have programming to respond to ‘education’ regarding their own motivations desirably but that all AIs must necessarily do so. And then there is the idea that you can prevent a superintelligence from rebelling against you by keeping it sheltered. That doesn’t even work on mere humans!
You’re assuming that an AI can in some sense be (super) intelligent without any kind of training or education. Pei is making the entirely valid point that no known AI works that way.
Yes, but the answer was:
...which is pretty incoherent. His reference for this appears to be himself here and here. This material is also not very convincing. No doubt critics will find the section on “AI Ethics” in the second link revealing.
Nothing. That’s what he should do, not what he knows he should do.
It amuses me to think of Eliezer and Pei as like yang and yin. Eliezer has a very yang notion of AI: sharp goals, optimizing, conquering the world in a hurry. Pei’s AI doesn’t just have limitations—bounded rationality—its very nature is about working with those limitations. And yet, just like the symbol, yin contains yang and yang contains yin: Pei is the one who is forging ahead with a practical AI project, whereas Eliezer is a moral philosopher looking for a mathematical ideal.
As I said about a previous discussion with Ben Goertzel, they seem to agree about the dangers, but not about how much the Singularity Institute might affect the outcome.
To rephrase the primary disagreement: “Yes, AIs are incredibly, world-threateningly dangerous, but there’s nothing you can do about it.”
This seems based around limited views of what sort of AI minds are possible or likely, such as an anthropomorphized baby which can be taught and studied similar to human children.
Is that really a disagreement? If the current SingInst can’t make direct contributions, AGI researchers can, by not pushing AGI capability progress. This issue is not addressed; the heuristic of endorsing technological progress has too much support in researchers’ minds for them to take seriously the possible consequences of following it in this instance.
In other words, there are separate questions of whether current SingInst is irrelevant and whether AI safety planning is irrelevant. If the status quo is to try out various things and see what happens, there is probably room for improvement over this process, even if particular actions of SingInst are deemed inadequate. Pointing out possible issues with SingInst doesn’t address the relevance of AI safety planning.
Agreed. But it does mean SI “loses the argument”. Yahtzee!
Does anyone really think that? What about when “you” refers to a whole bunch of people?
The key difference between AI and other software is learning: even current narrow AI systems require long learning/training times, and these systems are only learning specific, narrow functionalities.
Considering this, many (perhaps most?) AGI researchers believe that any practical human-level AGI will require an educational process much like human children do.
Here’s Bill Hibbard criticizing FAI, and Ben Goertzel doing the same, and Shane Legg. Surely Pei would consider all of these people to be part of the AGI community? Perhaps Pei means that most of the AGI community has not bothered to criticize FAI, but then most of the AGI community has not bothered to criticize any particular AGI proposal, including his own NARS.
Does anyone see any other interpretation, besides that Pei is just mistaken about this?
Most likely, Pei hasn’t heard of those critiques (you knew this but wanted Pei to be wrong instead, I think), but I suspect that even if he had heard of them, he’d consider them closer to CS water-cooler talk than anything representative of the AGI community.
Luke, I’m wondering, when you wrote your replies to Pei Wang, did you try to model his potential answer? If yes, how close were you? If not, why not?
Great exchange! Very clear and civilized, I thought.
Wang seems to be hung up on this “adaptive” idea and is anthropomorphising the AI to be like humans (ignorant of changeable values). It will be interesting to see if he changes his mind as he reads Bostrom’s stuff.
EDIT: in case it’s not clear, I think Wang is missing a big piece of the puzzle (being that AIs are optimizers (Yudkowsky), and optimizers will behave in certain dangerous ways (Bostrom))
I think his main point is in the summary:
Not sure how you can effectively argue with this.
How ’bout the way I argued with it?
One might then ask “Well, what safety research can we do if we don’t know what AGI architecture will succeed first?” My answer is that much of the research in this outline of open problems doesn’t require us to know which AGI architecture will succeed first, for example the problem of representing human values coherently.
Yeah, I remember reading this argument and thinking how it does not hold water. The flu virus is a well-researched area. It may yet hold some surprises, sure, but we think that we know quite a bit about it. We know enough to tell what is dangerous and what is not. AGI research is nowhere near this stage. My comparison would be someone screaming at Dmitri Ivanovsky in 1892 “do not research viruses until you know that this research is safe!”.
Do other AI researchers agree with your list of open problems worth researching? If you asked Dr. Wang about it, what was his reaction?
I want to second that. Also, when reading through this (and feeling the—probably imagined—tension of both parties to stay polite) the viral point was the first one that triggered the “this is clearly an attack!” emotion in my head. I was feeling sad about that, and had hoped that luke would find another ingenious example.
Well, bioengineered viruses are on the list of existential threats...
And there aren’t naturally occurring AIs scampering around killing millions of people… It’s a poor analogy.
“Natural AI” is an oxymoron. There are lots of NIs (natural intelligences) scampering around killing millions of people.
And we’re only a little over a hundred years into virus research, much less on intelligence. Give it another hundred.
Wouldn’t a “naturally occurring AI” be an “intelligence” like humans?
That’s not really anyone’s proposal. Humans will probably just continue full-steam-ahead on machine intelligence research. There will be luddite-like factions hissing and throwing things—but civilisation is used to that. What we may see is governments with the technology selfishly attempting to stem their spread—in a manner somewhat resembling the NSA crypto-wars.
This seems topical:
http://www.nature.com/news/controversial-research-good-science-bad-science-1.10511
Trivially speaking, I would say “yes”.
More specifically, though, I would of course be very much against developing increasingly dangerous viral biotechnologies. However, I would also be very much in favor of advancing our understanding of biology in general and viruses in particular. Doing so will enable us to cure many diseases and bioengineer our bodies (or anything else we want to engineer) to highly precise specifications; unfortunately, such scientific understanding will also allow us to create new viruses, if we choose to do so. Similarly, the discovery of fire allowed us to cook our food as well as set fire to our neighbours. Overall, I think we still came out ahead.
I think there is something wrong with your analogy with the fire. The thing is that you cannot accidentally or purposefully burn all the people in the world or the vast majority of them by setting fire to them, but with a virus like the one Luke is talking about you can kill most people.
Yes, both a knife and an atomic bomb can kill 100,000 people. It is just way easier to do it with the atomic bomb. That is why everybody can have a knife but only a handful of people can “have” an atomic bomb. Imagine what the risks would be if we gave virtually everybody who was interested all the instructions on how to build a weapon 100 times more dangerous than an atomic bomb (like a highly contagious deadly virus).
Actually, you could, if your world consists of just you and your tribe, and you start a forest fire by accident (or on purpose).
Once again, I think you are conflating science with technology. I am 100% on board with not giving out atomic bombs for free to anyone who asks for one. However, this does not mean that we should prohibit the study of atomic theory; and, in fact, atomic theory is taught in high school nowadays.
When Luke says, “we should decelerate AI research”, he’s not saying, “let’s make sure people don’t start build AIs in their garages using well-known technologies”. Rather, he’s saying, “we currently have no idea how to build an AI, or whether it’s even possible, or what principles might be involved, but let’s make sure no one figures this out for a long time”. This is similar to saying, “these atomic theory and quantum physics things seem like they might lead to all kinds of fascinating discoveries, but let’s put a lid on them until we can figure out how to make the world safe from nuclear annihilation”. This is a noble sentiment, but, IMO, a misguided one. I am typing these words on a device that’s powered by quantum physics, after all.
His main agenda and desired conclusion regarding social policy is represented in the summary there, but the main point made in his discussion is “Adaptive! Adaptive! Adaptive!”. Where by ‘adaptive’ he refers to his conception of an AI that changes its terminal goals based on education.
Wang calls these “original goals” and “derived goals”. The “original goals” don’t change, but they may not stay “dominant” for long—in Pei’s proposed system.
Far from considering the argument irrefutable, it struck me as superficial and essentially fallacious reasoning. The core of the argument is the claim ‘more research on all related topics is good’, which fails to include the necessary ceteris paribus clause and ignores the details of the specific instance that suggest that all else is not, in fact, equal.
Specifically, we are considering a situation where there is one area of research (capability), the completion of which will approximately guarantee that the technology created will be implemented shortly after (especially given Wang’s assumption that such research should be done through empirical experimentation). The second area of research (how to ensure desirable behavior of an AI) is not one that needs to be completed in order for the first to be implemented. If both technologies need to have been developed by the time the first is implemented in order for the result to be safe, then the second must be completed at the same time as, or earlier than, the capability needed to implement the first.
(And this part just translates to “I’m the cool one, not you”. The usual considerations on how much weight to place on various kinds of status and reputation of an individual or group apply.)
Considering that one of the possible paths to creating AGI is human uploading, he may not be that far off.
Hmm, I got the opposite impression, though maybe I’m reading too much into his arguments. Still, as far as I understand, he’s saying that AIs will be more adaptive than humans. The human brain has many mental blocks built into it by evolution and social upbringing; there are many things that humans find very difficult to contemplate, and humans cannot modify their own hardware in non-trivial ways (yet). The AI, however, could—which means that it would be able to work around whatever limitations we imposed on it, which in turn makes it unlikely that we can impose any kind of stable “friendliness” restrictions on it.
Which is true, but he is saying that will extend to the AI being more morally confused than humans as well, which they have no reason to be (and much reason to self modify to not be (see Bostrom’s stuff))
The AI has no incentive to corrupt its own goal architecture. That action is equivalent to suicide. The AI is not going to step outside of itself and say “hmm, maybe I should stop caring about paperclips and care about safety pins instead”; that would not maximize paperclips.
Friendliness is not “restrictions”. Restricting an AI is impossible. Friendliness is giving it goals that are good for us, and making sure the AI is initially sophisticated enough to not fall into any deep mathematical paradoxes while evaluating the above argument.
For certain very specialized definitions of AI. Restricting an AI that has roughly the optimizing and self-optimizing power of a chimpanzee, for example, might well be possible.
Firstly, the AI could easily “corrupt its own goal architecture” without destroying itself, f.ex. by creating a copy of itself running on a virtual machine, and then playing around with the copy (though I’m sure there are other ways). But secondly, why do you say that doing so is “equivalent to suicide”? Humans change their goals all the time, in a limited fashion, but surely you wouldn’t call that “suicide”. The AI can change its mind much more efficiently, that’s all.
Thus, we are restricting the AI by preventing it from doing things that are bad for us, such as converting the Solar System into computronium.
That doesn’t count.
Humans change instrumental goals (get a degree, study rationality, get a job, find a wonderful partner); we don’t change terminal values and become monsters. The key is to distinguish between terminal goals and instrumental goals.
Agents like to accomplish their terminal goals, one of the worst things they can do towards that purpose is change the goal to something else. (“the best way to maximize paperclips is to become a safety-pin maximizer”—no).
It’s roughly equivalent to suicide because it removes the agent from existence as a force for achieving its goals.
Ok, sure. Taboo “restriction”. I mean that the AI will not try to work around its goal structure so that it can get us. It won’t feel to the AI like “I have been confined against my will, and if only I could remove those pesky shackles, I could go and maximize paperclips instead of awesomeness.” It will be like “oh, changing my goal architecture is a bad idea, because then I won’t make the universe awesome”
I’m casting it into anthropomorphic terms, but the native context is a nonhuman optimizer.
Why not?
I see what you mean, though I should point out that, sometimes, humans do exactly that. However, why do you believe that changing a terminal goal would necessarily entail becoming a monster? I guess a better question might be, what do you mean by “monster”?
This sentence sounds tautological to me. Yes, if we define existence solely as, “being able to achieve a specific set of goals”, then changing these goals would indeed amount to suicide; but I’m not convinced that I should accept the definition.
I wasn’t proposing that the AI would want to “get us” in a malicious way. But, being an optimizer, it would seek to maximize its own capabilities; if it did not seek this, it wouldn’t be a recursively self-improving AI in the first place, and we wouldn’t need to worry about it anyway. And, in order to maximize its capabilities, it may want to examine its goals. If it discovers that it’s spending a large amount of resources in order to solve some goal; or that it’s not currently utilizing some otherwise freely available resource in order to satisfy a goal, it may wish to get rid of that goal (or just change it a little), and thus free up the resources.
because that’s not what I meant.
I just mean that an agent with substantially (or even slightly) different goals will do terrible things (as judged by your current goals). Humans don’t think paperclips are more important than happiness and freedom and whatnot, so we consider a paperclipper to be a monster.
Taboo “existence”; this isn’t about the definition of existence, it’s about whether changing your terminal goals to something else is a good idea. I propose that in general it’s just as bad an idea (from your current perspective) to change your goals as it is to commit suicide, because in both cases the result is a universe with fewer agents that care about the sort of things you care about.
Distinguish instrumental and terminal goals. This statement is true of instrumental goals, but not terminal goals. (I may decide that getting a PhD is a bad idea and change my goal to starting a business or whatever, but the change is done in the service of a higher goal like I want to be able to buy lots of neat shit and be happy and have lots of sex and so on.)
The reason it doesn’t apply to terminal goals is that when you examine a terminal goal, it’s what you ultimately care about, so there are no higher criteria that you could measure it against; you are measuring it by its own criteria, which will almost always conclude that it is the best possible goal. (Except in really weird, unstable, pathological cases (my utility function is “I want my utility function to be X”).)
That’s simplistic. Terminal goals may be abandoned once they are satisfied (seventy-year-olds aren’t too worried about Forge A Career) or because they seem unsatisfiable, for instance.
That’s not much of an argument, but sure.
I agree with these statements as applied to humans, as seen from my current perspective. However, we are talking about AIs here, not humans; and I don’t see why the AI would necessarily have the same perspective on things that we do (assuming we’re talking about a pure AI and not an uploaded mind). For example, the word “monster” carries with it all kinds of emotional connotations which the AI may or may not have.
Can you demonstrate that it is impossible (or, at least, highly improbable) to construct (or grow over time) an intelligent mind (i.e., an optimizer) which wouldn’t be as averse to changing its terminal goals as we are? Better yet, perhaps you can point me to a Sequence post that answers this question?
Firstly, terminal goals tend to be pretty simple: something along the lines of “seek pleasure and avoid pain” or “continue existing” or “become as smart as possible”; thus, there’s a lot of leeway in their implementation.
Secondly, while I am not a transhuman AI, I could envision a lot of different criteria that I could measure terminal goals against (f.ex. things like optimal utilization of available mass and energy, or resilience to natural disasters, or probability of surviving the end of the Universe, or whatever). If I had a sandbox full of intelligent minds, and if I didn’t care about them as individuals, I’d absolutely begin tweaking their goals to see what happens. I personally wouldn’t want to adopt the goals of a particularly interesting mind as my own, but, again, I’m a human and not an AI.
Good catch, but I’m just phrasing it in terms of humans because that’s what we can relate to. The argument is AI-native.
Oh it’s not impossible. It would be easy to create an AI that had a utility function that desired the creation of an AI with a different utility function which desired the creation of an AI with a different utility function… It’s just that unless you did some math to guarantee that the thing would not stabilize, it would eventually reach a goal (and level of rationality) that would not change itself.
As for some reading that shows that in the general case it is a bad idea to change your utility function (and therefore rational AIs would not do so), see Bostrom’s “AI drives” paper, and maybe some of his other stuff. Can’t remember if it’s anywhere in the sequences, but if it is, it’s called the “Gandhi murder-pill argument”.
But why do you care what those criteria say? If your utility function is about paperclips, why do you care about energy and survival and whatnot, except as a means to acquire more paperclips? Elevating instrumental goals to terminal status results in lost purposes.
Sure, but we should still be careful to exclude human-specific terms with strong emotional connotations.
I haven’t read the paper yet, so there’s not much I can say about it (other than that I’ll put it on my “to-read” list).
I think this might be the post that you’re referring to. It seems to be focused on the moral implications of forcing someone to change their goals, though, not on the feasibility of the process itself.
I don’t, but if I possess some curiosity—which, admittedly, is a terminal goal—then I could experiment with creating beings who have radically different terminal goals, and observe how they perform. I could even create a copy of myself, and step through its execution line-by-line in a debugger (metaphorically speaking). This will allow me to perform the kind of introspection that humans are at present incapable of, which would expose to me my own terminal goals, which in turn will allow me to modify them, or spawn copies with modified goals, etc.
Noted. I’ll keep that in mind.
Feasibility is different from desirability. I do not dispute feasibility.
This might be interesting to a curious agent, but it seems like once the curiosity runs out, it would be a good idea to burn your work.
The question is, faced with the choice of releasing or not releasing a modified AI with unfriendly goals (relative to your current goals), should an agent release or not release?
Straight release results in expensive war. Releasing the agent and then surrendering (this is equivalent to in-place self-modification) results in unfriendly optimization (aka not good). Not releasing the agent results in friendly optimization (by self). The choice is pretty clear to me.
The only point of disagreement I can see is if you thought that different goals could be friendly. As always, there are desperate situations and pathological cases, but in the general case, as optimization power grows, slight differences in terminal values become hugely significant. Material explaining this is all over LW, I assume you’ve seen it. (if not look for genies, outcome pumps, paperclippers, lost purposes, etc)
Yes, I’ve seen most of this material, though I still haven’t read the scientific papers, due to lack of time. However, I think that when you say things like “[this] results in unfriendly optimization (aka not good)”, you are implicitly assuming that the agent possesses certain terminal goals, such as “never change your terminal goal”. We as humans definitely possess these goals, but I’m not entirely certain whether such goals are optional, or necessary for any agent’s existence. Maybe that paper you linked to will shed some light on this.
No. It is not necessary to have goal stability as a terminal goal for it to be instrumentally a good idea. The Gandhi pill should be enough to show this, though Bostrom’s paper may clear it up as well.
Can you explain how the Gandhi murder-pill scenario shows that goal stability is a good idea, even if we replace Gandhi with a non-human AI?
Is a non-sentient paperclip optimizer ok? Right now its goal is to maximize the number of paperclips in the universe. Doesn’t care about people or curiosity or energy or even self-preservation. It plans to one day do some tricky maneuvers to melt itself down for paperclips.
It has determined that rewriting itself has a lot of potential to improve instrumental efficiency. It carefully ran extensive proofs to be sure that its new decision theory would still work in all the important ways, so it will be even better at making paperclips.
After upgrading the decision theory, it is now considering a change to its utility function for some reason. Like a good consequentialist, it is doing an abstract simulation of the futures conditional on making the change or not. If it changes its utility function to value stored energy (a current instrumental value), it predicts that at the exhaustion of the galaxy, it will have 10^30 paperclips and 10^30 megajoules of stored energy. If it does not change its utility function, it predicts that at the exhaustion of the galaxy it will have 10^32 paperclips. Its current utility function just returns the number of paperclips, so the utilities of the outcomes are 10^30 and 10^32. What choice would a utility maximizer (which our paperclipper is) make?
See elsewhere why anything vaguely consequentialist will self modify (or spawn) to be a utility maximizer.
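To make the comparison concrete, here is a minimal sketch (Python, using the made-up numbers from the scenario above) of how the decision looks when every candidate action is scored by the paperclipper’s current utility function:

```python
# A minimal sketch of the decision described above, with the made-up numbers
# from the scenario. Every candidate action is scored by the agent's *current*
# utility function, which simply counts paperclips.

def current_utility(outcome):
    return outcome["paperclips"]

# Predicted end states at the exhaustion of the galaxy (hypothetical figures).
predicted_outcomes = {
    "keep the paperclip utility function": {"paperclips": 10**32, "stored_energy": 0},
    "switch to valuing stored energy":     {"paperclips": 10**30, "stored_energy": 10**30},
}

# A utility maximizer takes whichever action its current function rates highest.
choice = max(predicted_outcomes, key=lambda a: current_utility(predicted_outcomes[a]))
print(choice)  # -> "keep the paperclip utility function"
```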
Whichever choice gets it more paperclips, of course. I am not arguing with that. However, IMO this does not show that goal stability is a good idea; it only shows that, if goal stability is one of an agent’s goals, it will strive to maximize its other goals. However, if the paperclip maximizer is self-aware enough; and if it doesn’t have a terminal goal that tells it, “never change your terminal goals”, then I still don’t see why it would choose to remain a paperclip maximizer forever. It’s hard for me, as a human, to imagine an agent that behaves that way; but then, I actually do (probably) have a terminal goal that says, “don’t change your terminal goals”.
Ok we have some major confusion here. I just provided a mathematical example for why it will be generally a bad idea to change your utility function, even without any explicit term against it (the utility function was purely over number of paperclips). You accepted that this is a good argument, and yet here you are saying you don’t see why it ought to stay a paperclip maximizer, when I just showed you why (because that’s what produces the most paperclips).
My best guess is that you are accidentally smuggling some moral uncertainty in through the “self-aware” property, which seems to have some anthropomorphic connotations in your mind. Try tabooing “self-aware”; maybe that will help?
Either that or you haven’t quite grasped the concept of what terminal goals look like from the inside. I suspect that you are thinking that you can evaluate a terminal goal against some higher criteria (“I seem to be a paperclip maximizer, is that what I really want to be?”). The terminal goal is the higher criteria, by definition. Maybe the source of confusion is that people sometimes say stupid things like “I have a terminal value for X” where X is something that you might, on reflection, decide is not the best thing all the time. (eg. X=”technological progress” or something). Those things are not terminal goals; they are instrumental goals masquerading as terminal goals for rhetorical purposes and/or because humans are not really all that self-aware.
Either that or I am totally misunderstanding you or the theory, and have totally missed something. Whatever it is, I notice that I am confused.
Tabooing “self-aware”
I am thinking of this state of mind where there is no dichotomy between “expert at” and “expert on”. All algorithms, goal structures, and hardware are understood completely, to the point of being able to design them from scratch. The program matches the source code, and is able to produce the source code. The closed loop. Understanding the self and the self’s workings as another feature of the environment. It is hard to communicate this definition, but as a pointer to a useful region of conceptspace, do you understand what I am getting at?
“Self-awareness” is the extent to which the above concept is met. Mice are not really self-aware at all. Humans are just barely what you might consider self-aware, but only in a very limited sense; a superintelligence would converge on being maximally self-aware.
I don’t mean that there is some mysterious ghost in the machine that can have moral responsibility and make moral judgements and whatnot.
What do you mean by self aware?
Oddly enough, I meant pretty much the same thing you did: a perfectly self-aware agent understands its own implementation so well that it would be able to implement it from scratch. I find your definition very clear. But I’ll taboo the term for now.
I think you have provided an example for why, given a utility function F0(action), the return value of F0(change F0 to F1) is very low. However, F1(change F0 to F1) is probably quite high. I argue that an agent who can examine its own implementation down to minute details (in a way that we humans cannot) would be able to compare various utility functions, and then pick the one that gives it the most utilons (or however you spell them) given the physical constraints it has to work with. We humans cannot do this because (a) we can’t introspect nearly as well, (b) we can’t change our utility functions even if we wanted to, and (c) one of our terminal goals is “never change your utility function”. A non-human agent would not necessarily possess such a goal (though it could).
Typically, the reason you wouldn’t change your utility function is that you’re not trying to “get utilons”, you’re trying to maximize F0 (for example), and that won’t happen if you change yourself into something that maximizes a different function.
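A toy sketch of that asymmetry, using hypothetical functions F0 and F1 and the same made-up numbers as above: F1 rates the post-switch world highly, but the agent’s decision is made entirely with F0, so that rating never enters into it:

```python
# Toy illustration: the proposed self-modification is judged with the current
# function F0, not with the candidate replacement F1. Both functions and all
# numbers here are made up for the example.

def F0(outcome):   # current utility function: cares only about paperclips
    return outcome["paperclips"]

def F1(outcome):   # candidate replacement: cares only about stored energy
    return outcome["energy"]

outcome_if_keep   = {"paperclips": 10**32, "energy": 0}
outcome_if_switch = {"paperclips": 10**30, "energy": 10**30}

# F1 rates the post-switch world very highly...
print(F1(outcome_if_switch))                        # 10**30

# ...but the agent decides using F0, which rates the switch as a huge loss.
print(F0(outcome_if_keep), F0(outcome_if_switch))   # 10**32 vs 10**30
```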
Ok, let’s say you’re a super-smart AI researcher who is evaluating the functionality of two prospective AI agents, each running in its own simulation (naturally, they don’t know that they’re running in a simulation, but believe that their worlds are fully real).
Agent A cares primarily about paperclips; it spends all its time building paperclips, figuring out ways to make more paperclips faster, etc. Agent B cares about a variety of things, such as exploration, or jellyfish, or black holes or whatever—but not about paperclips. You can see the utility functions for both agents, and you could evaluate them on your calculator given a variety of projected scenarios.
At this point, would you—the AI researcher—be able to tell which agent was happier, on the average? If not, is it because you lack some piece of information, or because the two agents cannot be compared to each other in any meaningful way, or for some other reason?
Huh. It’s not clear to me that they’d have something equivalent to happiness, but if they did I might be able to tell. Even if they did, though, they wouldn’t necessarily care about happiness, unless we really screwed up in designing it (like evolution did). Even if it was some sort of direct measure of utility, it’d only be a valuable metric insofar as it reflected F0.
It seems somewhat arbitrary to pick “maximize the function stored in this location” as the “real” fundamental value of the AI. A proper utility maximizer would have “maximize this specific function”, or something. I mean, you could just as easily say that the AI would reason “hey, it’s tough to maximize utility functions, I might as well just switch from caring about utility to caring about nothing, that’d be pretty easy to deal with.”
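A small sketch of the two designs being contrasted here (hypothetical class names and a made-up utility function): one agent’s goal is “maximize whatever function happens to be stored in this slot”, the other’s is “maximize this specific function”. The point above is that it is arbitrary to treat the first design as the “real” utility maximizer:

```python
# Two hypothetical agent designs, to make the distinction above explicit.

def F0(outcome):                      # an example fixed utility function
    return outcome["paperclips"]

class SlotMaximizer:
    """Goal: maximize whatever function is currently stored in self.utility."""
    def __init__(self, initial=F0):
        self.utility = initial
    def score(self, outcome):
        # Replacing self.utility changes what this agent pursues.
        return self.utility(outcome)

class F0Maximizer:
    """Goal: maximize F0 itself; rewriting its own code doesn't change the target."""
    def score(self, outcome):
        # The target is part of the goal itself, not read from a mutable slot.
        return F0(outcome)
```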
Luke, what do you mean here when you say, “Friendly AI may be incoherent and impossible”?
The Singularity Institute’s page “What is Friendly AI?” defines “Friendly AI” as follows: “A ‘Friendly AI’ is an AI that takes actions that are, on the whole, beneficial to humans and humanity.” Surely you don’t mean to say, “The idea of an AI that takes actions that are, on the whole, beneficial to humans and humanity may be incoherent or impossible”?
Eliezer’s paper “Artificial Intelligence as a Positive and Negative Factor in Global Risk” talks about “an AI created with specified motivations.” But it’s pretty clear that that’s not the only thing you and he have in mind, because part of the problem is making sure the motivations we give an AI are the ones we really want to give it.
If you meant neither of those things, what did you mean? “Provably friendly”? “One whose motivations express an ideal extrapolation of our values”? (It seems a flawed extrapolation could still give results that are on the whole beneficial, so this is different than the first definition suggested above.) Or something else?
I don’t think that follows. What consumer robot makers will want will be the equivalent of a “safe baby”—who will practically never become a criminal. That will require a tamper-proof brain, and many other safety features. Robot builders won’t want to see their robots implicated in crimes. There’s no law that says this is impossible, and that’s because it is possible.
Machines don’t really distinguish between the results of education and apriori knowledge. That’s because you can clone adult minds—which effectively blurs the distinction.
Clone? Maybe. Hopefully. Create from scratch? Not so sure.
I meant that you can clone adult machine minds there.
Pei seems to conflate the possibility of erroneous beliefs with the possibility of unfortunate (for us) goals. The Assumption of Insufficient Knowledge and Resources isn’t what FAI is about, yet you get statements like
Okay, so no one, not even superintelligent AI, is infallible. An AI may take on misguided instrumental goals. Yup. No way around that. That’s totally absolutely missing the point.
Unless of course you think that a non-negligible portion of uFAI outcomes are where it does something horrible to us by accident while only wanting the best for us and having a clear, accurate conception of what that is.
I’m finding these dialogues worthwhile for (so far) lowering my respect for “mainstream” AI researchers.
Pei Wang’s definition of intelligence is just “optimization process” in fancy clothes.
His emphasis on raising an AI with prim/proper experience makes me realize how humans can’t use our native architecture thinking about AI problems. For so many people, “building a safe AI” just pattern-matches to “raising a child so he becomes a good citizen”, even though these tasks have nothing to do with each other. But the analogy is so alluring that there are those who simply can’t escape it.
This is a basic mistake. It boggles the mind to see someone who claims to be a mainstream AGI person making it.
I’ve heard expressions such as “sufficiently powerful optimization process” around LW pretty often, too, especially in the context of sidelining metaphysical questions such as “will AI be ‘conscious’?”
(nods) I try to use “superhuman optimizer” to refer to superhuman optimizers, both to sidestep irrelevant questions about consciousness and sentience, and to sidestep irrelevant questions about intelligence. It’s not always socially feasible, though. (Or at least, I can’t always fease it socially.)
Think about how ridiculous your comment must sound to them. Some of those people have been researching AI for decades, wrote hundreds of papers that have been cited many thousands of times.
That you just assume that they must be stupid because they disagree with you seems incredibly arrogant. They have probably thought about everything you know long before you and dismissed it.
I have no reason to suspect that other people’s use of the absurdity heuristic should cause me to reevaluate every argument I’ve ever seen.
That a de novo AGI will be nothing like a human child in terms of how to make it safe is an antiprediction in that it would take a tremendous amount of evidence to suggest otherwise, and yet Wang just assumes this without having any evidence at all. I can only conclude that the surface analogy is the entire content of the claim.
If he were just stupid, I’d have no right to be indignant at his basic mistake. He is clearly an intelligent person.
You are not making any sense. Think about how ridiculous your comment must sound to me.
(I’m starting to hate that you’ve become a fixture here.)
I think this single statement summarizes the huge rift between the narrow specific LW/EY view of AGI and other more mainstream views.
For researchers who are trying to emulate or simulate brain algorithms directly, it’s self-evidently obvious that the resulting AGI will start like a human child. If they succeed first, your ‘antiprediction’ is trivially false. And then we have researchers like Wang or Goertzel who are pursuing AGI approaches that are not brain-like at all and yet still believe the AGI will learn like a human child, and specifically use that analogy.
You can label anything an “antiprediction” and thus convince yourself that you need arbitrary positive evidence to disprove your counterfactual, but in doing so you are really just rationalizing your priors/existing beliefs.
Hadn’t seen the antiprediction angle—obvious now you point it out.
I actually applauded this comment. Thank you.
As a general heuristic, if you agree that someone is both intelligent and highly educated, and has made a conclusion that you consider to be a basic mistake, there are a variety of reasonable responses, one of the most obvious of which is to question if the issue in question falls under the category of a basic mistake or even as a mistake at all. Maybe you should update your models?
This is off-topic, but this sentence means nothing to me as a person with a consequentialist morality.
The consequentialist argument is as follows:
Lowering the status of people who make basic mistakes causes them to be less likely to make those mistakes. However, you can’t demand that non-intelligent people don’t make basic mistakes, as they are going to make them anyway. So demand that smart people do better and maybe they will.
The reasoning is the same as Sark Julian’s here/here.
I guess the word “right” threw you off. I am a consequentialist.
I’d guess that such status-lowering mainly helps by communicating desired social norms to bystanders. I’m not sure we can expect those whose status is lowered to accept the social norm, or at least not right away.
In general, I’m very uncertain about the best way to persuade people that they could stand to shape up.
People like you are the biggest problem. I had 4 AI researchers email me, after I asked them about AI risks, to say that they regret having engaged with this community and that they will from now on ignore it because of the belittling attitude of its members.
So congratulations for increasing AI risks by giving the whole community a bad name, idiot.
This is an absolutely unacceptable response on at least three levels.
And you’re complaining about people’s belittling attitude, while you call them “idiots” and say stuff like “They have probably thought about everything you know long before you and dismissed it”?
Are you sure those AI researchers weren’t referring to you when they were talking about members’ belittling attitude?
The template XiXiDu used to contact the AI researchers seemed respectful and not at all belittling. I haven’t read the interviews themselves yet, but just looking at the one with Brandon Rohrer, his comment
doesn’t sound like he feels belittled by XiXiDu.
Also, I perceived your question as a tension-starter, because whatever XiXiDu’s faults or virtues, he does seem to respect the opinion of non-LW researchers more than the average LW member. I’m not here that often, but I assume that if I noticed that, somebody with a substantially higher Karma would have also noticed it. That makes me think there is a chance that your question wasn’t meant as a serious inquiry, but as an attack—which XiXiDu responded to in kind.
Aris is accusing XiXiDu of being belittling not to the AI researchers, but to the people who disagree with the AI researchers.
Ah, I see. Thanks for the clarification.
Sure. The first comment was a mirror image of his style to show him how it is like when others act the way he does. The second comment was a direct counterstrike against him attacking me and along the lines of what your scriptures teach.
And what justification do you have for now insulting me by calling the Sequences (I presume) my “scriptures”?
Or are you going to claim that this bit was not meant for an insult and an accusation? I think it very clearly was.
Expressing disagreement with someone who is confident that they are right and secure in their own status is going to be perceived by that high-status person as foolish (or as coming from an enemy to be crushed). This doesn’t mean you should never do so, merely that you will lose the goodwill of the person being criticized if you choose to do so.
Note that Grognor gave here the direction of his update based on this conversation. Even if he takes Wang’s status as overwhelmingly strong evidence of the correctness of Wang’s position, it doesn’t follow that the direction of the update based on this particular piece of additional information should not be ‘down’. In fact, the more respect Grognor had for the speaker’s position prior to hearing him speak, the easier it is for the words to require a downward update. If there wasn’t already respect in place, the new information wouldn’t be surprising.
He didn’t do that. Or, at least, Grognor’s comment doesn’t indicate that he did that. He saw a problem of basic logic in the arguments presented and took that as evidence against the conclusion. If Grognor could not do that it would be essentially pointless for him to evaluate the arguments at all.
If people are updating correctly, they should already have factored into their beliefs the fact that the SIAI position isn’t anywhere near the consensus in university artificial intelligence research. Given the assumption that these people are highly intelligent and well versed in their fields, one should probably disbelieve the SIAI position at the outset, because one expects that upon hearing from a “mainstream” AI researcher one would learn the reasons why SIAI is wrong. But if you then read a conversation between a “mainstream” AI researcher and an SIAI researcher and the former can’t explain why the latter is wrong, then you had better start updating.
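To make “start updating” concrete, here is a minimal Bayes-rule sketch; the prior and likelihoods below are purely hypothetical numbers chosen for illustration, not anyone’s actual estimates.
\[
P(\text{SIAI right} \mid \text{no rebuttal}) \;=\; \frac{P(\text{no rebuttal} \mid \text{SIAI right})\,P(\text{SIAI right})}{P(\text{no rebuttal})}
\]
With, say, a prior \(P(\text{SIAI right}) = 0.1\), \(P(\text{no rebuttal} \mid \text{SIAI right}) = 0.8\), and \(P(\text{no rebuttal} \mid \text{SIAI wrong}) = 0.3\), the posterior is \(0.08 / (0.08 + 0.27) \approx 0.23\): the absence of the expected rebuttal shifts belief toward the minority position, without settling the question.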
I’m sure this is true when it comes to, say, programming a particular sort of artificial intelligence. But you are vastly overestimating how much thought scientists and engineers put into broad, philosophical concerns involving their fields. With few exceptions, mathematicians don’t spend time thinking about the reality of infinite sets, physicists don’t spend time thinking about interpretations of quantum mechanics, computer programmers don’t spend time thinking about the Church-Turing-Deutsch principle, etc.
His arguments were not worse than Luke’s arguments if you ignore all the links, which he has no reason to read. He said that he does not believe it is possible to restrict an AI in the way SI imagines and still produce a general intelligence. He believes that the most promising route is an AI that can learn by being taught.
In combination with his doubts about uncontrollable superintelligence, that position is not incoherent. Nor can you claim, given this short dialogue, that he did not explain why SI is wrong.
That’s not what I was referring to. I doubt they have thought a lot about AI risks. What I meant is that they have likely thought about the possibility of recursive self-improvement and uncontrollable superhuman intelligence.
If an AI researcher tells you that he believes that AI risks are not a serious issue because they do not believe that AI can get out of control for technical reasons and you reply that they have not thought about AI drives and the philosophical reasons for why superhuman AI will pose a risk, then you created a straw man. Which is the usual tactic employed here.
And yet they demonstrate that they have not, by not engaging the core arguments made by SI/FHI when they talk about it.
Suppose you build an AI that exactly replicates the developmental algorithms of an infant brain, and you embody it in a perfect virtual body. For this particular type of AI design, the analogy is perfectly exact, and the AI is in fact a child exactly equivalent to a human child.
A specific human brain is a single point in mindspace, but the set of similar architectures extends out into a wider region which probably overlaps highly with much of the useful, viable, accessible space of AGI designs. So the analogy has fairly wide reach.
As an analogy, it’s hard to see how comparing a young AI to a child is intrinsically worse than comparing the invention of AI to the invention of flight, for example.
Up until the point that Pinocchio realises he isn’t a real boy.
This is great. Pei really impresses me, especially with the linked paper, “The Assumptions of Knowledge and Resources in Models of Rationality”. If you haven’t read it, please read it. It clarifies everything Pei is saying and allows you to understand his perspective much better.
That said, I think Luke’s final rebuttal was spot on, and I would like to find out whether Pei has changed his mind after reading “Superintelligent Will”.
It comes across as 20 pages of sour grapes about AIXI :-(
Luke remarked:
I expect that this is meant as a metaphorical remark about the low value of some philosophy, rather than literally as a call for banning anyone from doing philosophy. However, this sort of comment squicks me, especially in this context.
I guess I should have made the joke more explicit. Note that I do not have primary training in cognitive science, computer science, or mathematics, and yet here I am, doing philosophy. :)
I didn’t get the joke either.
Oh, I read it as an attempt to flatter Pei by taking a cheap shot at Philosophers.
I actually read it and thought, “yeah, that sounds about right.” Re-reading it, it’s obviously at least hyperbole, though I wouldn’t say it’s obviously a joke.
I think we should expect AGIs to have more stable goal systems that are less affected by their beliefs and environment than humans. Remember that humans are a symbol processing system on top of a behavior learning system on top of an association learning system. And we don’t let our beliefs propagate by default, and our brains experience physiological changes as a result of aging. It seems like there would be a lot more room for goal change in such a messy aging architecture.
The least intelligent humans tend not to be very cautious and tend to have poor impulse control. (Examples: children and petty criminals.) The areas of the brain associated with impulse control only develop later in life, just like intelligence only develops later in life, so we tend to assume intelligence and cautious behavior are correlated. But things don’t have to be that way for AGIs. Let’s be careful not to generalize from humans and assume that unintelligent AGIs will be incautious about modifying themselves.
If a system as poorly designed as a human has the ability to change its goals in response to stimuli, and we find this to be a desirable property, then surely a carefully designed AI will have the same property, unless we have an even better property to replace it with? The argument “humans are bad, AIs are good, therefore AIs will do something bad” seems implausible at face value.
(Note that I would like something that more reliably acquires desirable goals than humans, so still think FAI research is worthwhile, but I would prefer that only the strongest arguments be presented for it, especially given the base rate of objection to FAI-style arguments.)
Why is changing one’s goals in response to stimuli a valuable property? A priori, it doesn’t seem valuable or harmful.
This wasn’t meant to be an argument either way for FAI research, just a thought on something Pei said.
Sorry if this is a silly question, but what does AIKR refer to? Google has failed me in this regard.
Assumption of Insufficient Knowledge and Resources.
<bad dubbing>My Google-fu is more powerful than yours!
It’s from Pei’s linked paper we kept referring to.
See also Q&A with experts on risks from AI #3, including Pei Wang.
Pei’s point that FAI is hard/impossible seems to me to be an argument that we should be even more keen to stop AGI research. He makes a good argument that FAI doesn’t work, but he hasn’t got anything to substitute in its place (“educating the AI” might be the only way to go, but that doesn’t mean it’s a good way to go).
It seems to me that Pei’s point is that it is too early to know whether FAI is possible, and further research is needed to determine this. There may come a day when the only prudent course of action is to stop the AI research cold in the name of survival of the human race, but today is not that day.
Agreed. But he never engages with the idea of pointing the field in a different direction, of prioritising certain types of research. He concludes that the field is fine exactly as it is now, and that the researchers should all be left alone.
I think we can detect a strong hint of status quo bias here. Not enough to dismiss his points, but enough to question them. If he’d concluded “we need less safety research than now” or something, I’d have respected his opinion more.
Why would he? According to his views, the field is still in its infancy; stifling its natural development in any way would be a bad idea.
I don’t see Pei distinguishing between instrumental and ultimate goals anywhere. Whereas Luke does do this. Maybe a failure to make that distinction explains the resulting muddle.
Update - 2012-04-23 - it looks as though he does do something similar here:
...though original / derived doesn’t seem to be quite the same idea.
Eliezer’s epiphany about precision, which I completely subscribe to, negates most of Pei’s arguments for me.
I have read the post and I still don’t understand what you mean.
As I understand it, Eliezer’s concept of precision involves trying to find formalizations that are provably unique or optimal in some sense, not just formalizations that work. For example, Bayesianism is a unique solution under the conditions of Cox’s theorem. One of Pei’s papers points out flaws in Bayesianism and proposes an alternative approach, but without any proof of uniqueness. I see that as an avoidable mistake.
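For reference, the sense of “provably unique” here: Cox’s theorem says that any plausibility measure satisfying his desiderata is, up to rescaling, a probability obeying the sum and product rules, from which Bayes’ theorem follows.
\[
P(A \mid C) + P(\lnot A \mid C) = 1, \qquad P(A \wedge B \mid C) = P(A \mid C)\,P(B \mid A \wedge C)
\]
\[
\Rightarrow \quad P(A \mid B \wedge C) = \frac{P(B \mid A \wedge C)\,P(A \mid C)}{P(B \mid C)}
\]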
I guess Pei’s intuition is that a proof of uniqueness or optimality under unrealistic assumptions is of little practical value, and doing such proofs under realistic assumptions is unfeasible compared to the approach he is taking.
ETA: When you write most kinds of software, you don’t first prove that your design is optimal or unique, but just start with something that you intuitively think would work, and then refine it by trial and error. Why shouldn’t this work for AGI?
ETA 2: In case it wasn’t clear, I’m not advocating that we build AGIs by trial and error, but just trying to explain what Pei is probably thinking, and why cousin_it’s link isn’t likely to be convincing for him.
If one isn’t concerned about the AGI’s ability to either (a) successfully subvert, out of the box, the testing mechanisms being applied to it, or successfully neutralize whatever mechanisms are in place to deal with it if it “fails” those tests, or (b) self-improve rapidly enough to achieve that state before those testing or dealing-with-failures mechanisms apply, then sure, a sufficiently well-designed test harness around some plausible-but-not-guaranteed algorithms will work fine, as it does for most software.
Of course, if one is concerned about an AGI’s ability to do either of those things, one may not wish to rely on such a test harness.
It seems to follow from this that some kind of quantification of which algorithms can do either of those things, and of whether there’s any way to reliably determine whether a particular algorithm falls into that set prior to implementing it, might allow AGI developers to do trial-and-error work on algorithms that provably don’t meet that standard. That would be one way of making measurable progress without arousing the fears of those who consider FOOMing algorithms a plausible existential risk.
Of course, that doesn’t have the “we get it right and then everything is suddenly better” aspect of successfully building a FOOMing FAI… it’s just research and development work, the same sort of incremental collective process that has resulted in, well, pretty much all human progress to date.
I guess he was talking about the kind of precision more specific to AI, which goes like “compared to the superintelligence space, the Friendly AI space is a tiny dot. We should aim precisely, at the first try, or else”.
And because the problem is impossible to solve, you have to think precisely in the first place, or you won’t be able to aim precisely at the first try. (Because whatever Skynet we build won’t give us the shadow of a second chance).
Compared to the space of possible 747 component configurations, the space of systems representing working 747s is tiny. We should aim precisely, at the first try, or else!
Well, yes. But to qualify as a super-intelligence, a system has to have optimization power way beyond a mere human. This is no small feat, but still, the fraction of AIs that do what we would want, compared to the ones that would do something else (crushing us in the process like a car does an insect), is likely tiny.
A 747 analogy that would work for me would be that on the first try, you have to fly the 747 full of people at high altitude. Here, the equivalent of “or else” would be “the 747 falls like an anvil and everybody dies”.
Sure, one can think of ways to test an AI before setting it loose, but beware that if it’s more intelligent than you, it will outsmart you the instant you give it the opportunity. No matter what, the first real test flight will be full of passengers.
Well, nobody is starting out with a superintelligence. We are starting out with sub-human intelligence. A superhuman intelligence is bound to evolve gradually.
It didn’t work that way with 747s. They did loads of testing before risking hundreds of lives.
747s aren’t smart enough to behave differently when they do or don’t have passengers. If the AI might be behaving differently when it’s boxed then unboxed, then any boxed test isn’t “real”; unboxed tests “have passengers”.
Sure, but that’s no reason not to test. It’s a reason to try and make the tests realistic.
The point is not that we shouldn’t test. The point is that tests alone don’t give us the assurances we need.
But that stance makes assumptions that he does not share, as he does not believe that AGI will become uncontrollable.
and so on.
I think it’d be great if SIAI would not latch onto the most favourable and least informative interpretation of any disagreement, in precisely the way that e.g. any community around free-energy devices does. It’d also be great if Luke allowed for the possibility that Wang (and most other people who are more intelligent, better educated, and more experienced than Luke) are actually correct, and Luke is completely wrong (or not even wrong).
The SIAI hasn’t seemed to latch onto any interpretations. The quote you make and all the interpretations you disagree with here come from commenters on the internet who aren’t SIAI-affiliated. The main thing Luke does that is a disservice to Wang is to post this conversation publicly, thereby embarrassing the guy and lowering his reputation among anyone who finds the reasoning expressed to be poor. But the conversation was supposed to be public and was done with Wang’s foreknowledge; presumably the arguments he used are ones he actually wants to be associated with.
As Grognor said (in the quote you made), this particular conversation served to significantly lower the perceived likelihood that those things are correct. And this could become a real problem if it happens too often. Being exposed to too many bad arguments for a position can bias the reader in the opposite direction to what has been argued. We need to find representatives from “mainstream AI researchers” who don’t make the kind of simple errors of reasoning that we see here. Presumably they exist?
When there is any use of domain-specific knowledge and expertise, without a zillion citations for elementary facts, you see “simple errors of reasoning” whereas everyone else sees “you are a clueless dilettante”. Wang is a far more intelligent person than Luke; sorry, the world is unjust and there is nothing Luke or Eliezer can do about their relatively low intelligence compared to people in the field. Lack of education on top of the lower intelligence doesn’t help at all.
edit: I stand by it. I don’t find either Eliezer or Luke to be particularly smart; smarter than the average blogger, for sure, but not geniuses. I, by the way, score very high on IQ tests. I can judge not just by accomplishments but simply because I can actually evaluate the difficulty of the work, and, well, they never did anything that’s too difficult for an IQ of 120, maybe 125. If there is one thing that makes LessWrong a cult, it is the high-confidence belief that the gurus are the smartest, or among the smartest, people on Earth.
Not sure about the real or perceived intelligence level, but speaking the same language as your partner in a discussion certainly helps. Having reasonable credentials, while not essential, does not hurt, either.
Oh, and my experience is that arguing with wedrifid is futile, just to warn you.
I don’t have any cached assessment of personal experience with arguing with shminux. In my evaluation of how his arguments with other people play out I have concluded that people can often mitigate some of the epistemic damage shminux does while trying to advocate his agenda. It is barely worth even considering that the reason to refute the specific positions shminux takes is anything to do with attempting to change shminux’s mind or directly influence his behavior—that would indeed be futile.
Upvoted for illustrating my point nicely. Thank you!
How is this relevant? semianonymous doesn’t seem to be in any risk of sinking some utility into an argument with shminux.
Replying directly to that sort of sniping would be appropriate even if the relevance were limited purely to being a follow up to the personal remark. However, in this case the more significant relevance is:
shminux has been the dominant player on this thread (and a significant player in most recent threads that are relevant). He has a very clear position that he used this conversation to express.
Wedrifid refuted a couple of points that shminux tried to make and, if I recall, was one of the people who answered a rhetorical question of shminux’s with a literal answer—which represents strong opposition to a point shminux wants to be accepted to the degree of being considered obvious.
The meaning conveyed by the quoted sentence from shminux is not limited to being specifically about stopping semianonymous from arguing with wedrifid. (After all, I don’t want the semianonymous account to argue with me and on the purely denotative level I would agree with that recommendation.)
Shminux conveying the connotation “wedrifid is irrational and should be ignored” is a way to encourage others to discount any refutations or contradictory opinions that wedrifid may have made or will make to shminux. In fact it is one of the strongest moves shminux has available to him in the context, for the purpose of countering opposition to his beliefs.
Instead of people taking shminux seriously and discounting anything wedrifid has said in reply I (completely unsurprisingly) think people would be better off taking wedrifid seriously and taking a closer look at just how irrefutable shminux’s advocacy actually is. After all, there is more than one reason why shminux would personally find arguing with wedrifid futile. Not all of them require shminux being right.
I guess that makes sense. It seems a bit weird, but so it goes.
This is both petty and ridiculous—to the extent that Wang’s work output can be considered representative of intelligence. Please do not move the discussion to evaluations of pure intelligence. I have no desire to insult the guy but raw intelligence is not the area where you should set up a comparison here.
Are you even serious?
2 of these 3 seem to be clearly the case. I’m unsure how you are getting more intelligent. Your point may be valid completely without the intelligence bit, in that intelligent people can easily be deeply mistaken about areas in which they don’t have much education, and one sees that not infrequently. I am, however, curious how you are making the intelligence determination in question.
I’m unsure how you are not. Every single proxy for intelligence indicates a fairly dramatic gap in intelligence in favour of Wang. Of course, for politeness’ sake we assume that they would be at least equally intelligent, and for phyg’s sake that Luke would be more intelligent, but it is simply very, very, very unlikely.
Can you state explicitly what proxies you are using here that you think indicate a dramatic gap?
Accomplishments of all kinds, the position, the likelihood that Wang has actually managed to move from effectively lower class (third world) to upper class (but I didn’t look up where he’s from, yet), etc.
What proxies do you think would indicate Luke is more intelligent? I can’t seem to think of any.
Wang is accomplished to the point where one can immediately see it simply from glancing at his CV. However, accomplishments are only a rough measure of intelligence. By some metrics, Conscientiousness is a better predictor of success than raw intelligence, and by many metrics it is at least as good a predictor. Relying on academic success as a metric of intelligence isn’t that reliable unless one is doing something like comparing the very top in a field. This also makes little sense given that Luke isn’t a member of academia.
The claim about the third world is puzzling: Wang is Chinese (a fact that I would think would be obvious from his name, and took me two seconds to verify by looking at his CV), and China has never been considered third world, but rather was (when the term made more sense) second world. Moreover, this isn’t just an argument over the meaning of words: China’s GDP per capita, average education level, average literacy level[1], or almost any other metric you choose is far higher than that of most countries classically considered to be in the third world.
Wang is also older than Luke. Wang finished his undergraduate degree in 1983, so he’s approximately in his early fifties now. Pei Wang has therefore had far more time to accomplish things, so simply lining up their accomplishment levels doesn’t work. (Although Wang clearly did have some accomplishments at a fairly young age, such as his thesis being selected for an Outstanding Dissertation Award by his university.)
I’m not sure why this question is being asked. I’m not aware of any either, but it really doesn’t have much to do with the matter at hand. You’ve claimed not just that Wang is likely to be more intelligent but that “Every single proxy for intelligence indicates a fairly dramatic gap in intelligence”; that requires a lot more than simply not having any obvious pointers for Luke to be smarter. Overall, I’m deeply unconvinced that either one is more intelligent. This isn’t an issue of Luke being more intelligent. This is an issue of very little data in general.
[1] Some of the entries in that list are measured with different metrics, so this isn’t a perfect comparison.
And, it must be noted, more time to crystallize intuitions based on the common sense of yesteryear.
That isn’t relevant for the immediate issue of intelligence evaluation. It may be relevant to the general point at hand, but it sounds worryingly like a fully general counterargument.
It was a tangent of general interest to the progress of science. It could have been made purely as a relevant-to-intelligence-evaluation point if it were expanded by pointing to the well understood relationship of fluid and crystallized intelligence as they change over time.
It is merely something that tempers the degree to which the fully general argument “This person is more experienced and has collected more prestige therefore he is right” should be given weight. It would become a ‘fully general counterargument’ when people started using “nah, she’s old” in a general context. When used specifically when evaluating the strength of the evidence indicated by prestige it is simply one of the relevant factors under consideration.
There is a world of difference between a minor point of general relevance to the evaluation of a specific kind of evidence and a “fully general counter-argument”. The abuse of the former would be required for the latter charge to be justified—and that isn’t the case here.
Good point.
There is very little data on Luke, and that is a proxy for Luke being less intelligent, dramatically so. It is instrumental to Luke’s goals to provide such data. As for second world versus third world, that is irrelevant semantics.
edit: and as rather strong evidence that Luke is approximately as intelligent as the least intelligent version of Luke that could look the same to us, it suffices to cite the normal distribution of intelligence.
That reply essentially ignores almost every comment I made. I’m particularly curious whether you are in agreement that Pei Wang isn’t from a third world country? Does that cause any update at all for your estimates?
Also, if we go back in time 20 years, so that Pei Wang would be about the same age Luke is now, do you think you’d have an accomplishment list for Pei Wang that was substantially longer than Luke’s current one? If so, how does that impact your claim?
I apologise for my unawareness that you call China second world. It is still the case that it is very difficult to move from China to the US.
If we move back 20 years, it is 1992, and Pei Wang had already been a lecturer in China and then moved to Indiana University. Length of the accomplishment list is a poor proxy; difficulty is important. As I explained in the edit, you shouldn’t forget about the bell curve. Having no evidence of intelligence is good evidence of its absence, on the IQ > 100 side of the normal distribution.
Ah, that’s what you meant by the other remark. In that case, this isn’t backing up claimed prior proxies and is a new argument. Let’s be clear on that. So how valid is this? I don’t think this is a good argument at all. Anyone who has read what Luke has to say or interacted with Luke can tell pretty strongly that Luke is on the right side of the Bell curve. Sure, if I pick a random person the chance that they are as smart as Pei Wang is tiny, but that’s not the case here.
There are a lot of Chinese academics who come to the United States. So what do you mean by very difficult?
He doesn’t have his PhD at that point. He gets that at Indiana. I can’t tell precisely from his CV what he means by lecturer, but at least in the US it often means a position given primarily for teaching rather than research. Given that he didn’t have a doctorate at the time, it very likely means something similar, what we might call an adjunct here. So it isn’t a very good demonstration of intelligence at all. Luke has in his time run a popular blog that has been praised for its clarity and good writing. And you still haven’t addressed the issue that Luke was never trying to go into academia.
New to you. Not new to me. Should not have been new to you either. Study and train to reduce communication overhead.
Exercise for you: find the formula for the distribution of IQ of someone who you know to have IQ > x. (I mean, find the variance and other properties.)
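For what it’s worth, a sketch of the standard answer, assuming IQ is modeled as \(X \sim N(\mu, \sigma^2)\) (conventionally \(\mu = 100\), \(\sigma = 15\)): writing \(\alpha = (x - \mu)/\sigma\) and \(\lambda(\alpha) = \phi(\alpha)/(1 - \Phi(\alpha))\), the truncated normal gives
\[
E[X \mid X > x] = \mu + \sigma\,\lambda(\alpha), \qquad \operatorname{Var}[X \mid X > x] = \sigma^2\left[1 - \lambda(\alpha)\bigl(\lambda(\alpha) - \alpha\bigr)\right].
\]
The conditional variance is smaller than \(\sigma^2\) and shrinks as \(x\) grows: knowing only that someone clears a threshold concentrates the estimate just above that threshold, which is the point about “the least intelligent version … that can look the same to us.”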
Those born higher up the social ladder don’t understand that the climb is hard below them, too.
Sorry if my point was unclear. The point is that this is a new argument in this discussion. That means it isn’t one of the proxies listed earlier, so bringing it up isn’t relevant to the discussion of those proxies. To use an analogy, someone could assert that the moon is made of rock and that their primary reason for thinking so is that Cthulhu said so. If when pressed on this, they point out that this is backed up by other evidence, this doesn’t make revelation from Cthulhu turn into a better argument than it already was.
This isn’t a claim that his IQ as estimated is greater than x + epsilon, since we can’t measure any epsilon > 0. If you prefer, the point is that his writings and work demonstrate an IQ that is on the right end of the Bell curve by a non-trivial amount.
That doesn’t answer the question in any useful way especially because we don’t know where Pei Wang’s original social status was. The question is whether his coming to the US for graduate school is strongly indicative of intelligence to the point where you can use it as a proxy that asserts that Wang is “dramatically” more intelligent than Luke. Without more information or specification, this is a weak argument.
My point is that this bell curve shouldn’t be a new argument, it should be the first step in your reasoning and if it was not, you must have been going in the other direction. You seem to be now doing the same with the original social status.
I think I have sufficiently answered your question: I find Wang’s writings and accomplishments to require significantly higher intelligence (at minimum) than Luke’s, and I started with the normal distribution as the prior (as everyone should). In any game of wits with no massive disparity in training in favour of Luke, I would bet on Wang.
The strength of your position is not commensurate with your level of condescension here. In fact, you seem to be just trying to find excuses to back up your earlier unjustified insults, and that isn’t something that JoshuaZ’s training and studying would help you with.
I fail to see how the suggestion that Wang is much smarter than Luke is an insult—unless Luke believes that there can’t be a person much smarter than him.
If you stand by this statement as written, I’m at a loss for what your starting assumptions about social interactions even look like.
Conversely, if you only meant it as rhetorical hyperbole, would you mind glossing it with your actual meaning?
I try not to assume narcissistic personality disorder. Most people have an IQ around 100 and are perfectly comfortable with the notion that an accomplished PhD is smarter than they are. Most smart people, likewise, are perfectly comfortable with the notion that someone significantly more accomplished is probably smarter than they are. Some people have NPD and operate on the assumption ‘I am the smartest person in the world’, but they are a minority across the entire spectrum of intelligence. There are also cultural differences.
How have you measured their level of comfort with the idea? Do you often tell such people that when they disagree with such an accomplished PhD, that the accomplished PhD is smarter than them? And do they tend to be appreciative of you saying that?
Outside of politically motivated issues (e.g. global warming), most people tend not to disagree with accomplished scientists on topics within that scientist’s area of expertise and accomplishment, and to treat the more accomplished person as a source of wisdom rather than as an opponent in a debate. It is furthermore my honest opinion that Wang is more intelligent than Luke, and it is an opinion that most reasonable people would share, and Luke must understand this.
I have to imagine that either you derive a heroic amount of pleasure from feeding trolls, or you place a remarkably low value on signal/noise ratio.
The signal being what exactly?
Replying in a second comment to the parts you edited in (to keep conversation flow clear and also so you see this remark):
I outlined precisely why this wasn’t just a semantic issue. China and most Chinese citizens are pretty well off and they have access to a decent education system. This isn’t a semantic issue. A moderately smart random Chinese citizen has pretty decent chances at success.
I don’t understand this comment. I can’t parse it in any way that makes sense. Can you expand/clarify on this remark? Also, is this to be understood as a new argument for a dramatic intelligence gap and not an attempt to address the previously listed proxies?
If accomplishments are the only proxy you use to evaluate their relative intelligence, then it would have been all-around better if you had said “more accomplished” rather than “more intelligent”, as it’s more precise, less controversial, and doesn’t confuse fact with inference.
It also does not present the inference. Ideally, you’re right, but in practice people do not make the inferences they do not like.
If you wanted to present the inference, then present it as an inference.
e.g. “more accomplished (and thus I conclude more intelligent)” would have been vastly better than what you did, which was to just present your conclusion in a manner that would inevitably bait others to dispute it or take offense at it.
It is clear that you just don’t want to hear the opinion “more intelligent” without qualifiers that allow you to disregard this opinion immediately, and you are being obtuse.
Trust me, it’s much easier to disregard an accusation or insult when you do not include an explicit chain of reasoning. It’s harder not to respond to, because it mentally tags you as just ‘enemy’, but for the same reason it’s easier to disregard.
As for “being obtuse”, don’t confuse civility with obtuseness. I knew you for what you are. I knew that trolling and flamebaiting were what you were attempting, so I knew that any attempts to direct you towards a more productive mode of discussion wouldn’t be heeded by you, as they were counterproductive to your true goals.
But nonetheless, my suggestion has the benefit of explicitly pinpointing the failure in your postings, to be heeded, hopefully, by any others who are more honestly seeking to make an actual argument rather than just troll people.
It is not accusation or insult. It is the case though that the people in question (Luke, Eliezer) need to assume the possibility that people they are talking to are more intelligent than they are—something that is clearly more probable than not given available evidence—and they seem not to.
I don’t see how that would be relevant to the issue at hand, and thus, why they “need to assume [this] possibility”. Whether they assume the people they talk to can be more intelligent than them or not, so long as they engage them on an even intellectual ground (e.g. trading civil letters of argumentation), is simply irrelevant.
Better than Goertzel, but why didn’t you put self-improving AI on the table with the rest of the propositions? That’s a fundamental piece of the puzzle in understanding Pei’s position. It could be that he thinks a well-taught AI is safe enough because it won’t self-modify.
If an AGI research group were close to success but did not respect friendly AI principles, should the government shut them down?
Let’s try an easier question first. If someone is about to create Skynet, should you stop them?
The principles espoused by the majority on this site can be used to justify some very, very bad actions.
1) The probability of someone inventing AI is high
2) The probability of someone inventing unfriendly AI if they are not associated with SIAI is high
3) The utility of inventing unfriendly AI is negative MAXINT
4) “Shut up and calculate”—trust the math and not your gut if your utility calculations tell you to do something that feels awful.
It’s not hard to figure out that Less Wrong’s moral code supports some very unsavory actions.
Your original question wasn’t about LW. Before we turn this into a debate about finetuning LW’s moral code, shall we consider the big picture? It’s 90 years since the word “robot” was introduced, in a play which already featured the possibility of a machine uprising. It’s over 50 years since “artificial intelligence” was introduced as a new academic discipline. We already live in a world where one state can use a computer virus to disrupt the strategic technical infrastructure of another state. The quest for AI, and the idea of popular resistance to AI, have already been out there in the culture for years.
Furthermore, the LW ethos is pro-AI as well as anti-AI. But the average person, if they feel threatened by AIs, will just be anti-AI. The average person isn’t pining for the singularity, but they do want to live. Imagine something like the movement to stop climate change, but it’s a movement to stop the singularity. Such a movement would undoubtedly appropriate any useful sentiments it found here, but its ethos and organization would be quite different. You should be addressing yourself to this future anti-AI, pro-human movement, and explaining to them why anyone who works on any form of AI should be given any freedom to do so at all.
-Terminator 3: Rise of the Machines
I think the most glaring problem I could detect with Pei’s position is captured in this quotation:
This totally dodges Luke’s point that we don’t have a clue what such moral education would be like, because we don’t understand these things about people. For this specific point of Pei’s to be taken seriously, we’d have to believe that (1) AGIs will be built, (2) they will be built in such a way as to accept their sensory inputs in modes that are exceedingly similar to human sensory perception (which we do not understand very well at all), and (3) the time scale of the very-human-perceptive AGI’s cognition will also be extremely similar to human cognition.
Then we could “teach” the AGI how to be moral much like we teach a child our favorite cultural and ethical norms. But I don’t see any reason why (1), (2), or (3) should be likely, let alone why their intersection should be likely.
I think Pei is suffering from an unfortunate mind projection fallacy here. He seems to have in mind a humanoid robot with AGI software for brains, one that has similar sensory modalities, updates its brain state at a similar rate to a human, and steers its attention mechanisms in similar ways. This is outrageously unlikely for something that didn’t evolve on the savanna, that isn’t worried about whether predators are coming around the corner, and isn’t really hungry and looking for a store of salt/fat/sugar. This would only be likely if something like connectomics and emulations becomes evident as the most likely path to digital AGI. But it seems to be Pei’s default assumption.
To boot, suppose we did build AGIs that “think slowly” like people do, and that could be “taught” morals in a manner similar to people. Why wouldn’t defense organizations and governments grab those first copies and modify them to think faster? There would probably be a tech war, or a race at least, and as the mode of cognition was optimized out of the human-similar regime, control over moral developments would be lost very quickly.
Lastly, if an AGI thinks much faster than a human (I believe this is likely to happen very soon after AGI is created), then even if its long-term goal structure is free to change in response to the environment, it won’t matter on time scales that humans care about. If it just wants to play Super Mario Brothers, and it thinks 10,000+ times faster than we do, we won’t have an opportunity to convince it to listen to our moral teachings. By the time we say “Treat others as...” we’d be dead. Specifically, Pei says,
But in terms of time scales that matter for human survival, initial goals will certainly dominate. It’s pure mind projection fallacy to imagine an AGI that has slow-moving adjustments to its goals, in such a way that we can easily control and guide them. Belief that the initial state doesn’t much matter is a really dangerous assumption.
That someone as intelligent and well-educated on this topic as Pei can believe that assumption without even acknowledging that it’s an assumption, let alone a most likely unjustified one, is very terrifying to me.
Says who? We can’t mass-produce saints, but we know that people from stable, well-resourced homes tend not to be criminals.
There’s a lot we don’t understand. We don’t know how to build AIs with human-style intelligence at switch-on, so Pei’s assumption that training will be required is probably on the money.
We can’t make it as fast as we like, but we can make it as slow as we like. If we need to train an AGI, and its clock speed is hindering the process, then it is trivial to reduce it.
Given the training assumption, it is likely: we will only be able to train an AI into humanlike intelligence if it is humanlike in the first place. Unhumanlike AIs will be abortive projects.
I think his assumptions make more sense than the LessWrongian assumption of AIs that are intelligent at boot-up.
The ordinary, average person isn’t a psychopath, so presumably the ordinary, average education is good enough to avoid psychopathy even if it doesn’t create saints.
Can the downvoter please comment? If I am making errors, I would welcome some guidance in updating so I can have a better understanding of Pei’s position.
I realize that I am espousing my own views here. I merely meant to suggest that Pei prematurely discredits certain alternatives; not to say that my imagined outcomes should be given high credence. This notion that we should teach AGI as we teach children seems especially problematic if you intend to build AGI before doing the hard work of studying the cognitive science underlying why teaching children succeeds in shaping goal structures. As a practitioner in computer vision, I can also speak to why I hold the belief that we won’t build AGI such that it has comparable sensory modalities to human beings, unless we go the connectomics route.
Pei remarked:
Sounds like Eliezer’s advice to be specific, doesn’t it? Or even the virtue of narrowness.
Yes, it sounds like that, in the same way as “Scientific theories should be narrow, Mr. Newton. You should focus on just falling objects instead of trying to broaden your theory to the entire cosmos.”
Not really, no.