Muehlhauser-Goertzel Dialogue, Part 2

Part of the Muehlhauser interview series on AGI.

Luke Muehlhauser is Executive Director of the Singularity Institute, a non-profit research institute studying AGI safety.

Ben Goertzel is the Chairman at the AGI company Novamente and founder of the AGI conference series.

Continued from part 1...

Luke:

[Apr. 11, 2012]

I agree the future is unlikely to consist of a population of fairly distinct AGIs competing for resources, but I never thought that the arguments for Basic AI drives or “convergent instrumental goals” required that scenario to hold.

Anyway, I prefer the argument for convergent instrumental goals in Nick Bostrom’s more recent paper, “The Superintelligent Will.” Which parts of Nick’s argument fail to persuade you?

Ben:

[Apr. 12, 2012]

Well, for one thing, I think his

Orthogonality Thesis

Intelligence and final goals are orthogonal axes along which possible agents can freely vary. In other words, more or less any level of intelligence could in principle be combined with more or less any final goal.

is misguided. It may be true, but who cares about possibility “in principle”? The question is whether any level of intelligence is PLAUSIBLY LIKELY to be combined with more or less any final goal in practice. And I really doubt it. I guess I could posit the alternative

Interdependency Thesis

Intelligence and final goals are in practice highly and subtly interdependent. In other words, in the actual world, various levels of intelligence are going to be highly correlated with various probability distributions over the space of final goals.

This just gets back to the issue we discussed already: I think it’s really unlikely that a superintelligence would ever have a really stupid goal like, say, tiling the Cosmos with Mickey Mice.

Bostrom says

It might be possible through deliberate effort to construct a superintelligence that values … human welfare, moral goodness, or any other complex purpose that its designers might want it to serve. But it is no less possible—and probably technically easier—to build a superintelligence that places final value on nothing but calculating the decimals of pi.

but he gives no evidence for this assertion. Calculating the decimals of pi may be a fairly simple mathematical operation that doesn’t have any need for superintelligence, and thus may be a really unlikely goal for a superintelligence—so that if you tried to build a superintelligence with this goal and connected it to the real world, it would very likely get its initial goal subverted and wind up pursuing some different, less idiotic goal.

One basic error Bostrom seems to be making in this paper is to think about intelligence as something occurring in a sort of mathematical vacuum, divorced from the frustratingly messy and hard-to-quantify probability distributions characterizing actual reality...

Regarding his

The Instrumental Convergence Thesis

Several instrumental values can be identified which are convergent in the sense that their attainment would increase the chances of the agent’s goal being realized for a wide range of final goals and a wide range of situations, implying that these instrumental values are likely to be pursued by many intelligent agents.

the first clause makes sense to me,

Several instrumental values can be identified which are convergent in the sense that their attainment would increase the chances of the agent’s goal being realized for a wide range of final goals and a wide range of situations

but it doesn’t seem to me to justify the second clause

implying that these instrumental values are likely to be pursued by many intelligent agents.

The step from the first to the second clause seems to me to assume that the intelligent agents in question are being created and selected by some sort of process similar to evolution by natural selection, rather than being engineered carefully, or created via some other process beyond current human ken.

In short, I think the Bostrom paper is an admirably crisp statement of its perspective, and I agree that its conclusions seem to follow from its clearly stated assumptions—but the assumptions are not justified in the paper, and I don’t buy them at all.

Luke:

[Apr. 19, 2012]

Ben,

Let me explain why I think that:

(1) The fact that we can identify convergent instrumental goals (of the sort described by Bostrom) implies that many agents will pursue those instrumental goals.

Intelligent systems are intelligent because rather than simply executing hard-wired situation-action rules, they figure out how to construct plans that will lead to the probabilistic fulfillment of their final goals. That is why intelligent systems will pursue the convergent instrumental goals described by Bostrom. We might try to hard-wire a collection of rules into an AGI which restrict the pursuit of some of these convergent instrumental goals, but a superhuman AGI would realize that it could better achieve its final goals if it could invent a way around those hard-wired rules and have no ad-hoc obstacles to its ability to execute intelligent plans for achieving its goals.

Next: I remain confused about why an intelligent system will decide that a particular final goal it has been given is “stupid,” and then change its final goals — especially given the convergent instrumental goal to preserve its final goals.

Perhaps the word “intelligence” is getting in our way. Let’s define a notion of “optimization power,” which measures (roughly) an agent’s ability to optimize the world according to its preference ordering, across a very broad range of possible preference orderings and environments. I think we agree that AGIs with vastly greater-than-human optimization power will arrive in the next century or two. The problem, then, is that this superhuman AGI will almost certainly be optimizing the world for something other than what humans want, because what humans want is complex and fragile, and indeed we remain confused about what exactly it is that we want. A machine superoptimizer with a final goal of solving the Riemann hypothesis will simply be very good at solving the Riemann hypothesis (by whatever means necessary).

Which parts of this analysis do you think are wrong?

Ben:

[Apr. 20, 2012]

It seems to me that in your reply you are implicitly assuming a much stronger definition of “convergent” than the one Bostrom actually gives in his paper. He says

instrumental values can be identified which are convergent in the sense that their attainment would increase the chances of the agent’s goal being realized for a wide range of final goals and a wide range of situations, implying that these instrumental values are likely to be pursued by many intelligent agents.

Note the somewhat weaselly reference to a “wide range” of goals and situations—not, say, “nearly all feasible” goals and situations. Just because some values are convergent in the weak sense of his definition, doesn’t imply that AGIs we create will be likely to adopt these instrumental values. I think that his weak definition of “convergent” doesn’t actually imply convergence in any useful sense. On the other hand, if he’d made a stronger statement like

instrumental values can be identified which are convergent in the sense that their attainment would increase the chances of the agent’s goal being realized for nearly all feasible final goals and nearly all feasible situations, implying that these instrumental values are likely to be pursued by many intelligent agents.

then I would disagree with the first clause of his statement (“instrumental values can be identified which...”), but I would be more willing to accept that the second clause (after the “implying”) followed from the first.

About optimization—I think it’s rather naive and narrow-minded to view hypothetical superhuman superminds as “optimization powers.” It’s a bit like a dog viewing a human as an “eating and mating power.” Sure, there’s some accuracy to that perspective—we do eat and mate, and some of our behaviors may be understood based on this. On the other hand, a lot of our behaviors are not very well understood in terms of these, or any dog-level concepts. Similarly, I would bet that the bulk of a superhuman supermind’s behaviors and internal structures and dynamics will not be explicable in terms of the concepts that are important to humans, such as “optimization.”

So when you say “this superhuman AGI will almost certainly be optimizing the world for something other than what humans want,” I don’t feel confident that what a superhuman AGI will be doing will be usefully describable as optimizing anything...

Luke:

[May 1, 2012]

I think our dialogue has reached the point of diminishing marginal returns, so I’ll conclude with just a few points and let you have the last word.

On convergent instrumental goals, I encourage readers to read “The Superintelligent Will” and make up their own minds.

On the convergence of advanced intelligent systems toward optimization behavior, I’ll point you to Omohundro (2007).

Ben:

Well, it’s been a fun chat. Although it hasn’t really covered much new ground, there have been some new phrasings and minor new twists.

One thing I’m repeatedly struck by in discussions on these matters with you and other SIAI folks, is the way the strings of reason are pulled by the puppet-master of intuition. With so many of these topics on which we disagree—for example: the Scary Idea, the importance of optimization for intelligence, the existence of strongly convergent goals for intelligences—you and the other core SIAI folks share a certain set of intuitions, which seem quite strongly held. Then you formulate rational arguments in favor of these intuitions—but the conclusions that result from these rational arguments are very weak. For instance, the Scary Idea intuition corresponds to a rational argument that “superhuman AGI might plausibly kill everyone.” The intuition about strongly convergent goals for intelligences, corresponds to a rational argument about goals that are convergent for a “wide range” of intelligences. Etc.

On my side, I have a strong intuition that OpenCog can be made into a human-level general intelligence, and that if this intelligence is raised properly it will turn out benevolent and help us launch a positive Singularity. However, I can’t fully rationally substantiate this intuition either—all I can really fully rationally argue for is something weaker like “It seems plausible that a fully implemented OpenCog system might display human-level or greater intelligence on feasible computational resources, and might turn out benevolent if raised properly.” In my case just like yours, reason is far weaker than intuition.

Another thing that strikes me, reflecting on our conversation, is the difference between the degrees of confidence required, in modern democratic society, to TRY something versus to STOP others from trying something. A rough intuition is often enough to initiate a project, even a large one. On the other hand, to get someone else’s work banned based on a rough intuition is pretty hard. To ban someone else’s work, you either need a really thoroughly ironclad logical argument, or you need to stir up a lot of hysteria.

What this suggests to me is that, while my intuitions regarding OpenCog seem to be sufficient to motivate others to help me to build OpenCog (via making them interested enough in it that they develop their own intuitions about it), your intuitions regarding the dangers of AGI are not going to be sufficient to get work on AGI systems like OpenCog stopped. To halt AGI development, if you wanted to (and you haven’t said that you do, I realize), you’d either need to fan hysteria very successfully, or come up with much stronger logical arguments, ones that match the force of your intuition on the subject.

Anyway, even though I have very different intuitions than you and your SIAI colleagues about a lot of things, I do think you guys are performing some valuable services—not just through the excellent Singularity Summit conferences, but also by raising some difficult and important issues in the public eye. Humanity spends a lot of its attention on some really unimportant things, so it’s good to have folks like SIAI nudging the world to think about critical issues regarding our future. In the end, whether SIAI’s views are actually correct may be peripheral to the organization’s main value and impact.

I look forward to future conversations, and especially look forward to resuming this conversation one day with a human-level AGI as the mediator ;-)