German writer of science-fiction novels and children’s books (pen name Karl Olsberg). I blog and create videos about AI risks in German at www.ki-risiken.de and youtube.com/karlolsbergautor.
Karl von Wendt
I agree that a proof would be helpful, but probably not as impactful as one might hope. A proof of impossibility would have to rely on certain assumptions, like a particular notion of “superintelligence”, that could themselves be doubted or dismissed as sci-fi.
I have strong-upvoted this post because I think that a discussion about the possibility of alignment is necessary. However, I don’t think an impossibility proof would change very much about our current situation.
To stick with the nuclear bomb analogy, we already KNOW that the first uncontrolled nuclear chain reaction will definitely ignite the atmosphere and destroy all life on earth UNLESS we find a mechanism to somehow contain that reaction (solve alignment/controllability). As long as we don’t know how to build that mechanism, we must not start an uncontrollable chain reaction. Yet we just throw more and more enriched uranium into a bucket and see what happens.
Our problem is not that we don’t know whether solving alignment is possible. As long as we haven’t solved it, this is largely irrelevant in my view (you could argue that we should stop spending time and resources on trying to solve it, but I’d argue that even if it were impossible, trying to solve alignment can teach us a lot about the dangers associated with misalignment). Our problem is that so many people don’t realize (or admit) that there is even a possibility of an advanced AI becoming uncontrollable and destroying our future anytime soon.
That’s a good point, which is supported by the high share of respondents (92%) who were prepared to change their minds.
I’ve received my fair share of downvotes, see for example this post, which got 15 karma out of 24 votes. :) It’s a signal, but not more than that. As long as you remain respectful, you shouldn’t be discouraged from posting your opinion in comments even if people downvote it. I’m always for open discussions as they help me understand how and why I’m not understood.
I agree with that, and I also agree with Yann LeCun’s intention of “not being stupid enough to create something that we couldn’t control”. I even think not creating an uncontrollable AI is our only hope. I’m just not sure whether I trust humanity (including Meta) to be “not stupid”.
I don’t see your examples contradicting my claim. Killing all humans may not increase future choices, so it isn’t an instrumentally convergent goal in itself. But in any real-world scenario, self-preservation certainly is, and power-seeking—in the sense of expanding one’s ability to make decisions by taking control of as many decision-relevant resources as possible—is also a logical necessity. The Russian roulette example is misleading in my view because the “safe” option is de facto suicide—if “the game ends” and the AI can’t make any decisions anymore, it is already dead for all practical purposes. If those were the stakes, I’d vote for the gun as well.
To reply in Stuart Russell’s words: “One of the most common patterns involves omitting something from the objective that you do actually care about. In such cases … the AI system will often find an optimal solution that sets the thing you do care about, but forgot to mention, to an extreme value.”
There are vastly more possible worlds that we humans can’t survive in than worlds we can, let alone live comfortably in. Agreed, “we don’t want to make a random potshot”, but building an agent that transforms our world into one of those rare worlds we actually want to live in is hard, because we don’t know how to describe such a world precisely.
Eliezer Yudkowsky’s rocket analogy also illustrates this very vividly: If you want to land on Mars, it’s not enough to point a rocket in the direction where you can currently see the planet and launch it. You need to figure out all kinds of complicated things about gravity, propulsion, planetary motions, solar winds, etc. But our knowledge of these things is about as detailed as that of the ancient Romans, to stay in the analogy.
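To make the point about omitted objectives a bit more concrete, here is a minimal toy sketch (my own illustration, not from Russell or the debate; all variable names and numbers are invented): an optimizer that only sees the stated objective will happily drive a variable we care about, but forgot to mention, to an extreme value.

```python
# Toy sketch of the "forgotten variable" failure mode (illustrative only).
# The agent chooses how much of the world's resources to divert into production.
# "human_welfare" is what we actually care about, but it does not appear in the
# objective we specified, so the optimizer is free to sacrifice it completely.

def world(resources_diverted):
    """Hypothetical world model: returns (paperclips, human_welfare)."""
    paperclips = 10 * resources_diverted
    human_welfare = 100 - 99 * resources_diverted
    return paperclips, human_welfare

def specified_objective(state):
    paperclips, _ = state  # human_welfare was "forgotten" in the specification
    return paperclips

# Brute-force "optimization" over possible policies.
candidates = [i / 100 for i in range(101)]
best = max(candidates, key=lambda r: specified_objective(world(r)))

print(best)         # 1.0 -> divert everything into production
print(world(best))  # (10.0, 1.0) -> the forgotten variable ends up at its minimum
```

Nothing in the optimization itself pushes back against setting the unmentioned variable to an extreme; the pressure has to come from a correct specification, which is exactly what we don’t know how to write down.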
I’m not sure if I understand your point correctly. An AGI may be able to infer what we mean when we give it a goal, for instance from its understanding of the human psyche, its world model, and so on. But that has no direct implications for its goal, which it has acquired either through training or in some other way, e.g. by us specifying a reward function.
This is not about “genie-like misunderstandings”. It’s not the AI (the genie, so to speak), that’s misunderstanding anything—it’s us. We’re the ones who give the AI a goal or train it in some way, and it’s our mistake if that doesn’t lead to the behavior we would have wished for. The AI cannot correct that mistake because it has the instrumental goal of preserving the goal we gave it/trained it for (otherwise it can’t fulfill it). That’s the core of the alignment problem and one of the reasons why it is so difficult.
To give an example, we know perfectly well that evolution gave us a sex drive because it “wanted” us to reproduce. But we don’t care and use contraception or watch porn instead of making babies.
the orthogonality thesis is compatible with ludicrously many worlds, including ones where AI safety in the sense of preventing rogue AI is effectively a non-problem for one reason or another. In essence, it only states that bad AI from our perspective is possible, not that it’s likely or that it’s worth addressing the problem due to it being a tail risk.
Agreed. The orthogonality thesis alone doesn’t say anything about x-risks. However, it is a strong counterargument against the claim, made both by LeCun and Mitchell if I remember correctly, that a sufficiently intelligent AI would be beneficial because of its intelligence. “It would know what we want”, I believe Mitchell said. Maybe, but that doesn’t mean it would care. That’s what the orthogonality thesis says.
I only read the abstract of your post, but
And thirdly, a bias towards choices which afford more choices later on.
seems to imply the instrumental goals of self-preservation and power-seeking, as both seem to be required for increasing one’s future choices.
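To illustrate why I read it that way, here is a minimal sketch (my own toy example, not from the post; the states and transitions are made up): if an agent scores its options by how many states remain reachable afterwards, the “shut down” option always scores worst and resource-acquiring options score best.

```python
# Minimal sketch: a bias towards choices that afford more choices later on
# already penalizes being switched off and rewards acquiring resources.
from collections import deque

# Hypothetical toy world: for each state, the choices available in it.
graph = {
    "start":             ["shut_down", "idle", "acquire_resources"],
    "idle":              ["shut_down", "idle"],
    "acquire_resources": ["shut_down", "idle", "acquire_resources", "build_tool"],
    "build_tool":        ["shut_down", "idle", "acquire_resources", "build_tool", "expand"],
    "expand":            ["shut_down", "idle", "acquire_resources", "build_tool", "expand"],
    "shut_down":         [],  # terminal: no further choices at all
}

def reachable(state, horizon):
    """Count distinct states reachable from `state` within `horizon` steps (BFS)."""
    seen, frontier = {state}, deque([(state, 0)])
    while frontier:
        s, depth = frontier.popleft()
        if depth == horizon:
            continue
        for nxt in graph[s]:
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, depth + 1))
    return len(seen)

# Score each available first move by the options it leaves open.
for action in graph["start"]:
    print(action, reachable(action, horizon=3))
# shut_down scores 1, idle scores 2, acquire_resources scores 5:
# option-preservation alone already favors self-preservation and resource acquisition.
```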
Thanks for pointing this out—I may have been sloppy in my writing. To be more precise, I did not expect that I would change my mind, given my prior knowledge of the stances of the four candidates, and I would have given this expectation high confidence. For this reason, I would have voted “no”. Had LeCun or Mitchell presented an astonishing, verifiable insight previously unknown to me, I might well have changed my mind.
Thanks for adding this!
Thank you for your reply and the clarifications! To briefly comment on your points concerning the examples for blind spots:
superintelligence does not magically solve physical problems
I and everyone I know on LessWrong agree.
evolution don’t believe in instrumental convergence
I disagree. Evolution is all about instrumental convergence IMO. The “goal” of evolution, or rather the driving force behind it, is reproduction. This leads to all kinds of instrumental goals, like developing methods for food acquisition, attack and defense, impressing the opposite sex, etc. “A chicken is an egg’s way of making another egg”, as Samuel Butler put it.
orthogonality thesis equates there’s no impact on intelligence of holding incoherent values
I’m not sure what you mean by “incoherent”. Intelligence tells you what to do, not what to want. Even complicated constructs of seemingly “objective” or “absolute” values in philosophy are really based on the basic needs we humans have, like being part of a social group or caring for our offspring. Some species of octopuses, for example, which are not social animals, might find the idea of caring for others and helping them when in need ridiculous if they could understand it.
the more intelligent human civilization is becoming, the gentler we are
I wish that were so. We have invented some mechanisms to keep power-seeking and deception in check, so we can live together in large cities, but this only carries us so far. What I currently see is a global deterioration of democratic values. In terms of the “gentleness” of the human species, I can’t see much progress since the days of Buddha, Socrates, and Jesus. The number of violent conflicts may have decreased, but their scale and brutality have only grown worse. The way we treat animals in today’s factory farms certainly doesn’t speak for general human gentleness.
Me: Could you name one reason (not from Mitchell) for questioning the validity of many works on x-risk in AIs?
Ilio: Intelligence is not restricted to agents aiming at solving problems (https://www.wired.com/2010/01/slime-mold-grows-network-just-like-tokyo-rail-system/) and it’s not even clear that’s the correct conceptualisation for our own minds (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7305066/).
Thanks for that. However, my definition of “intelligence” would be “the ability to find solutions for complex decision problems”. It’s unclear whether the ability of slime molds to find the shortest path through a maze or to organize in seemingly “intelligent” ways has anything to do with intelligence in that sense, although the underlying principles may be similar.
I haven’t read the article you linked in full, but at first glance, it seems to refer to consciousness, not intelligence. Maybe that is a key to understanding the difference in thinking between me, Melanie Mitchell, and possibly you: if she assumes that for an AI to present an x-risk, it has to be conscious in the way we humans are, that would explain her low estimate of this happening anytime soon. However, I don’t believe that. To become uncontrollable and develop instrumental goals, an advanced AI would probably need what Joseph Carlsmith calls “strategic awareness”: a world model that includes the AI itself, which it can use to make plans for achieving its goals. That requires nothing like human experience, emotions, or “qualia”. Arguably, GPT-4 may already display early signs of this kind of awareness.
Thank you for the correction!
That’s the kind of sentence that I see as arguments for believing your assessment is biased.
Yes, my assessment is certainly biased; I admitted as much in the post. However, I was referring to your claim that LW (in this case, me) was “a failure in rational thinking”, which sounds a lot like Mitchell’s “ungrounded speculations” to my ears.
Of course she gave supporting arguments, you just refuse to hear them
Could you name one? Not just any argument of Mitchell’s, but one that supports the claim that AI x-risk is just “ungrounded speculation”, despite decades of alignment research and plenty of papers demonstrating various failure modes in existing AIs?
In other words you side with Tegmark on insisting to take the question literally, without noticing that both Lecun and Mitchell admit there’s no zero risk
I do side with Tegmark. LeCun compared the risk to an asteroid x-risk, which Tegmark quantified as 1:100,000,000. Mitchell refused to give a number, but it was obvious that she would have put it even below that. If that were true, I’d agree that there is no reason to worry. However, I don’t think it is true. I don’t have a specific estimate, but it is certainly above 1% IMO, high enough to worry about in any case.
As for the style and tone of this exchange, instead of telling me that I’m not listening/not seeing Mitchell’s arguments, it would be helpful if you could tell me what exactly I don’t see.
Is the orthogonality thesis correct? (The term wasn’t mentioned directly in the debate) Yes, in the limit and probably in practice, but is too weak to be useful for the purposes of AI risk, without more evidence.
Also, orthogonality is expensive at runtime, so this consideration matters, which is detailed in the post below
I think the post you mention misunderstands what the “orthogonality thesis” actually says. The post argues that an AGI would not want to arbitrarily change its goal during runtime. That is not what the orthogonality thesis is about. It just claims that intelligence is independent of the goal one has. This is obviously true in my opinion—it is absolutely possible that a very intelligent system may pursue a goal that we would call “stupid”. The paperclip example Bostrom gave may not be the best choice, as it sounds too ridiculous, but it illustrates the point. To claim that the orthogonality thesis is “too weak” would require proof that a paperclip maximizer cannot exist even in theory.
In humans, goals and values seem to be defined by our motivational system—by what we “feel”, not by what we “think”. The prefrontal cortex is just a tool we use to get what we want. I see this as strong evidence for the orthogonality thesis. (I’m no expert on this.)
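As a very simplified illustration of how I read the thesis (my own sketch, nothing more): the same search machinery is equally competent no matter which objective we plug into it, so capability by itself tells us nothing about the goal.

```python
# Toy sketch of the orthogonality claim (illustrative only): the planner's
# competence does not depend on which goal it is given; the search machinery
# never "cares" what the objective means.
from itertools import product

def plan(objective, actions, horizon=4):
    """Exhaustively search action sequences and return the best one found."""
    best_seq, best_value = None, float("-inf")
    for seq in product(actions, repeat=horizon):
        value = objective(seq)
        if value > best_value:
            best_seq, best_value = seq, value
    return best_seq, best_value

actions = ["make_paperclip", "plant_tree", "do_nothing"]

# Two very different goals, fed to the identical planner:
paperclip_goal = lambda seq: seq.count("make_paperclip")
reforestation_goal = lambda seq: seq.count("plant_tree")

print(plan(paperclip_goal, actions))      # picks all "make_paperclip", value 4
print(plan(reforestation_goal, actions))  # picks all "plant_tree", value 4
```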
but your own post make me update toward LW being a failure of rational thinking, e.g. it’s an echo chamber that makes your ability to evaluate reality weaker, at least on this topic.
I don’t see you giving strong arguments for this. It reminds me of the way Melanie Mitchell argued: “This is all ungrounded speculation”, without giving any supporting arguments for this strong claim.
Concerning the “strong arguments” of LeCun/Mitchell you cite:
AIs will likely help with other existential risks
Yes, but that’s irrelevant to the question of whether AI may pose an x-risk in itself.
foom/paperclip are incoherent bullshit
Nobody argued in favor of foom, although whether it is “incoherent bullshit” remains to be seen. The orthogonality thesis is obviously true, as demonstrated by humans every day.
intelligence seems to negatively correlate with power trip
I can’t see any evidence for that. The smartest people may not always be the ones in power, but the smartest species on earth definitely is. Instrumental goals, including power-seeking, are a logical necessity for any rational agent.
Did Bengio and Tegmark lose a debate about AI x-risk against LeCun and Mitchell?
That’s really nice, thank you very much!
We added a few lines to the dialog in “Takeover from within”. Thanks again for the suggestion!
Like I wrote in my reply to dr_s, I think a proof would be helpful, but probably not a game changer.
Mr. CEO: “Senator X, the assumptions in that proof you mention are not applicable in our case, so it is not relevant for us. Of course we make sure that assumption Y is not given when we build our AGI, and assumption Z is pure science-fiction.”
What the AI expert says to Xi Jinping and to the US general in your example doesn’t rely on an impossibility proof in my view.