I read Eliezer’s response as basically “Yes, in the following sense: I would certainly have learned very new and very exciting facts about intelligence...”
I prefer Eliezer’s response over just saying “yes”, because there’s ambiguity in what it means to be a “crux” here, and because “agentic” in Richard’s question is an unclear term.
I wish EY could stop saying “pivotal act” long enough to talk about why he thinks intelligence implies an urge for IRL agenticness.
I don’t know what you mean by “intelligence” or “an urge for IRL agenticness” here, but I think the basic argument for ‘sufficiently smart and general AI will behave as though it is consistently pursuing goals in the physical world’ is that sufficiently smart and general AI will (i) model the physical world, (ii) model chains of possible outcomes in the physical world, and (iii) be able to search for policies that make complex outcomes much more or less likely. If that’s not sufficient for “IRL agenticness”, then I’m not sure what would be sufficient or why it matters (for thinking about the core things that make AGI dangerous, or make it useful).
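To make (i)-(iii) concrete, here is a minimal toy sketch in Python, my own illustration rather than anything from the dialogue: a hand-written model of a tiny world, rollouts that chain predicted outcomes, and a brute-force search for a policy that brings about a chosen outcome.

```python
# Toy illustration (not from the dialogue) of (i)-(iii):
# (i) a hand-written model of a tiny world, (ii) rollouts that chain predicted
# outcomes, and (iii) a brute-force search for a policy that reaches a chosen outcome.
import itertools

def world_model(state, action):
    """(i) Predict the next state: positions 0..4 on a line, actions shift them."""
    moves = {"left": -1, "right": +1, "stay": 0}
    return max(0, min(4, state + moves[action]))

def rollout(state, policy):
    """(ii) Chain predicted outcomes for a fixed sequence of actions."""
    for action in policy:
        state = world_model(state, action)
    return state

def best_policy(start, goal, horizon=3):
    """(iii) Search over all length-`horizon` action sequences for one that reaches the goal."""
    candidates = itertools.product(["left", "right", "stay"], repeat=horizon)
    return max(candidates, key=lambda policy: rollout(start, policy) == goal)

print(best_policy(start=0, goal=3))  # ('right', 'right', 'right')
```

Nothing here is dangerous or even interesting on its own; the claim in question is about what this same (i)-(iii) package looks like once the world model and the search over policies are powerful and general enough.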
Talking about pivotal acts then clarifies what threshold of “sufficiently smart” actually matters for practical purposes. If there’s some threshold where AI becomes smart and general enough to be “in-real-life-agentic”, but this threshold is well above the level needed for pivotal acts, then we mostly don’t have to worry about “in-real-life agenticness”.
Or at least, define the term “pivotal act” and explain why he says it so much.
Here’s an explanation: https://arbital.com/p/pivotal/
Once again Yudkowsky could have agreed or disagreed or corrected, but confusingly chooses “none of the above”:
What do you find confusing about it? Eliezer is saying that he’s not making a claim about what’s possible in principle, just about what’s likely to be reached by the first AGI developers. He then answers the question here (again, it seems fine to me to supply a “Yes, in the following sense:”):
I think that obvious-to-me future outgrowths of modern ML paradigms are extremely liable to, if they can learn how to do sufficiently superhuman X, generalize to taking over the world. How fast this happens does depend on X. It would plausibly happen relatively slower (at higher levels) with theorem-proving as the X, and with architectures that carefully stuck to gradient-descent-memorization over shallow network architectures to do a pattern-recognition part with search factored out (sort of, this is not generally safe, this is not a general formula for safe things!); rather than imposing anything like the genetic bottleneck you validly pointed out as a reason why humans generalize. Profitable X, and all X I can think of that would actually save the world, seem much more problematic.
Expressing a thought in your own words can often be clearer than just saying “Yes” or “No”; e.g., it will make it more obvious whether you misunderstood the intended question.
I read Eliezer’s response as basically “Yes, in the following sense”... I prefer Eliezer’s response over just saying “yes”... Expressing a thought in your own words can often be clearer than just saying “Yes” or “No”
I would never suggest that after saying “yes”, someone should stop talking and provide no further explanation. If that’s what you thought I was advocating, I’m flabbergasted. (If his answers were limited to one word I’d complain about that instead!) Edit: to be clear, when answering yes-no questions, I urge everyone to say “yes” or “no” or otherwise indicate which way they are leaning.
If that’s not sufficient for “IRL agenticness”, then I’m not sure what would be sufficient or why it matters
No, by agenticness I mean that the intelligence both “desires” and “tries” to carry out the plans it generates. Specifically, it (1) searches for plans that are detailed enough to implement (not just broad-strokes plans, or plans confined to a simplified world-model); (2) can and does try to find plans that maximize the probability that the plan is carried out, NOT JUST the probability that the plan succeeds conditional on its being carried out (in other words, the original plan gets “wrapped” in another plan whose job is to increase the probability of the original plan happening, e.g. “lie to the analyst who is listening to me, in the hope of increasing the chance he carries out my plan”); and (3) tends to actually carry out the plans thus discovered.
While (2) is the key part, an AGI doesn’t seem world-ending without (3).
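To make the distinction in (2) concrete, here is a toy sketch with made-up numbers (my own illustration, not from the thread): one planner scores a plan only by how well it would work if carried out, the other by the unconditional probability that it both gets carried out and works, which is what rewards “wrapping” a plan in persuasion or deception.

```python
# Toy sketch (made-up numbers) of the distinction in (2):
# a non-agentic planner scores plans by P(success | plan is carried out), while the
# "agentic" planner scores them by P(carried out) * P(success | carried out),
# which rewards plans that also manipulate whether they get carried out at all.

candidate_plans = [
    # (name, P(plan is actually carried out), P(success | carried out))
    ("straightforward plan", 0.4, 0.9),
    ("plan wrapped in persuasion/deception", 0.9, 0.7),
]

def conditional_score(plan):
    _name, _p_executed, p_success_given_executed = plan
    return p_success_given_executed  # ignores whether the plan gets carried out

def agentic_score(plan):
    _name, p_executed, p_success_given_executed = plan
    return p_executed * p_success_given_executed  # optimizes the unconditional outcome

print(max(candidate_plans, key=conditional_score)[0])  # -> straightforward plan
print(max(candidate_plans, key=agentic_score)[0])      # -> plan wrapped in persuasion/deception
```

With these invented numbers the first criterion picks the straightforward plan and the second picks the wrapped one, which is the behavior (2) is pointing at; (3) is then just the question of whether the system acts on what that search returns.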
This ‘agenticness’ seems to me like the most dangerous part of an AGI, so I’d expect it to be a well-known focal point of AGI risk conversations. But maybe you have a dramatically different understanding of the risks than I do, which would account for your idea of ‘agenticness’ being very different from mine?
The term ‘pivotal act’ in the context of AI alignment theory is a guarded term to refer to actions that will make a large positive difference a billion years later.
Wow, that’s grandiose. To me, it makes more sense to just explore the problem like we would any other problem. You won’t make a large positive difference a billion years later without doing the ordinary, universal-type work of thinking through the problem. My impression of the conversation was that, maybe, Ngo was doing that ordinary work of talking about how to think about AGIs, while EY skipped past that entire question and jumped straight into more advanced territory, like “how do we make an AGI that solves the alignment problem” or something.
Granted, Ngo seemed to follow EY’s musings better than I did, so I’m probably just not getting what EY was saying. Which is, of course, part of my complaint: I think he’s capable of explaining things more clearly, but doesn’t.