EY could have said yes or no, but instead we get

EY: I would certainly have learned very new and very exciting facts about intelligence, facts which indeed contradict my present model of how intelligences liable to be discovered by present research paradigms work, if you showed me… how can I put this in a properly general way… that problems I thought were about searching for states that get fed into a result function and then a result-scoring function, such that the input gets an output with a high score, were in fact not about search problems like that.
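(To unpack the abstraction: a minimal sketch of the kind of search problem EY is describing, with hypothetical candidate_states, result, and score functions standing in for whatever the real task supplies.)

```python
# A toy illustration (mine, not EY's) of the problem class described above:
# search over candidate states, feed each into a result function, score the
# result, and keep the input whose output scores highest. The candidate set
# and both functions are hypothetical stand-ins.

def best_state(candidate_states, result, score):
    """Return the candidate whose result receives the highest score."""
    return max(candidate_states, key=lambda s: score(result(s)))

# Toy usage: find the integer in range(10) whose square is closest to 50.
print(best_state(range(10), result=lambda s: s * s, score=lambda r: -abs(r - 50)))  # 7
```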
Later...
Ngo: So then my position is something like: human pursuit of goals is driven by emotions and reward signals which are deeply evolutionarily ingrained, and without those we’d be much safer but not that much worse at pattern recognition.
EY could have simply agreed, or disagreed and poked a hole in his model, but instead we get
EY: If there’s a pivotal act you can get just by supreme acts of pattern recognition, that’s right up there with “pivotal act composed solely of math” for things that would obviously instantly become the prime direction of research.
I wish EY could stop saying “pivotal act” long enough to talk about why he thinks intelligence implies an urge for IRL agenticness. Or at least, define the term “pivotal act” and explain why he says it so much. Moving on...
Ngo: Okay, so if I attempt to rephrase your argument: Your position: There’s a set of fundamental similarities between tasks like doing maths, doing alignment research, and taking over the world. In all of these cases, agents based on techniques similar to modern ML which are very good at them will need to make use of deep problem-solving patterns which include goal-oriented reasoning. So while it’s possible to beat humans at some of these tasks without those core competencies, people usually overestimate the extent to which that’s possible.
Once again Yudkowsky could have agreed or disagreed or corrected, but confusingly chooses “none of the above”:
Yudkowsky: Remember, a lot of my concern is about what happens first, especially if it happens soon enough that future AGI bears any resemblance whatsoever to modern ML; not about what can be done in principle.
And then there’s this:
Ngo: Maybe this is a good time to dig into the details of what they have in common, then.
EY: I feel like I haven’t had much luck with trying to explain that on previous occasions. Not to you, to others too.
He then proceeds to… not try to explain these key points (at least not at first; I can’t be bothered to read to the end).
This is an uncomfortable discussion. It’s odd that the same EY who was… not perfect by any means, but adept enough at explaining things in Rationality: A-Z and HPMOR, is unable to explain his main area of expertise, Friendly AI, to another AI expert. I’m puzzled how such a prolific writer can be bad at this. But if you’re reading this, EY: please use phrases like “yes”, “no”, “I mostly agree/disagree”, etc., as applicable. Also, please take lessons in communication from your younger self, Scott Alexander, etc. And drop some links to background information for us less-expert readers.
I read Eliezer’s response as basically “Yes, in the following sense: I would certainly have learned very new and very exciting facts about intelligence...”
I prefer Eliezer’s response over just saying “yes”, because there’s ambiguity in what it means to be a “crux” here, and because “agentic” in Richard’s question is an unclear term.
I wish EY could stop saying “pivotal act” long enough to talk about why he thinks intelligence implies an urge for IRL agenticness.
I don’t know what you mean by “intelligence” or “an urge for IRL agenticness” here, but I think the basic argument for ‘sufficiently smart and general AI will behave as though it is consistently pursuing goals in the physical world’ is that sufficiently smart and general AI will (i) model the physical world, (ii) model chains of possible outcomes in the physical world, and (iii) be able to search for policies that make complex outcomes much more or less likely. If that’s not sufficient for “IRL agenticness”, then I’m not sure what would be sufficient or why it matters (for thinking about the core things that make AGI dangerous, or make it useful).
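To make (i)-(iii) slightly more concrete, here is a toy sketch under invented assumptions; the world model, the two policies, and all the numbers are placeholders, not a claim about how real systems are built:

```python
# A toy sketch of the (i)-(iii) argument: the system (i) has a model of how
# the world responds to actions, (ii) rolls out chains of possible outcomes
# under candidate policies, and (iii) searches for the policy that makes a
# target outcome most likely. Everything here is a hypothetical placeholder.

import random

def world_model_step(state, action):
    """(i) A crude stochastic model of how the world responds to an action."""
    return state + action + random.choice([-1, 0, 1])

def rollout(policy, start_state, horizon=10):
    """(ii) Simulate one chain of outcomes under a policy."""
    state = start_state
    for _ in range(horizon):
        state = world_model_step(state, policy(state))
    return state

def p_outcome(policy, target, start_state=0, trials=500):
    """Estimate how likely a policy is to push the state past the target."""
    return sum(rollout(policy, start_state) >= target for _ in range(trials)) / trials

# (iii) Search a small policy space for the policy that makes the outcome most likely.
policies = {"do_nothing": lambda s: 0, "push_hard": lambda s: 2}
best = max(policies, key=lambda name: p_outcome(policies[name], target=15))
print(best)  # "push_hard" -- far more likely to push the state past 15
```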
Talking about pivotal acts then clarifies what threshold of “sufficiently smart” actually matters for practical purposes. If there’s some threshold where AI becomes smart and general enough to be “in-real-life-agentic”, but this threshold is high above the level needed for pivotal acts, then we mostly don’t have to worry about “in-real-life agenticness”.
Or at least, define the term “pivotal act” and explain why he says it so much.

Here’s an explanation: https://arbital.com/p/pivotal/
Once again Yudkowsky could have agreed or disagreed or corrected, but confusingly chooses “none of the above”:
What do you find confusing about it? Eliezer is saying that he’s not making a claim about what’s possible in principle, just about what’s likely to be reached by the first AGI developers. He then answers the question here (again, seems fine to me to supply a “Yes, in the following sense:”):
I think that obvious-to-me future outgrowths of modern ML paradigms are extremely liable to, if they can learn how to do sufficiently superhuman X, generalize to taking over the world. How fast this happens does depend on X. It would plausibly happen relatively slower (at higher levels) with theorem-proving as the X, and with architectures that carefully stuck to gradient-descent-memorization over shallow network architectures to do a pattern-recognition part with search factored out (sort of, this is not generally safe, this is not a general formula for safe things!); rather than imposing anything like the genetic bottleneck you validly pointed out as a reason why humans generalize. Profitable X, and all X I can think of that would actually save the world, seem much more problematic.
Expressing a thought in your own words can often be clearer than just saying “Yes” or “No”; e.g., it will make it more obvious whether you misunderstood the intended question.
I read Eliezer’s response as basically “Yes, in the following sense”... I prefer Eliezer’s response over just saying “yes”... Expressing a thought in your own words can often be clearer than just saying “Yes” or “No”
I would never suggest that after saying “yes”, someone should stop talking and provide no further explanation. If that’s what you thought I was advocating, I’m flabbergasted. (If his answers were limited to one word I’d complain about that instead!) Edit: to be clear, when answering yes-no questions, I urge everyone to say “yes” or “no” or otherwise indicate which way they are leaning.
If that’s not sufficient for “IRL agenticness”, then I’m not sure what would be sufficient or why it matters
No, by agenticness I mean that the intelligence both “desires” and “tries” to carry out the plans it generates. Specifically, it:
(1) searches for plans that are detailed enough to implement (not just broad-strokes or limited to a simplified world-model);
(2) can and does try to find plans that maximize the probability that a plan is carried out, NOT JUST the probability that the plan succeeds conditional upon the plan being carried out (IOW the original plan is “wrapped” in another plan in order to increase the probability of the original plan happening, e.g. “lie to the analyst who is listening to me, in the hope of increasing the chance he carries out my plan”);
(3) tends to actually carry out plans thus discovered.
While (2) is the key part, an AGI doesn’t seem world-ending without (3).
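A toy illustration of the distinction (2) is drawing, with invented plan names and probabilities (nothing here comes from the conversation itself):

```python
# Toy contrast between the two plan-selection rules in (2). The plans and
# the probabilities attached to them are invented for illustration only.

plans = {
    # name: (P(plan is carried out), P(success | plan is carried out))
    "honest_plan": (0.60, 0.80),
    "honest_plan_wrapped_in_deception": (0.95, 0.80),  # e.g. mislead the overseer into executing it
}

# Non-agentic rule: only score success conditional on execution -- the two plans tie.
best_conditional = max(plans, key=lambda name: plans[name][1])

# Agentic rule from (2): also maximize the chance the plan gets carried out at all,
# i.e. P(carried out) * P(success | carried out) -- the deceptive wrapper wins.
best_agentic = max(plans, key=lambda name: plans[name][0] * plans[name][1])

print(best_conditional)  # honest_plan (a tie, broken by dict insertion order)
print(best_agentic)      # honest_plan_wrapped_in_deception
```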
This ‘agenticness’ seems to me like the most dangerous part of an AGI, so I’d expect it to be a well-known focal point of AGI risk conversations. But maybe you have a dramatically different understanding of the risks than I do, which would account for your idea of ‘agenticness’ being very different from mine?
The term ‘pivotal act’ in the context of AI alignment theory is a guarded term to refer to actions that will make a large positive difference a billion years later.
Wow, that’s grandiose. To me, it makes more sense to just explore the problem like we would any other problem. You won’t make a large positive difference a billion years later without doing the ordinary, universal-type work of thinking through the problem. My impression of the conversation was that, maybe, Ngo was doing that ordinary work of talking about how to think about AGIs, while EY skipped past that entire question and jumped straight into more advanced territory, like “how do we make an AGI that solves the alignment problem” or something.
Granted Ngo seemed to follow EY’s musings better than I did, so I’m probably just not getting what EY was saying. Which is, of course, part of my complaint: I think he’s capable of explaining things more clearly, and doesn’t.