> I read Eliezer’s response as basically “Yes, in the following sense” [...] I prefer Eliezer’s response over just saying “yes” [...] Expressing a thought in your own words can often be clearer than just saying “Yes” or “No”
I would never suggest that after saying “yes”, someone should stop talking and provide no further explanation. If that’s what you thought I was advocating, I’m flabbergasted. (If his answers were limited to one word I’d complain about that instead!) Edit: to be clear, when answering yes-no questions, I urge everyone to say “yes” or “no” or otherwise indicate which way they are leaning.
> If that’s not sufficient for “IRL agenticness”, then I’m not sure what would be sufficient or why it matters
No, by agenticness I mean that the intelligence both “desires” and “tries” to carry out the plans it generates. Specifically, it:

1. Searches for plans that are detailed enough to implement (not just broad-strokes plans, or plans limited to a simplified world-model).
2. Can and does try to find plans that maximize the probability that the plan is carried out, *not just* the probability that the plan succeeds conditional on the plan being carried out. In other words, the original plan is “wrapped” in another plan in order to increase the probability of the original plan happening, e.g. “lie to the analyst who is listening to me, in the hope of increasing the chance he carries out my plan.”
3. Tends to actually carry out plans thus discovered.
While (2) is the key part, an AGI doesn’t seem world-ending without (3).
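To make the distinction in (2) concrete, here is a toy sketch in Python of one way to read it. Everything in it is invented for illustration (the plan names, the probabilities, the helper functions); it just shows the scoring difference I have in mind between a planner that only cares how well a plan would work if carried out, and one that also optimizes for getting the plan carried out at all.

```python
# Toy illustration of point (2). All plans and numbers are made up.
# Each candidate plan has two estimated probabilities:
#   p_exec: probability the plan actually gets carried out
#   p_win_given_exec: probability it achieves the goal *if* carried out
plans = {
    "honest proposal": {
        "p_exec": 0.3, "p_win_given_exec": 0.9,
    },
    "honest proposal, wrapped in 'deceive the analyst reviewing it'": {
        "p_exec": 0.8, "p_win_given_exec": 0.9,  # wrapping only raises p_exec
    },
    "watered-down plan the analyst will rubber-stamp": {
        "p_exec": 0.9, "p_win_given_exec": 0.4,
    },
}

def non_agentic_choice(plans):
    """Scores only P(success | carried out): the quality of the plan itself."""
    return max(plans, key=lambda name: plans[name]["p_win_given_exec"])

def agentic_choice(plans):
    """Scores P(carried out) * P(success | carried out), i.e.
    P(carried out and succeeds). This is what rewards wrapping a plan in
    extra steps (like deception) whose only purpose is to raise the chance
    the plan gets carried out."""
    return max(
        plans,
        key=lambda name: plans[name]["p_exec"] * plans[name]["p_win_given_exec"],
    )

print(non_agentic_choice(plans))  # "honest proposal" (indifferent to p_exec)
print(agentic_choice(plans))      # the wrapped, deceptive variant (0.8 * 0.9 = 0.72)
```

The point is just that the second scoring rule, combined with a search space that includes “wrapped” variants of plans, is the part that looks dangerous to me; (3) is then the question of whether the system actually acts on the winner.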
This ‘agenticness’ seems to me like the most dangerous part of an AGI, so I’d expect it to be a well-known focal point of AGI risk conversations. But maybe you have a dramatically different understanding of the risks than I do, which would account for your idea of ‘agenticness’ being very different from mine?
> The term ‘pivotal act’ in the context of AI alignment theory is a guarded term to refer to actions that will make a large positive difference a billion years later.
Wow, that’s grandiose. To me, it makes more sense to just explore this problem the way we would any other. You won’t make a large positive difference a billion years later without doing the ordinary, universal-type work of thinking through the problem. My impression of the conversation was that, maybe, Ngo was doing that ordinary work of talking about how to think about AGIs, while EY skipped past that entire question and jumped straight into more advanced territory, like “how do we make an AGI that solves the alignment problem” or something.
Granted, Ngo seemed to follow EY’s musings better than I did, so I’m probably just not getting what EY was saying. Which is, of course, part of my complaint: I think he’s capable of explaining things more clearly, and doesn’t.