I’ve voted highly on this post for a few reasons.
First, this post contains a bunch of other individual ideas I’ve found quite helpful for orienting. Some examples:
Useful thoughts on which term definitions have “staying power” and are worth coordinating around.
The zero/single/multi alignment framework.
The details on how to anticipate, legitimize, and fulfill governance demands.
But my primary reason was learning Critch’s views on which research fields are promising, and how they fit into his worldview. I’m not sure if I agree with Critch, but I think “figure out the best research directions to navigate towards” is crucially important, and it helps to have senior AI x-risk researchers lay out how they think about what research is valuable.
I’d like to see similar posts from Paul, Eliezer, etc. (which I expect would have radically different frames). I don’t expect everyone to end up converging on a single worldview, but I think the process of smashing the worldviews together can generate useful ideas, and give up-and-coming researchers some hooks for what to explore.
One confusing thing here is that the initial table doesn’t distinguish between “fields that aren’t that helpful for existential safety” and “fields that are both helpful and harmful to existential safety.” I was surprised by Agent Foundations’ initial ranking of “3”, which turned out to reflect a much more complex assessment.
Some notes on worldview differences this post highlights:
Disclaimer: these are my own rough guesses about Critch’s and MIRI’s views, which may not be accurate. I’m also focusing on the differences that felt important to me, which I think are somewhat different from how Critch presents things. And I’m using “MIRI” as sort of a shorthand for “some cluster of thinking that’s common on LW”, which isn’t necessarily accurate.
My understanding of Critch’s paradigm is fairly different from the MIRI paradigm, which AFAICT expects the first AGI mover to gain an overwhelming decisive advantage, and meanwhile sees interfacing with most existing power structures as… kinda a waste of time (because they’re trapped in bad equilibria that make them inadequate?).
From what I understand of Critch’s view, AGI will tend to be rolled out in smaller, less-initially-powerful pieces, and much of the danger of AGI comes when many different AGIs start interacting with each other, and with multiple humans, in ways that get increasingly hard to predict.
Therefore, it’s important for humanity as a whole to be able to think critically and govern itself in scalable ways. I think Critch sees getting humanity to collectively govern itself as both more tractable and more important, which leads to more emphasis on domains like ML Fairness.
Some followup work I’d really like to see is more public discussion of the underlying worldview differences here, and of the actual cruxes that generate them.
Speaking for myself (as opposed to either Critch or MIRI-esque researchers), “whether our institutions are capable of governing themselves in the face of powerful AI systems” is an important crux for which strategic directions to prioritize. BUT, I’ve found all the gears Critch has pointed to here helpful for my overall modeling of the world.