Some things that seem important to distinguish here:
‘Prosaic alignment is doomed’. I parse this as: ‘Aligning AGI, without coming up with any fundamentally new ideas about AGI/intelligence or discovering any big “unknown unknowns” about AGI/intelligence, is doomed.’
I (and my Eliezer-model) endorse this, in large part because ML (as practiced today) produces such opaque and uninterpretable models. My sense is that Eliezer’s hopes largely route through understanding AGI systems’ internals better, rather than coming up with cleverer ways to apply external pressures to a black box.
‘All alignment work that involves running experiments on deep nets is doomed’.
My Eliezer-model doesn’t endorse this at all.
Also important to distinguish, IMO (making up the names here):
A strong ‘prosaic AGI’ thesis, like ‘AGI will just be GPT-n or some other scaled-up version of current systems’. Eliezer is extremely skeptical of this.
A weak ‘prosaic AGI’ thesis, like ‘AGI will involve coming up with new techniques, but the path between here and AGI won’t involve any fundamental paradigm shifts and won’t involve us learning any new deep things about intelligence’. I’m not sure what Eliezer’s unconditional view on this is, but I’d guess he thinks it falls a lot in probability if we condition on something like ‘good outcomes are possible’, since if true it would be very bad news.
An ‘unprosaic but not radically different AGI’ thesis, like ‘AGI might involve new paradigm shifts and/or new deep insights into intelligence, but it will still be similar enough to the current deep learning paradigm that we can potentially learn important stuff about alignable AGI by working with deep nets today’. I don’t think Eliezer has a strong view on this, though I observe that he thinks some of the most useful stuff humanity can do today is ‘run various alignment experiments on deep nets’.
An ‘AGI won’t be GOFAI (Good Old Fashioned AI)’ thesis. Eliezer strongly endorses this.
There’s also an ‘inevitability thesis’ that I think is a crux here: my Eliezer-model thinks there are a wide variety of ways to build AGI that are very different, such that it matters a lot which option we steer toward (and various kinds of ‘prosaicness’ might be one parameter we can intervene on, rather than being a constant). My Paul-model has the opposite view, and endorses some version of inevitability.
Your comment and Vaniver’s (paraphrasing) “not surprised by the results of this work, so why do it?” were especially helpful. EY (or others) assessing concrete research directions, with detailed explanations, would be even more helpful.
I agree with Rohin’s general question, “Can you tell a story where your research helps solve a specific alignment problem?”, and if you have other heuristics for assessing research, it would be good to know them.