It seems that at this point in time, neither Paul nor Eliezer are excited about IDA
I’m still excited about IDA.
I assume this is coming from me saying that you need big additional conceptual progress to have an indefinitely scalable scheme. And I do think that’s more skeptical than my strongest pro-IDA claim here in early 2017:
I think there is a very good chance, perhaps as high as 50%, that this basic strategy can eventually be used to train benign state-of-the-art model-free RL agents. [...] That does not mean that I think the conceptual issues are worked out conclusively, but it does mean that I think we’re at the point where we’d benefit from empirical information about what works in practice
That said:
I think it’s up for grabs whether we’ll end up with something that counts as “this basic strategy.” (I think imitative generalization is the kind of thing I had in mind in that sentence, but many of the ELK schemes we are thinking about definitely aren’t; the boundary is pretty arbitrary.)
Also note that in that post I was talking about something that produces a benign agent in practice, whereas in the other statement I’m talking about “indefinitely scalable.” Though my probability on “produces a benign agent in practice” is also definitely lower than it was then.