A few reflections on this piece:

It helped me understand Nate's view. I'd previously read examples of him suggesting that "cutting-edge scientific work necessitates convergent instrumental subgoal (CIS) pursuit" (e.g. the laser in the fog), but it wasn't clear to me how important he considered these examples to be.
Theories about how ML generalises play a substantial role in everyone's thinking here. AFAIK we don't have precise theories that are much help in this regard, so in practice people rely on imprecise ones. Imprecise doesn't mean inexplicable, though: some effort put into explicating these theories of ML generalisation might help everyone understand one another (the toy sketch below gestures at the kind of question such a theory has to answer).
As AI gets more powerful, it makes sense that AI systems will make higher- and higher-level decisions about the objectives they pursue. Holden and Nate's focus on the benchmark of "needle-moving scientific research" seems to suggest agreement on the following:
In order to sustain the trend of more powerful AI making higher and higher level decisions, we will need substantial innovation in our technologies for AI control
The rate of innovation possible under business-as-usual human science seems unlikely to keep up with this need
Thus we require AI acceleration of AI control science
Regarding this last point: it's not clear to me whether slow progress in AI control will in fact lead to slow progress in AI making higher- and higher-level decisions. That is, it's not obvious to me that control technologies failing to keep up necessarily leads to catastrophe. I acknowledge that very powerful AI systems might appear to work well with poor control technologies, but I'm uncertain whether moderately powerful systems work well enough with poor control technologies for the very powerful systems to be produced in the first place (and also where the relevant levels of power sit relative to today's systems).
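As a concrete illustration of what an explicit theory of ML generalisation would need to predict, here is a minimal toy sketch (my own construction, not something from the piece; the setup, names and numbers are all assumptions made for illustration). Two binary features are perfectly correlated during training and decorrelate at test time; how the trained model splits credit between them determines how it behaves off-distribution.

```python
# Toy probe of off-distribution generalisation (illustrative assumptions only):
# feature 0 causes the label, feature 1 merely copies feature 0 in training,
# and the two decorrelate at test time. How the learned weights split between
# the features is the sort of thing an explicit generalisation theory predicts.
import numpy as np

rng = np.random.default_rng(0)


def make_data(n, correlated):
    """Feature 0 determines the label; feature 1 copies it only if `correlated`."""
    x0 = rng.integers(0, 2, size=n)
    x1 = x0.copy() if correlated else rng.integers(0, 2, size=n)
    return np.stack([x0, x1], axis=1).astype(float), x0.astype(float)


def train_logistic_regression(X, y, steps=2000, lr=0.5):
    """Plain logistic regression fitted by full-batch gradient descent."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
        w -= lr * X.T @ (p - y) / len(y)
        b -= lr * np.mean(p - y)
    return w, b


X_train, y_train = make_data(1000, correlated=True)
w, b = train_logistic_regression(X_train, y_train)

X_test, y_test = make_data(1000, correlated=False)
test_pred = ((X_test @ w + b) > 0).astype(float)

# A symmetric learner splits weight evenly across the two (indistinguishable)
# features, so accuracy drops once the spurious correlation breaks.
print("learned weights:", w)
print("off-distribution accuracy:", np.mean(test_pred == y_test))
```

A learner with a different inductive bias might concentrate the weight differently and so generalise very differently; informal claims about what trained models "really learn" are implicitly taking positions on questions like this, which is why making those theories explicit seems worthwhile.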
One more thing: I'm suspicious of equivocation between "some convergent instrumental subgoals" and "worrisome convergent instrumental subgoals". There are probably many collections of CISs that, in the machine's picture of the world, aren't similar to the worrisome ones.
And another more thing, on this remark of Holden's:
To be clear, I think both Nate and I are talking about a pretty “thin” version of POUDA-avoidance here, more like “Don’t do egregiously awful things” than like “Pursue the glorious transhumanist future.” Possibly Nate considers it harder to get the former without the latter than I do.
I’m still unsure how much pivotal act considerations weigh in Nate/MIRI’s views. My view is roughly:
Cutting-edge scientific work without disastrous instrumental behaviour seems pretty attainable
Unilaterally preventing anyone else from building AI seems much more likely to entail disastrous instrumental behaviour
and I can easily imagine finding it difficult to be confident that you're avoiding catastrophe if you're aiming for the second.