(Disclaimer: below are my thoughts from a fairly pessimistic perspective, upon reading quite a lot of LW recently, but not knowing almost any researcher personally.)
If someone is worried about AI x-risk, but is also not simultaneously a doomer, then I do not know what they are hoping for.
I think they also don’t know what they hope for. Or they hope for the success of a particular agenda, such as those enumerated in Nate Soares’ recent post. They may also not believe that any of the agendas they are familiar with will lead to the long-term preservation of flourishing human consciousness, even if the agenda technically succeeds as stated, because of unforeseen deficiencies in the alignment scheme, incalculable (or calculable but inextricable) second- and third-order effects, systemic problems (Christiano’s “first scenario”), coordination or political problems, the inherent attack-defence imbalance (see Bostrom’s vulnerable world hypothesis), etc. Nevertheless, they may still be “vaguely hopeful” out of a general optimism bias (or, in a snarky mood, you may call this a psychological defence reaction).
This is actually an interesting situation we are in: we don’t know what we hope for (or at least we don’t agree about it; the discussion around the Pivotal Act is an example of such a disagreement). We should imagine a particular future world state in enough detail, and then try to steer ourselves towards it. However, it’s very hard, maybe even inhumanly hard, to think of a world with AGI that would actually behave in real life the way we imagine it will (or that is achievable from the current world state and its dynamics), because of unknown unknowns, the limits of our world-modelling capacity, and the out-of-distribution generalisation (AGI is out of distribution) that we need to perform.
Absent such a world picture with wide consensus around it, we are driving blind: not in control of the situation, but controlled by its dynamics.