it seems very clear that we should update that structure to the best of our ability as we make progress in understanding the challenges and potentials of different approaches.
Definitely agree—I hope this sequence is read as something much more like a dynamic draft of a theoretical framework than my Permanent Thoughts on Paradigms for AGI Safety™.
“Aiming at good outcomes while/and avoiding bad outcomes” captures more conceptual territory, while still allowing for the investigation to turn out that avoiding bad outcomes is more difficult and should be prioritised. This extends to the meta-question of whether existential risk can be best addressed by focusing on avoiding bad outcomes, rather than developing a strategy to get to good outcomes (which are often characterised by a better ability to deal with future risks) and avoid bad outcomes on the way there.
I definitely agree with the value of framing AGI outcomes both positively and negatively, as I discuss in the previous post. I am less sure that AGI safety as a field necessarily requires deeply considering the positive potential of AGI (i.e., as long as AGI-induced existential risks are avoided, I think AGI safety researchers can consider their venture successful), but, much to your point, if the best way of actually achieving this outcome is by thinking about AGI more holistically—e.g., instead of explicitly avoiding existential risks, we might ask how to build an AGI that we would want to have around—then I think I would agree. I just think this sort of thing would radically redefine the relevant approaches undertaken in AGI safety research. I by no means want to reject radical redefinitions out of hand (I think this very well could be correct); I just want to say that it is probably not the path of least resistance given where the field currently stands.
Hey Robert—thanks for your comment!
(And agreed on the self-control point, as you know. See directionality of control in Q3.)