You choose phrases like “help to solve alignment”, in general mostly mention “alignment” and not “safety” (except in the sections where you discuss indirect agendas, such as “7. Facilitate the development of explicitly-safety-focused businesses”), and write “if/when we live in a world with superintelligent AI whose behavior is, likely by definition, outside our direct control” (implying that ‘control’ of AI would be desirable?).
Is this a deliberate choice of narrowing your direct, object-level technical work to alignment (because you think this is where the predispositions of your team are?), or a disagreement with more systemic views on “what we should work on to reduce the AI risks”, such as:
(1) Davidad’s “AI Neorealism: a threat model & success criterion for existential safety”:
For me the core question of existential safety is this: “Under these conditions, what would be the best strategy for building an AI system that helps us ethically end the acute risk period without creating its own catastrophic risks that would be worse than the status quo?”
It is not, for example, “how can we build an AI that is aligned with human values, including all that is good and beautiful?” or “how can we build an AI that optimises the world for whatever the operators actually specified?” Those could be useful subproblems, but they are not the top-level problem about AI risk (and, in my opinion, given current timelines and a quasi-worst-case assumption, they are probably not on the critical path at all).
(2) Leventov’s “Beyond alignment theories”:
Note that in this post, only a relatively narrow aspect of the multi-disciplinary view on AI safety is considered, namely the aspect of poly-theoretical approach to the technical alignment of humans to AIs. This mainly speaks to theories of cognition (intelligence, alignment) and ethics. But on a larger view, there are more theories and approaches that should be deployed in order to engineer our civilisational intelligence such that it “goes well”. These theories are not necessarily quite about “alignment”. Examples are control theory (we may be “aligned” with AIs but collectively “zombified” by powerful memetic viruses and walk towards a civilisational cliff), game theory (we may have good theories of alignment but our governance systems cannot deal with multi-polar traps so we cannot deploy these theories effectively), information security considerations, mechanistic anomaly detection and deep deceptiveness, etc. All these perspectives further demonstrate that no single compact theory can “save” us.
(3) Drexler’s “Open Agency Model”;
(4) Hendrycks’ “Pragmatic AI Safety”;
(5) Critch’s “What Multipolar Failure Looks Like, and Robust Agent-Agnostic Processes (RAAPs)”.
Thanks for your comment! I think we can simultaneously (1) strongly agree with the premise that in order for AGI to go well (or at the very least, not catastrophically poorly), society needs to adopt a multidisciplinary, multipolar approach that takes into account broader civilizational risks and pitfalls, and (2) have fairly high confidence that within the space of all possible useful things to do within this broader scope, the list of neglected approaches we present above does a reasonable job of documenting some of the places where we specifically think AE has comparative advantage/the potential to strongly contribute over relatively short time horizons. So, to directly answer:
Is this a deliberate choice of narrowing your direct, object-level technical work to alignment (because you think this is where the predispositions of your team are?), or a disagreement with more systemic views on “what we should work on to reduce the AI risks”?
It is something far more like a deliberate choice than a systemic disagreement. We are also very interested in and open to broader models of how control theory, game theory, information security, etc., have consequences for alignment (e.g., see ideas 6 and 10 for examples of nontechnical things we think we could likely help with). To the degree that these sorts of things can be thought of as further neglected approaches, we may indeed agree that they are worthwhile for us to consider pursuing, or at least to help facilitate others’ pursuits, with the comparative advantage caveat stated previously.