Let me first say what I think alignment (or “superalignment”) actually requires. This is under the assumption that humanity’s AI adventure issues in a superintelligence that dominates everything, and that the problem to be solved is how to make such an entity compatible with human existence and transhuman flourishing. If you think the future will always be a plurality of posthuman entities, including enhanced former humans, with none ever gaining an irrevocable upper hand (e.g. this seems to be one version of e/acc); or if you think the whole race towards AI is folly and needs to be stopped entirely; then you may have a different view.
I have long thought of a benevolent superintelligence as requiring three things: superintelligent problem-solving ability; the correct “value system” (or “decision procedure”, etc.); and a correct ontology (and/or the ability to improve its ontology). The first two criteria would not be surprising in the small world of AI safety that existed before the deep learning revolution. They fit a classic agent paradigm like the expected utility maximizer, with alignment (or Friendliness, as we used to say) being a matter of identifying the right utility function.
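For concreteness, here is the textbook form of that paradigm (standard notation, nothing specific to this post): the agent chooses the action with the highest expected utility over possible outcomes,

$$a^* \;=\; \arg\max_{a \in A} \, \sum_{s \in S} P(s \mid a)\, U(s),$$

where $A$ is the set of available actions, $S$ the set of outcome states, $P(s \mid a)$ the agent’s model of the world, and $U$ the utility function. On this picture, everything hard about alignment is packed into the choice of $U$; my third criterion says that even this is not enough, since the ontology over which $U$ is defined can itself be wrong.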
The third criterion is a little unconventional, and my main motive for it even more so, in that I don’t believe the theories of consciousness and identity that would reduce everything to “computation”. I think consciousness and identity are grounded in “Being” or “substance”, in a way that the virtual state machines of computation are not; that there really is a difference between a mind and a simulation of a mind, for example. This inclines me to think that quantum holism is part of the physics of mind, but that thinking of it just as physics is not enough: you need a richer ontology of which physics is only a formal description. These are the best ideas I’ve had, though, rather than something I am absolutely sure is true. I am much more confident that purely computational theories of consciousness are radically incomplete than I am about what the correct alternative paradigm is.
The debate about whether the fashionable reductionist theory of the day is correct is as old as science. What does AI add to the mix? On the one hand, there is the possibility that an AI with the “right” value system but the wrong ontology might do something intended as benevolent that misses the mark because it misidentifies something about personhood. (A simple example: it “uploads” everyone to a better existence, but the uploads aren’t actually conscious; they are just simulations.) On the other hand, one might also doubt the AI’s ability to discover that the ontology of mind according to which uploads are conscious is wrong, especially if the AI itself isn’t conscious. If it is superintelligent, it may be able to discover a mismatch between standard human concepts of mind, extrapolated in a standard way, and how reality actually works; but lacking consciousness itself, it might also lack some essential inner guidance on how the mismatch is to be corrected.
This is just one possible story about what we could call a philosophical error in the AI’s cognition and/or in the design process that produced it. I think it’s an example of why Wei Dai regards metaphilosophy as an important issue for alignment. Metaphilosophy is the (mostly philosophical) study of philosophy itself, and includes questions like: what is philosophical thought? what characterizes correct philosophical thought? and how do you implement correct philosophical thought in an AI? Metaphilosophical concerns go beyond my third criterion of getting the ontology of mind correct; philosophy could also have something to say about problem-solving and about correct values, and even about the entire three-part approach to alignment with which I began.
So perhaps I will revise my superalignment schema and say: a successful plan for superalignment needs to produce problem-solving superintelligence (since the superaligned AI is useless if it gets trampled by a smarter unaligned AI), a sufficiently correct “value system” (or decision procedure or utility function), and some model of metaphilosophical cognition (with particular attention to ontology of mind).