I purposefully use these terms vaguely since my concepts about them are in fact vague. E.g., when I say “alignment” I am referring to something roughly like “the AI wants what we want.” But what is “wanting,” and what does it mean for something far more powerful to conceptualize that wanting in a similar way, and what might wanting mean for a collective, and so on? All of these questions are central to what it means for an AI system to be “aligned,” yet I don’t have satisfying or precise answers for any of them. So it seems more natural to me, at this stage of scientific understanding, to simply own that—not to speak more rigorously than is in fact warranted, not to pretend I know more than I in fact do.
The goal, of course, is to eventually understand minds well enough to be more precise. But before we get there, most precision will likely be misguided—formalizing the wrong thing, or missing the key relationships, or what have you. And I think this does more harm than good, as it causes us to collectively misplace where the remaining confusion lives, when locating our confusion is (imo) one of the major bottlenecks to solving alignment.