Did he really speak that little about AI Alignment/Safety? Does anyone have additional recollections on this topic?
The only relevant parts so far seem to be these two:
Behavioral cloning probably much safer than evolving a bunch of agents. We can tell GPT to be empathic.
And:
Chat access for alignment helpers might happen.
Both of which are very concerning.
“We can tell GPT to be empathic” assumes it can be aligned in the first place, so that you “can tell” it what to do; and “be empathic” is a very vague description of what a good utility function would be, assuming one would be followed at all. Of course it’s all in conversational tone, not a formal paper, but it seems very dismissive to me.
GPT-based “behavioral cloning” itself has been brought up by Vitalik Buterin and criticized by Eliezer Yudkowsky in this exchange between the two:
For concreteness: One can see how AlphaFold 2 is working up towards world-ending capability. If you ask how you could integrate an AF2 setup with GPT-3 style human imitation, to embody the human desire for proteins that do nice things… the answer is roughly “Lol, what? No.”
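To pin the term down: “behavioral cloning” here just means supervised imitation of logged human behavior, in contrast to selecting/evolving agents against an outer reward. A minimal, illustrative sketch of that contrast (all names, shapes, and data below are invented for the example, not anything from the Q&A or the exchange above):

```python
# Illustrative sketch: behavioral cloning = supervised imitation of
# (state, human action) pairs, vs. "evolving agents" = selecting policies
# by an outer reward. Shapes and data are made up for the example.
import torch
import torch.nn as nn

STATE_DIM, N_ACTIONS = 16, 4
policy = nn.Sequential(
    nn.Linear(STATE_DIM, 64), nn.ReLU(), nn.Linear(64, N_ACTIONS)
)

# Behavioral cloning: fit the policy to logged human demonstrations by
# ordinary supervised learning -- the target is "what the human did".
states = torch.randn(1024, STATE_DIM)                 # stand-in for logged states
human_actions = torch.randint(0, N_ACTIONS, (1024,))  # stand-in for logged human choices

opt = torch.optim.Adam(policy.parameters(), lr=1e-3)
for _ in range(200):
    loss = nn.functional.cross_entropy(policy(states), human_actions)
    opt.zero_grad()
    loss.backward()
    opt.step()

# The contrast in the quoted note: an "evolved" agent would instead be scored
# by an outer reward function, with the highest-scoring variants kept, so its
# objective is that reward rather than imitation of human behavior.
```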
As for “chat access for alignment helpers,” I mean, where to even begin? It’s not hard to imagine a deceptive AI using this chat to perfectly convince human “alignment helpers” that it is whatever they want it to be while being something else entirely. Or even “aligning” the human helpers themselves into beliefs/actions that are in the AI’s best interest.
As a general point, these notes should not be used to infer anything about what Sam Altman thought was important enough to talk a lot about, or what his general tone/attitude was. This is because:
1. The notes are filtered through what the note-takers thought was important. There’s a lot of stuff that’s missing.
2. What Sam spoke about was mostly a function of what he was asked about (it was a Q&A after all). If you were there live you could maybe get some idea of how he was inclined to interpret questions, what he said in response to more open questions, etc. But here, the information about what questions were asked is entirely missing.
3. General attitude/tone is almost completely destroyed by the compression of answers into notes.
For example, IIRC, the thing about GPT being empathic was in response to some question like “How can we make AI empathic?” (i.e., it was not his own idea to bring up empathy). The answer was obviously much longer than the summary in the notes (and so less dismissive). And directionally, it is certainly the case already that GPT-3 will act more empathic if you tell it to do so.
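For what “tell it to do so” looks like in practice, here is a minimal sketch using the legacy openai Completion endpoint as it existed around the time of the Q&A; the engine name, prompt wording, and sampling settings are just examples, not anything described in the notes:

```python
# Sketch of prompting GPT-3 to act more empathic via an instruction in the
# prompt, using the legacy openai Completion API. Engine name, prompt text,
# and parameters are illustrative assumptions.
import openai

openai.api_key = "sk-..."  # your API key

def reply(user_message: str, empathic: bool) -> str:
    instruction = (
        "You are a warm, empathetic assistant. Acknowledge the user's "
        "feelings before giving any advice.\n"
        if empathic else ""
    )
    prompt = f"{instruction}User: {user_message}\nAssistant:"
    resp = openai.Completion.create(
        engine="davinci",   # example engine name
        prompt=prompt,
        max_tokens=100,
        temperature=0.7,
        stop=["User:"],
    )
    return resp["choices"][0]["text"].strip()

# Comparing reply(msg, empathic=True) against empathic=False on the same
# message is the "directional" effect the comment is pointing at.
```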
Did he really speak that little about AI Alignment/Safety? Does anyone have additional recollections on this topic?
He did make some general claims: that it was one of his top few concerns, that he felt OpenAI had done some promising alignment work over the last year, that it was still an important goal for OpenAI’s safety work to catch up with its capabilities work, that it was good for more people to go into safety work, etc. Not very many specifics as far as I can remember.