Bostrom's definition of the control problem in ‘Superintelligence’ only refers to “harming the project's interests”, which, as you rightly note, is broader than existential risk. However, the immediate context makes it clear that Bostrom is discussing existential risk. The “harm” referred to does not include things like gender bias.
On reflection, I don’t actually believe that AI Alignment has ever exclusively referred to existential risk from AI. I do believe that talk about “AI Alignment” on LessWrong has usually primarily been about existential risk. I further think that the distinction from “Value Alignment” (and whether that is related to existential risk) has been muddled and debated.
I think the term “The Alignment Problem” is used because this community agrees that one problem (not killing everyone) is far and away more central than the rest (e.g. designing an AI to refuse to tell you how to make drugs).
Apart from the people here from OpenAI/DeepMind/etc, I expect general agreement that the task “Getting GPT to better understand and follow instructions” is not AI Alignment, but AI Capability. Note that I am moving my goalpost from defending the claim “AI Alignment = X-Risk” to defending “Some of the things OpenAI call AI Alignment is not AI Alignment”.
At this point I should repeat my disclaimer that all of this is my impression, and not backed by anything rigorous. Thank you for engaging anyway—I enjoyed your “rant”.
The control problem is initially introduced as “the problem of how to control what the superintelligence would do.” In the chapter you reference it is presented as the principal-agent problem that occurs between a human and the superintelligent AI they build (apparently the whole of that problem).
It would be reasonable to say that there is no control problem for modern AI, because Bostrom’s usage of “the control problem” is exclusively about controlling superintelligence. On this definition, either there is no control research today, or it comes back to the implicit, controversial empirical claim that some work is relevant and other work is not.
If you are teaching GPT to better understand instructions I would also call that improving its capability (though some people would call it alignment, this is the de dicto vs de re distinction discussed here). If it already understands instructions and you are training it to follow them, I would call that alignment.
I think you can use “AI alignment” however you want, but this is a lame thing to get angry at labs about, and you should expect ongoing confusion.