The control problem is initially introduced as: “the problem of how to control what the superintelligence would do.” In the chapter you reference, it is presented as the principal-agent problem that arises between a human and the superintelligent AI they build (apparently constituting the whole of that problem).
It would be reasonable to say that there is no control problem for modern AI, because Bostrom’s usage of “the control problem” is exclusively about controlling superintelligence. On this definition, either there is no control research today, or the question comes back to the implicit, controversial empirical claim about which work is relevant to controlling superintelligence and which is not.
If you are teaching GPT to better understand instructions, I would also call that improving its capability (though some people would call it alignment; this is the de dicto vs. de re distinction discussed here). If it already understands instructions and you are training it to follow them, I would call that alignment.
I think you can use “AI alignment” however you want, but this is a lame thing to get angry at labs about, and you should expect ongoing confusion.