Because ‘alignment’ is used in several different ways, I feel like these days one either needs to asterisk in a definition (e.g. “By ‘alignment,’ I mean the AI faithfully carrying out instructions without killing everyone.”), or just use a more specific phrase.
I agree that instruction-following is not all you need. Many of these problems are solved by better value-learning.
Because ‘alignment’ is used in several different ways, I feel like these days one either needs to asterisk in a definition (e.g. “By ‘alignment,’ I mean the AI faithfully carrying out instructions without killing everyone.”), or just use a more specific phrase.
I agree that instruction-following is not all you need. Many of these problems are solved by better value-learning.