I hadn’t yet got around to reading the CAST series: now I have to! :-)
Some of the authors of the Pretraining Language Models with Human Preferences paper now work at Anthropic. I would also love for Anthropic to hire me to work on this stuff!
In some sense, the human input and oversight in AI-assisted alignment is the same thing as corrigibility.