I mean, by this definition all capability advances are solving alignment problems. Making your model bigger allows you to solve the “pretend to be Barack Obama” alignment problem. Hooking your model up to robots allows you to solve the “fold my towels” alignment problem. Making your model require less electricity to run allows you to solve the “don’t cause climate change” alignment problem.
I agree that there is of course some sense in which one could model all of these as “alignment problems”, but this does seem to be stretching the definition of the word “alignment” in a way that greatly reduces its usefulness.