chasmani answers ELI5 Why isn’t alignment easier as models get stronger?

chasmani Oct 29, 2023, 5:42 PM
1 point
0
Well I agree it is a strawman argument. Following the same lines as your argument, I would say the counter argument is that we don’t really care if a weak model is fully aligned or not. Is my calculator aligned? Is a random number generator aligned? Is my robotic vacuum cleaner aligned? It’s not really a sensical question.

Alignment is a bigger problem with stronger models. The required degree of alignment is much higher. So even if we accept your strawman argument it doesn’t matter.