If a language model reads many proposals for AI alignment, is it, or will any future version, be capable of giving opinions on which proposals are good or bad?
Yes, of course. The question then is whether its opinions are any good. For one line of research aimed at making such model judgments more trustworthy, see iterated amplification.