It seems to me that the idea of scalable oversight itself was far easier to generate than to evaluate. If the idea had been generated by an alignment AI rather than by various people independently suggesting similar strategies, would we be confident in our ability to evaluate it? Is there some reason to believe alignment AIs will generate ideas that are easier to evaluate than scalable oversight? What kind of output would we need to see to make an idea like scalable oversight easy to evaluate?