Alex Mallen comments on Dreams of AI alignment: The danger of suggestive names

Alex Mallen 10 Feb 2024 19:17 UTC
3 points
I’ll add another one to the list: “Human-level knowledge/human simulator”
Max Nadeau helped me see some ways in which this framing biased my and others’ models of ELK (eliciting latent knowledge) and scalable oversight. Knowledge is hard to define, and our labels or supervision might be tamperable in ways that have no intuitive connection to how difficult a task is for humans.
Different measures of human difficulty correlate with each other only weakly, at about 0.05 to 0.3. This suggests either that human difficulty is not a very meaningful concept for AI oversight, or that our current datasets for scalable oversight experiments don’t contain large enough gaps in difficulty to measure it meaningfully.
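To make the correlation claim concrete, here is a minimal sketch of how one might check agreement between difficulty measures on a dataset. Everything in it is an assumption for illustration: the measure names (`grade_level`, `annotator_error_rate`, `solve_time_s`) and the numbers are made up, and Spearman rank correlation is just one reasonable choice of statistic, not necessarily the one behind the 0.05 to 0.3 figure.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of items tied with the item at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(xs), ranks(ys))


def pairwise_spearman(measures):
    """Correlation for every pair of named difficulty measures."""
    return {
        (a, b): spearman(measures[a], measures[b])
        for a, b in combinations(measures, 2)
    }


# Hypothetical per-example difficulty scores (made-up numbers, not real data).
measures = {
    "grade_level": [3, 5, 5, 8, 10, 12],
    "annotator_error_rate": [0.10, 0.05, 0.30, 0.10, 0.20, 0.15],
    "solve_time_s": [40, 95, 30, 60, 45, 120],
}

for pair, rho in pairwise_spearman(measures).items():
    print(pair, round(rho, 2))
```

If the pairwise values on a real dataset all came out low, that alone would not distinguish between the two explanations above: a noisy concept and a dataset with too narrow a difficulty range would both produce weak correlations.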