By “claims about a space of questions” you mean “a claim about every question from a space of questions”?
Sorry, I wrote that incorrectly; I meant "the agent can choose a question from a space of questions and make a claim about it". If you want to support claims about an entire space of questions, you could allow quantifiers in your questions.
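To make the distinction concrete, here's a very rough sketch of what I have in mind (all the names and types below are mine, purely for illustration, and not anything from the original post):

```python
# Illustrative sketch only: the types and helper functions are hypothetical.
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Question:
    text: str

@dataclass
class Claim:
    # A claim about a single question chosen from the space.
    question: Question
    answer: bool

@dataclass
class QuantifiedClaim:
    # A claim with a universal quantifier: "every question in `space`
    # has answer `answer`."
    space: List[Question]
    answer: bool

def agent_claim(space: List[Question],
                choose: Callable[[List[Question]], Question],
                answer: Callable[[Question], bool]) -> Claim:
    """Base version: the agent picks one question from the space and
    makes a claim about that question only."""
    q = choose(space)
    return Claim(question=q, answer=answer(q))

def quantified_claim(space: List[Question], answer: bool) -> QuantifiedClaim:
    """Extension: a single claim that quantifies over the whole space."""
    return QuantifiedClaim(space=space, answer=answer)

# Example usage (hypothetical):
# space = [Question("Is the sky blue?"), Question("Is 2 + 2 = 4?")]
# single = agent_claim(space, choose=lambda qs: qs[0], answer=lambda q: True)
# universal = quantified_claim(space, answer=True)
```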
However, you can also get the same behaviour in a typical world if you assume that the judge has a wrong prior.
I mean, sure, but any alignment scheme is going to have to assume some amount of correctness in the human-generated information it is given. You can’t learn about preferences if you model humans as arbitrarily wrong about their preferences.
I broadly agree with all of this, thanks :)