One aspect of this proposal that I don’t know how to do is evaluating the answers of the question-answerer. That looks to me very related to the deconfusion of universality that we discussed a few months ago, and without an answer to this, I feel like I don’t even know how to run this silly approach.
You could imitate human answers, or you could ask a human “Is answer A′ much better than answer A?” Both of these only work for questions that humans can evaluate (in hindsight), and then the point of the scheme is to get an adequate generalization to (some) questions that humans can’t answer.
Ok, so you optimize the circuit both for speed and for small loss on human answers/comparisons, hoping that it generalizes to more questions while not being complex enough to be deceptive. Is that what you mean?
I’m mostly worried about parameter sharing between the human models in the environment and the QA procedure (which leads the QA to generalize like a human instead of correctly). You could call that deception but I think it’s a somewhat simpler phenomenon.
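To make the objective from the exchange above a bit more concrete, here is a minimal sketch of what "optimize for speed and for small loss on human answers/comparisons" might look like. Everything in it is an assumption for illustration: the logistic (Bradley-Terry style) form of the comparison loss, the scalar `compute_cost` standing in for circuit size or runtime, and the `speed_weight` trade-off are hypothetical choices, not something specified in the discussion.

```python
import math


def imitation_loss(answer_logprob):
    """Negative log-likelihood the QA procedure assigns to a human-written answer."""
    return -answer_logprob


def comparison_loss(score_a, score_a_prime, human_prefers_a_prime):
    """Logistic loss on one human judgment of "is A' better than A?".

    The scores are hypothetical scalar ratings produced by the QA procedure;
    the logistic form is an assumed modelling choice.
    """
    # Probability the procedure assigns to "A' is better than A".
    p = 1.0 / (1.0 + math.exp(-(score_a_prime - score_a)))
    return -math.log(p if human_prefers_a_prime else 1.0 - p)


def total_objective(human_losses, compute_cost, speed_weight=0.1):
    """Mean loss on human-evaluable questions plus a penalty on compute.

    compute_cost is a stand-in for circuit size or runtime (assumed scalar);
    speed_weight is a hypothetical trade-off coefficient.
    """
    mean_loss = sum(human_losses) / len(human_losses)
    return mean_loss + speed_weight * compute_cost
```

Note that this objective only ever sees questions humans can evaluate. It says nothing about how the procedure generalizes beyond them, which is where the parameter-sharing worry applies.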