If the reporter estimates every node of the human’s Bayes net, then it can assign a node a probability distribution different from the one that would be calculated from the distributions it simultaneously assigns to that node’s parents. I don’t know if there is a name for this, so for now I will pompously call it inferential inconsistency. Taken as a boolean, bright-line concept, the human simulator is clearly the only inferentially consistent reporter. But one could also pick some metric on how far apart two probability distributions are and turn it into a graded quantity.
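To make the graded version concrete, here is a minimal sketch of one way such a score could be computed. It uses KL divergence as the (arbitrarily chosen) metric and treats the reported parent marginals as independent; the function name, array layout, and independence shortcut are my own illustrative assumptions, not part of the original setup.

```python
import numpy as np
from itertools import product

def inferential_inconsistency(node_dist, parent_dists, cpt):
    """Graded inconsistency score for one node, as a KL divergence.

    node_dist    : reporter's distribution over the node's values, shape (k,)
    parent_dists : list of the reporter's marginal distributions over each parent
    cpt          : conditional probability table, array of shape
                   (k, n1, n2, ...) giving P(node | parent values)

    Treats the reported parent marginals as independent, which is an
    extra simplifying assumption.
    """
    node_dist = np.asarray(node_dist, dtype=float)
    implied = np.zeros(len(node_dist))
    # Marginalise the CPT against the (assumed independent) parent marginals.
    for combo in product(*(range(len(p)) for p in parent_dists)):
        weight = np.prod([p[i] for p, i in zip(parent_dists, combo)])
        implied += weight * cpt[(slice(None),) + combo]
    # KL(reported || implied); zero exactly when the reporter is consistent here.
    eps = 1e-12
    return float(np.sum(node_dist * (np.log(node_dist + eps) - np.log(implied + eps))))
```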
Being a reporter basically means being inferentially consistent on the training set; being inferentially consistent everywhere, on the other hand, means being the human simulator. So a direct translator would differ from a human simulator by being inferentially inconsistent for some inputs outside the training set. This could in principle be checked by sampling random possible inputs. The human could then try to distinguish a direct translator from a randomly overfitted model by trying to understand a small sample of its inferential inconsistencies.
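A sketch of how that sampling check might look, building on the function above. Here `reporter` and `bayes_net` are hypothetical stand-ins with assumed interfaces (the reporter returns a per-node distribution for an input, the net exposes parents and CPTs); none of this is specified in the original setup.

```python
def mean_inconsistency(reporter, bayes_net, inputs):
    """Average per-node inconsistency of a reporter over sampled inputs.

    reporter(x)          -> dict mapping node name to its reported distribution
    bayes_net.parents(n) -> list of parent node names
    bayes_net.cpt(n)     -> conditional probability table for node n
    """
    scores = []
    for x in inputs:
        estimates = reporter(x)
        for node, dist in estimates.items():
            parents = bayes_net.parents(node)
            if not parents:
                continue  # root nodes have nothing to be inconsistent with
            scores.append(
                inferential_inconsistency(
                    dist,
                    [estimates[p] for p in parents],
                    bayes_net.cpt(node),
                )
            )
    return float(np.mean(scores)) if scores else 0.0
```

On the training set this score should be near zero for any reporter; the interesting question is how it behaves on randomly sampled inputs far from the training distribution.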
So much for my thoughts inside the paradigm; now on to snottily rejecting it. The intuition that the direct translator should exist at all seems implausible. And the idea that it would be so strong an attractor that a training strategy which merely avoids the human simulator would land on it quasi-automatically borders on the absurd. Modeling a constraint on the training set but not outside of it is basically what overfitting is, and overfitted solutions with many specialised degrees of freedom are usually highly degenerate. In other words, penalizing the human simulator would almost certainly lead to something closer to a pseudorandomizer than to a direct translator. Looked at another way, the direct translator is supposed to be helpful precisely in situations the human would perceive as contradictory: not merely bad model fits, but models strongly misspecified and then extrapolated far outside the sample space. Those are exactly the situations where statistical inference and machine learning have a strong track record of not working.