Rohin Shah comments on [AN #120]: Tracing the intellectual roots of AI and AI alignment

Rohin Shah 7 Oct 2020 21:08 UTC
But it does differ from behavioral cloning in that they stratify the samples
Fair point. In my ontology, “behavior cloning” is always with respect to some expert distribution, so I see the stratified samples as “several instances of behavior cloning with different expert distributions”, but that isn’t a particularly normal or accepted ontology.
Personally, I would’ve trained a single conditional model with a specified player-Elo for each move
Yeah, it does seem like this would have worked better. If nothing else, the predictions could be more precise: rather than specifying the bucket the current player falls into, you can specify their exact Elo.
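To make the precision point concrete, here is a minimal sketch of the two ways of conditioning on rating. The bucket width, the Elo range, and the function names are all hypothetical choices for illustration, not taken from the paper being discussed, and the model that would consume these inputs is left out.

```python
from typing import List

# Hypothetical values, chosen only for illustration.
ELO_MIN, ELO_MAX = 1100, 1900
BUCKET_WIDTH = 100


def bucket_condition(elo: int) -> List[float]:
    """One-hot vector naming the rating bucket the player falls in."""
    n_buckets = (ELO_MAX - ELO_MIN) // BUCKET_WIDTH
    # Clip so out-of-range ratings land in the first or last bucket.
    clipped = min(max(elo, ELO_MIN), ELO_MAX - 1)
    idx = (clipped - ELO_MIN) // BUCKET_WIDTH
    return [1.0 if i == idx else 0.0 for i in range(n_buckets)]


def exact_condition(elo: int) -> List[float]:
    """Single scalar input: the player's Elo rescaled to [0, 1]."""
    clipped = min(max(elo, ELO_MIN), ELO_MAX)
    return [(clipped - ELO_MIN) / (ELO_MAX - ELO_MIN)]


# Two players 80 points apart look identical under bucketing...
same_bucket = bucket_condition(1510) == bucket_condition(1590)
# ...but are distinguishable when the exact rating is the input.
same_exact = exact_condition(1510) == exact_condition(1590)
```

Under these assumed buckets, a 1510 player and a 1590 player get the same conditioning input in the bucketed scheme, whereas a single conditional model fed the scalar sees them as different.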