Ruby comments on Experimentally evaluating whether honesty generalizes

Ruby 4 Aug 2021 22:17 UTC
5 points
Curated. Beyond the lucid analysis of the concepts and problems, kudos to this post for making a detailed call for experiments that would be evidence about contentious AI/ML questions regarding “out of distribution” failure, learning generalization, and instrumental policy/deceptive alignment. I’d love to see someone try the experiments Paul suggests and report their results.