Thanks Ruby! I’m really glad you found the report accessible.
One clarification: Bayes nets aren’t important to ARC’s conception of the problem of ELK or its solution, so I don’t think it makes sense to contrast ARC’s approach against an approach focused on language models or describe it as seeking a solution via Bayes nets.
The form of a solution to ELK will still involve training a machine learning model (which will certainly understand language and could just be a language model) using some loss function. The idea that this model could learn to represent its understanding of the world in the form of inference on some Bayes net is one of a few simple test cases that ARC uses to check whether the loss functions they’re designing will always incentivize honestly answering straightforward questions.
For example, another simple test case (not included in the report) is that the model could learn to represent its understanding of the world in a bunch of “sentences” that it performs logical operations on to transform into other sentences.
These test cases are settings for counterexamples, but not crucial to proposed solutions. The idea is that if your loss function will always learn a model that answers straightforward questions honestly, it should work in particular for these simplified cases that are easy to think about.
Thanks for the clarification, Ajeya! Sorry to make you have to explain that; it was a mistake to imply that ARC’s conception is specifically anchored on Bayes nets, and the report was quite clear that it isn’t.