People from AI Safety Camp pointed me to this paper: https://rome.baulab.info/ (ROME: “Locating and Editing Factual Associations in GPT”).
It shows how “knowing” and “saying” are two different things in language models: causal tracing localizes factual associations in a small set of mid-layer MLP modules at the subject tokens, and a targeted rank-one edit to one of those weight matrices can change what the model “knows” (e.g. where the Eiffel Tower is located) without retraining and without changing how it phrases things.
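To make the editing step concrete, here is a minimal numpy sketch in the spirit of the paper’s closed-form rank-one update, under its simplified view of an MLP projection matrix `W` as a linear associative memory mapping subject “keys” to fact “values”. The names (`d_key`, `v_star`, `C`) and the random stand-in data are mine for illustration, not the paper’s code; the real method derives `k` and `v_star` from model activations and `C` from corpus key statistics.

```python
import numpy as np

# Rank-one editing sketch: treat one MLP weight matrix W as an
# associative memory that maps a "key" k (encoding a subject) to a
# "value" W @ k (encoding a fact about it). We want W to map k to a
# new value v_star instead, while disturbing other keys as little
# as possible.

rng = np.random.default_rng(0)
d_key, d_val = 64, 64

W = rng.normal(size=(d_val, d_key))   # stand-in for an MLP weight matrix
k = rng.normal(size=d_key)            # key vector for the edited subject
v_star = rng.normal(size=d_val)       # desired new value vector

# C stands in for the (uncentered) covariance of keys seen in training;
# a random symmetric positive-definite matrix is used here.
A = rng.normal(size=(d_key, d_key))
C = A @ A.T + np.eye(d_key)

# Closed-form rank-one update: W_edited maps k exactly to v_star,
# with the update direction C^{-1} k chosen so other stored
# associations are minimally disturbed (in the C-weighted sense).
u = np.linalg.solve(C, k)             # C^{-1} k
Lam = (v_star - W @ k) / (u @ k)
W_edited = W + np.outer(Lam, u)

print(np.allclose(W_edited @ k, v_star))  # True: the new fact is stored
```

The choice of update direction `C^{-1} k` is what keeps the edit surgical: keys that are roughly orthogonal to `k` under the key covariance are nearly unaffected, so one fact changes while the rest of the memory stays put.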
This is relevant to transparency and deception, and also to rebutting claims that transformers are “just shallow pattern-matchers.”
I’m surprised people aren’t making a bigger deal out of this!