Raemon comments on Which possible AI systems are relatively safe?

Raemon 22 Aug 2023 16:59 UTC
2 points
0
If its true thoughts are transparent and expressed in natural language (see e.g. Measuring Faithfulness in Chain-of-Thought Reasoning)
This seems technically true but a bit of a trap, since it may be easier to get ‘looks like it expresses its thoughts in natural language’ than ‘reliably actually does’, and specifying the difference may be too subtle for people.