If its true thoughts are transparent and expressed in natural language...
Claim: it is impossible for a mind's internal thoughts to both be expressed mostly in natural language and use that natural language in a way at all similar to how humans use it (for instance, no steganography). The reason: the way humans use natural language in practice, the words mostly serve to "pull mental levers", and most of the load-bearing reasoning then happens in the non-linguistic cognition those levers induce. So if language is used the way humans use it, then most of the cognition is hidden away in non-linguistic channels.
You can see this in practice every time someone has trouble putting their thoughts into words. Or every time someone can express their thoughts in words to a listener who already shares a bunch of the relevant mental models (even without shared jargon to point at those models; e.g. one can say "you know that thing where..." and give a couple of examples), but cannot express those same thoughts to someone who lacks those models.
The closest unblocked requirement would be an AI with many parts separated by internal interfaces, where those interfaces use natural language in a human-like way. There is still plenty of non-natural-language cognition happening between the natural language going in and the natural language coming out, but the interfaces might still provide a lot of useful visibility into the intermediates. Basically all of today's LLMs are examples of such an architecture: they take natural language in, do some magic, then spit out one more token of natural language.
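To make that shape concrete, here's a toy Python sketch of the pattern. To be clear, everything in it is invented for illustration (the byte-sum "embedding", the frozen random weights, the five-word vocabulary); it's not any real model, just the natural-language-in, opaque-vectors-in-the-middle, one-token-out loop described above.

```python
# Toy sketch of "natural-language interfaces around opaque cognition".
# Nothing here is a real model: the byte-sum embedding, frozen random
# weights, and tiny vocabulary are all stand-ins for illustration.
import numpy as np

DIM = 16
rng = np.random.default_rng(0)
W = rng.standard_normal((DIM, DIM))  # frozen random weights, standing in for learned ones

def embed(text: str) -> np.ndarray:
    """Encoder stand-in: natural language in, opaque vector out."""
    seed = sum(text.encode())  # deterministic toy "hash" of the prompt
    return np.random.default_rng(seed).standard_normal(DIM)

def opaque_cognition(v: np.ndarray) -> np.ndarray:
    """The 'magic' in the middle: non-linguistic computation,
    invisible from the natural-language interface."""
    return np.tanh(W @ v)

def emit_token(v: np.ndarray) -> str:
    """Decoder stand-in: opaque vector in, one natural-language token out."""
    vocab = ["yes", "no", "maybe", "the", "cat"]
    return vocab[int(np.argmax(v[: len(vocab)]))]

# Only the strings are visible on the interface; everything load-bearing
# lives in the vectors in between.
prompt = "Is the sky blue?"
print(prompt, "->", emit_token(opaque_cognition(embed(prompt))))
```

The point of the sketch: even if every string crossing the interface is honest natural language, reading those strings tells you almost nothing about what happened inside opaque_cognition. The visibility the interfaces buy is real, but partial.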