I think if AIs talk to each other using human language, they’ll start encoding stuff into it that isn’t apparent to a human reader, and this problem will get worse with more training.
I agree that using the forms of human language does not ensure interpretability by humans, and I also see strong advantages to communication modalities that would discard words in favor of more expressive embeddings. It is reasonable to expect that systems with strong learning capacity could interpret and explain messages between other systems, whether those messages are encoded in words or in vectors. However, although this kind of interpretability seems worth pursuing, it seems unwise to rely on it.
The open-agency perspective suggests that while interpretability is important for proposals, it is less important in understanding the processes that develop those proposals. There is a strong case for accepting potentially uninterpretable communications among models involved in generating proposals and testing them against predictive models — natural language is insufficient for design and analysis even among humans and their conventional software tools.
Plans of action, by contrast, call for concrete actions by agents, ensuring a basic form of interpretability. Evaluation processes can and should favor proposals that are accompanied by clear explanations that stand up under scrutiny.
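As a loose, purely illustrative sketch of that division (the names `OpaqueMessage`, `ActionStep`, `Proposal`, and `score_proposal` are hypothetical, not anything specified in the discussion), the idea is that inter-model traffic may be uninterpretable embeddings, while the proposal that reaches evaluation consists of concrete steps plus an explanation, and only the latter two feed the score:

```python
from dataclasses import dataclass
from typing import List

@dataclass
class OpaqueMessage:
    """Inter-model communication: an embedding vector, not assumed human-readable."""
    vector: List[float]

@dataclass
class ActionStep:
    """A concrete, human-readable step in a proposed plan of action."""
    description: str

@dataclass
class Proposal:
    plan: List[ActionStep]               # interpretable by construction: concrete actions
    explanation: str                     # accompanying rationale, open to scrutiny
    working_notes: List[OpaqueMessage]   # possibly uninterpretable intermediate traffic

def score_proposal(p: Proposal, predicted_utility: float, explanation_quality: float) -> float:
    """Toy evaluation rule: favor proposals whose plans score well against predictive
    models and whose explanations stand up under scrutiny; the opaque working notes
    do not enter the score directly."""
    return predicted_utility + explanation_quality
```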
This made me think of “lawyer-speak” and other forms of jargon.
More generally, this seems to be a function of learning speed and the number of interactions on the one hand, and the frequency with which you interact with other groups on the other. (In this case, the question would be how often you need to be understandable to humans, or to systems that need to be understandable to humans, and so on.)