A loss of this type of (very weak) interpretability would be quite unfortunate from a practical safety perspective.
This is bad, but perhaps there is a silver lining.
If internal communication within the scaffold appears to be in plain English, it will tempt humans to assume the meaning coincides precisely with the semantic content of the message.
If the chain of thought contains seemingly nonsensical content, it will be impossible to make this assumption.
This is bad, but perhaps there is a silver lining.
If internal communication within the scaffold appears to be in plain English, it will tempt humans to assume the meaning coincides precisely with the semantic content of the message.
If the chain of thought contains seemingly nonsensical content, it will be impossible to make this assumption.