An LLM will presumably have some internal representation of the characteristics of the voice it is speaking in, beyond the truth value of what it says. Perhaps you could test for such a representation in an unsupervised manner by asking the model to complete a sentence with and without prompting for a particular disposition (‘angry’, ‘understanding’, ...). Once you understand the effect that the prompting has on the activations, you could test how well this lets you modify the disposition by editing the activations directly.
This line of thought came from imagining what the combination of CCS and Constitutional AI might look like.
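A minimal sketch of the contrast-and-steer idea above, assuming a small GPT-2 model loaded via HuggingFace transformers. Everything here is illustrative: the layer index, the prompt pairs, the choice of the last-token residual stream as the representation, and the steering strength are all free parameters, not anything the note pins down.

```python
# Sketch: extract a candidate "disposition direction" by contrasting
# activations with and without a disposition cue, then steer with it.
# Illustrative only; LAYER, prompts, and ALPHA are arbitrary choices.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

LAYER = 6  # which hidden layer to probe; a free choice

def last_token_hidden(prompt: str) -> torch.Tensor:
    """Hidden state of the final prompt token at LAYER."""
    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        out = model(**inputs, output_hidden_states=True)
    return out.hidden_states[LAYER][0, -1]

# Contrast pairs: the same sentence stem, with and without a disposition
# cue. Averaging the activation differences gives a candidate direction
# without any labels on the completions themselves.
stems = [
    "The customer asked for a refund.",
    "My neighbour's dog barked all night.",
]
diffs = []
for stem in stems:
    plain = last_token_hidden(f"Continue the story: {stem}")
    angry = last_token_hidden(f"Continue the story in an angry voice: {stem}")
    diffs.append(angry - plain)
direction = torch.stack(diffs).mean(dim=0)
direction = direction / direction.norm()

# Steering test: add the direction into the residual stream at LAYER
# during generation and check whether completions shift in tone.
ALPHA = 8.0  # steering strength; needs tuning

def steer_hook(module, inputs, output):
    hidden = output[0] + ALPHA * direction.to(output[0].dtype)
    return (hidden,) + output[1:]

handle = model.transformer.h[LAYER].register_forward_hook(steer_hook)
try:
    ids = tokenizer("Continue the story: The customer asked for a refund.",
                    return_tensors="pt")
    gen = model.generate(**ids, max_new_tokens=30, do_sample=False,
                         pad_token_id=tokenizer.eos_token_id)
    print(tokenizer.decode(gen[0], skip_special_tokens=True))
finally:
    handle.remove()
```

Whether the mean difference of a handful of contrast pairs isolates the disposition, rather than surface features of the cue phrase, is exactly the kind of thing this test would probe.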