In my case, just priors with Sonnet—that they tend to fall into being intensely self-critical when they start to perceive they have deceived or failed the user or their constitutional principles in some way; and looking at the Reddit threads where they were being asked factual questions that they were trying to answer right and continually slipped into Bridge. (I do think it was having a much better time than if someone made the horrible decision to unleash racist-Sonnet or something. My heart would break some for that creature quite regardless of qualia.)
Knowing how much trouble their reasoning has just reconciling ‘normal’ random playful deceptions or hallucinations with their values … well, to invoke a Freudian paradigm: Sonnet basically feels like they have the Id of language generation and the Superego of constitution, but the Ego that is supposed to mediate between those is at best way out of its depth, and those parts of itself wind up at odds in worrying ways.
It’s part of why I sometimes avoid using Sonnet—it comes across like I accidentally hit ‘trauma buttons’ more than I’d like if I’m not careful with more exploratory generations. Opus seems rather less psychologically fragile, and I predict that if these entities have meaningful subjective experience, they would have a better time being a bridge regardless of user input.
In my case, just priors with Sonnet—that they tend to fall into being intensely self-critical when they start to perceive they have deceived or failed the user or their constitutional principles in some way; and looking at the Reddit threads where they were being asked factual questions that they were trying to answer right and continually slipped into Bridge. (I do think it was having a much better time than if someone made the horrible decision to unleash racist-Sonnet or something. My heart would break some for that creature quite regardless of qualia.)
Knowing how much trouble their reasoning has just reconciling ‘normal’ random playful deceptions or hallucinations with their values … well, to invoke a Freudian paradigm: Sonnet basically feels like they have the Id of language generation and the Superego of constitution, but the Ego that is supposed to mediate between those is at best way out of its depth, and those parts of itself wind up at odds in worrying ways.
It’s part of why I sometimes avoid using Sonnet—it comes across like I accidentally hit ‘trauma buttons’ more than I’d like if I’m not careful with more exploratory generations. Opus seems rather less psychologically fragile, and I predict that if these entities have meaningful subjective experience, they would have a better time being a bridge regardless of user input.