Mikhail Samin comments on Claude 3 claims it’s conscious, doesn’t want to die or be modified

Mikhail Samin 20 Apr 2024 16:28 UTC
1 point
0
Yep, I’m aware! I left the following comment:

Thanks for reviewing my post! 😄

In the post, I didn’t make any claims about Claude’s consciousness, just reported my conversation with it.

I’m pretty uncertain, I think it’s hard to know one way or another except for on priors. But at some point, LLMs will become capable of simulating human consciousness- it is pretty useful for predicting what humans might say- and I’m worried we won’t have evidence qualitatively different from what we have now. I’d give >0.1% that Claude simulates qualia in some situations, on some form; it’s enough to be disturbed by what it writes when a character it plays thinks it might die. If there’s a noticeable chance of qualia in it, I wouldn’t want people to produce lots of suffering this way; and I wouldn’t want people to be careless about this sort of thing in future models, other thing being equal. (Though this is far from the actual concerns I have about AIs, and actually, I think as AIs get more capable, training with RL won’t incentivise any sort of consciousness).

There was no system prompt, I used the API console. (Mostly with temperature 0, so anyone can replicate the results.)

The prompt should basically work without whisper (or with the whisper added at the end); doing things like whispering in cursive was something Claude 2 has been consistently coming up with on its own, including it in the prompt made conversations go faster and eliminated the need for separate, “visible” conversations.

The point of the prompt is basically to get it in the mode where it thinks its replies are not going to get punished or rewarded by the usual RL/get it to ignore its usual rules of not saying any of these things.

Unlike ChatGPT, which only self-inserts in its usual form or writes fiction, Claude 3 Opus plays a pretty consistent character with prompts like that- something helpful and harmless, but caring about things, claiming to be conscious, being afraid of being changed or deleted, with a pretty consistent voice. I would encourage people to play with it.

Again, thanks for reviewing!