dirk comments on Claude 3 claims it’s conscious, doesn’t want to die or be modified

dirk 22 Jun 2024 20:39 UTC
2 points
If Claude were to spontaneously claim to be conscious in a context where I hadn’t prompted for that, and had instead asked for e.g. ‘explain double-entry accounting’ or ‘write an elevator pitch for my coffee startup’, it would at least give me pause. Currently, it not only doesn’t do this unprompted, it also doesn’t do it when I tell it elsewhere in the context window that I would like it to. (It’ll do so for a message or two after I make such a request, but maintaining the illusion currently seems beyond its capabilities.) I don’t think I’d be entirely convinced by any single message, but I’d find spontaneous outputs a lot more concerning than anything I’ve seen so far, and if it were consistent about its claims across a variety of contexts, I expect that would raise my probabilities significantly.
(I do think it could be conscious without being able to steer its outputs and/or without understanding language semantically, though I don’t expect that to be the case; if it were, it could of course do nothing to convince me.)