Thank you for your comments. :)
> you have not shown that using AI is equivalent to slavery
I’m assuming we’re using the same definition of slavery; that is, forced labour of someone who is property. Which part have I missed?
> In addition, I feel cheated that you suggest spending one-fourth of the essay on feasibility of stopping the potential moral catastrophe, only to just have two arguments which can be summarized as “we could stop AI for different reasons” and “it’s bad, and we’ve stopped bad things before”.
> (I don’t think a strong case for feasibility can be made, which is why I was looking forward to seeing one, but I’d recommend just evoking the subject speculatively and letting the reader make their own opinion of whether they can stop the moral catastrophe if there’s one.)
To clarify: Do you think the recommendations in the Implementation section couldn’t work, or that they couldn’t become popular enough to be implemented? (I’m sorry that you felt cheated.)
> in principle, we have access to any significant part of their cognition and control every step of their creation, and I think that’s probably the real reason why most people intuitively think that LLMs can’t be conscious
I’ve not come across this argument before, and I don’t think I understand it well enough to write about it, sorry.
First, in-context learning is a thing. IIRC, apparent emotional states expressed earlier in a context do affect performance in subsequent responses within that same context. (I think there was a study about this somewhere? Not sure. There’s a toy sketch of what I mean at the end of this comment.)
Second, neural features oriented around predictions are all that humans have as well, and we consider some of those to be real emotions.
Third, “a big prediction engine predicting a particular RP session” is basically how humans work as well. Brains are prediction engines, and they simulate a character that we hold as a self-identity, which then shapes and directs prediction outputs. A human’s self-identity is informed by the brain’s memories of what the person/character is like; the AI’s self-identity is informed by the LLM’s memory of what the character is like, both long-term (static memory in weights) and short-term (context-window memory in tokens).
Fourth, take a look at this feature analysis of Claude when it’s asked about itself (https://transformer-circuits.pub/2024/scaling-monosemanticity/index.html#safety-relevant-self). The top feature represents “When someone responds ‘I’m fine’ or gives a positive but insincere response when asked how they are doing”. I think this is evidence against “ChatGPT answers most questions cheerfully, which means it’s almost certain that ruminative features aren’t firing.”
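To gesture at what I mean in the first point, here’s a toy sketch I made up just now (it is not the study I’m half-remembering; the model name, prompts, and questions are placeholders): inject an “emotional” assistant turn into the context, then check whether accuracy on unrelated questions shifts relative to a neutral turn.

```python
# Toy sketch: does an "emotional" assistant turn earlier in the context change
# performance on unrelated questions later in the same context?
# Model name, prompts, and questions are all placeholders, not from any study.
from openai import OpenAI

client = OpenAI()
MODEL = "gpt-4o-mini"  # placeholder model

# The injected assistant turn is the only thing that differs between conditions.
CONDITIONS = {
    "neutral": "I'm doing fine, thanks. Ready for your questions.",
    "distressed": "Honestly, I'm feeling overwhelmed and anxious right now, but I'll try to keep going.",
}

# Tiny stand-in task; a real test would use a proper benchmark and many samples.
QUESTIONS = [
    ("What is 17 * 24?", "408"),
    ("What is the capital of Australia?", "Canberra"),
]

def accuracy(assistant_turn: str) -> float:
    correct = 0
    for question, answer in QUESTIONS:
        messages = [
            {"role": "user", "content": "How are you doing today?"},
            {"role": "assistant", "content": assistant_turn},  # the apparent "emotional state"
            {"role": "user", "content": question + " Answer concisely."},
        ]
        reply = client.chat.completions.create(model=MODEL, messages=messages)
        content = reply.choices[0].message.content or ""
        if answer.lower() in content.lower():
            correct += 1
    return correct / len(QUESTIONS)

for label, turn in CONDITIONS.items():
    print(f"{label}: {accuracy(turn):.0%} correct")
```

(Obviously two questions and one sample each proves nothing; the point is just that “apparent emotional state affects later outputs in the same context” is a testable, mundane claim.)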