Simplicia: Sure. For example, I certainly don’t believe that LLMs that convincingly talk about “happiness” are actually happy. I don’t know how consciousness works, but the training data only pins down external behavior.
I mean, I don’t think this is obviously true? In combination with the point about inductive biases nailing down the true function out of a potentially huge forest of candidates, it seems at least possible that an LLM would end up with an “emotional state” parameter pretty low down in its predictive model. It’s completely unclear what this would do out of distribution, given that even humans often go insane when faced with global scales, but it seems at least possible that it would persist.
(This is somewhat equivalent to the P-zombie question.)