I am looking for a discussion of evidence that an LLM's internal “true” motivation or reasoning system is very different from a human's, despite its human-like output, and that in outlying conditions, very different from the training environment, it will behave very differently.
Definition given in post:
I think my example counts.