As applied to current foundation models, it appears to do so.
I don’t think the outputs of RLHF’d LLMs map onto the internal cognition that generated them in the way human behavior maps onto the human cognition that generated it. (That is to say, I do not think LLMs behave in ways that look kind because they have a preference to be kind; right now I don’t think they meaningfully have preferences in that sense at all.)