Agreed. To be consistently “helpful, honest, and harmless”, an LLM should somehow “keep this in the back of its mind” when it assists the person, or else it risks violating these desiderata.
In DNN LLMs, “keeping something in the back of the mind” is equivalent to activating the corresponding feature (of “HHH assistant”, in this case) during most inferences, which is equivalent to self-awareness, self-evidencing, goal-directedness, and agency in a narrow sense (these are all synonyms). See my reply to nostalgebraist for more details.