What does trust mean, from the perspective of the LLM algorithm, in terms of a flattery component? Do LLMs have a "trustometer"? Or can they evaluate some sort of stored world-state, compare it against the prompt, and come up with a "veracity" value that they use when responding to the prompt?