What does trust mean, from the perspective of the LLM algorithm, in terms of a flattery component? Do LLMs have a "trustometer"? Or can they evaluate some sort of stored world-state, compare it against the prompt, and come up with a "veracity" value that they use when responding to the prompt?