eggsyntax comments on Language Models Model Us

eggsyntax 12 Nov 2024 15:04 UTC
3 points
Update: a recent paper, ‘Targeted Manipulation and Deception Emerge when Optimizing LLMs for User Feedback’ (described by the authors in an LW post of the same title), finds that post-RLHF, LLMs may identify users who are more susceptible to manipulation and behave differently with those users. This seems like a clear example of LLMs modeling users and then acting on that information.