Thomas Larsen comments on Thomas Larsen’s Shortform

Thomas Larsen 8 Nov 2022 23:37 UTC
2 points
I’m excited for ideas for concrete training setups that would induce deception2 in an RLHF model, especially in the context of an LLM. Please post any ideas here. :)