quetzal_rainbow comments on DPO/PPO-RLHF on LLMs incentivizes sycophancy, exaggeration and deceptive hallucination, but not misaligned powerseeking

quetzal_rainbow 11 Jun 2024 9:06 UTC
2 points
0
My point is that RLHF incentivizes all sorts of things, and these things depend on the content of the trained model, not on what RLHF is.
- tailcalled 11 Jun 2024 9:17 UTC
  4 points
  2
  Parent
  It depends on both.