alexandraabbas comments on Reducing sycophancy and improving honesty via activation steering

alexandraabbas 26 Apr 2024 10:49 UTC
1 point
0
“[...] This is because there would be no general direction towards a truth-based belief domain or away from using human modeling in output generation.”
What do you mean by “human modeling in output generation”?
- Nina Panickssery 26 Apr 2024 21:31 UTC
  2 points
  0
  I am contrasting generating an output by:
  1. Modeling how a human would respond (“human modeling in output generation”)
  2. Modeling what the ground-truth answer is
  E.g., for common misconceptions, most humans might hold a certain misconception (such as that South America is west of Florida), but we want the LLM to realize that we want it to say how things actually are (given that it likely does represent this fact somewhere).