I think that should be possible with techniques like reinforcement learning from human feedback, for a given precise specification of “ideologically neutral”.
You'll of course have a hard time convincing everyone that your specification is itself ideologically neutral, but projects like Wikipedia give me hope that we can achieve a reasonable amount of consensus.

What kind of specification do you have in mind? Is it like a set of guidelines for the human providing feedback on how to do it in an ideologically neutral way?
I’m less optimistic about this, given that complaints about Wikipedia’s left-wing bias seem common and credible to me.
Yes. The reason I said "precise specification" is that if your guidelines are ambiguous, then you're implicitly optimizing for something like "what labelers prefer on average, given the ambiguity", but in a less data-efficient way than if you had specified that target more precisely.
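To make the data-efficiency point concrete, here is a toy sketch (every number, name, and modeling choice below is an illustrative assumption, not anything from this exchange): each labeler applies the shared criterion plus an idiosyncratic interpretation whose spread is set by an `ambiguity` knob, and a Bradley-Terry reward model is fit to their pairwise preference labels by logistic regression.

```python
# Toy sketch: fitting a Bradley-Terry reward model from pairwise labels,
# comparing precise vs. ambiguous labeling guidelines. All quantities
# here are made up for illustration.
import numpy as np

rng = np.random.default_rng(0)
DIM = 5                        # feature dimension of a response
TARGET = rng.normal(size=DIM)  # the intended "neutral" preference direction

def make_labelers(n, ambiguity):
    """Labeler weights = shared target + idiosyncratic interpretation.

    `ambiguity` scales how much room the guidelines leave for labelers
    to diverge; 0 means everyone applies exactly the same criterion.
    """
    return TARGET + ambiguity * rng.normal(size=(n, DIM))

def collect_labels(labelers, n_pairs):
    """Sample response pairs; a random labeler marks which one they prefer."""
    a = rng.normal(size=(n_pairs, DIM))
    b = rng.normal(size=(n_pairs, DIM))
    w = labelers[rng.integers(len(labelers), size=n_pairs)]
    # Bradley-Terry: P(a preferred over b) = sigmoid(w . (a - b))
    p = 1 / (1 + np.exp(-np.einsum("ij,ij->i", w, a - b)))
    y = (rng.random(n_pairs) < p).astype(float)
    return a - b, y

def fit_reward_model(x, y, steps=2000, lr=0.1):
    """Logistic regression on feature differences = Bradley-Terry MLE."""
    w = np.zeros(DIM)
    for _ in range(steps):
        p = 1 / (1 + np.exp(-x @ w))
        w += lr * x.T @ (y - p) / len(y)  # gradient ascent on log-likelihood
    return w

for ambiguity in (0.0, 2.0):
    labelers = make_labelers(50, ambiguity)
    for n_pairs in (200, 2000):
        x, y = collect_labels(labelers, n_pairs)
        w = fit_reward_model(x, y)
        # Compare learned vs. intended preference *direction* (magnitudes
        # are attenuated by label noise, so normalize both).
        err = np.linalg.norm(w / np.linalg.norm(w)
                             - TARGET / np.linalg.norm(TARGET))
        print(f"ambiguity={ambiguity} labels={n_pairs} direction error={err:.3f}")
```

In runs of this sketch, both conditions converge toward the average labeler preference as labels accumulate, but at any given number of labels the ambiguous condition sits at a larger error, which is the "less data-efficient" part of the argument.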