Suppose we wanted the AI to be ideologically neutral and free from human biases, just telling the objective truth to the extent possible. Do you think achieving something like that would be possible in the longer term, and if so through what kinds of techniques?
I think that should be possible with techniques like reinforcement learning from human feedback, for a given precise specification of “ideologically neutral”. (You’ll of course have a hard time convincing everyone that your specification is itself ideologically neutral, but projects like Wikipedia give me hope that we can achieve a reasonable amount of consensus.) There are still a number of challenging obstacles, including being able to correctly evaluate responses to difficult questions, collecting enough data while maintaining quality, and covering unusual or adversarially-selected edge cases.
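To make the RLHF suggestion concrete, here is a minimal sketch of its reward-modelling stage, assuming labelers compare pairs of responses under a written neutrality rubric and a reward model is fit to those comparisons with a standard Bradley-Terry pairwise loss. The network, feature dimensions, and toy data below are placeholders for illustration, not anything from this exchange.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Minimal sketch of the reward-modelling stage of RLHF.
# Assumption: each candidate response has already been encoded into a
# fixed-size feature vector (in practice, a language-model embedding).

class RewardModel(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        self.score = nn.Sequential(nn.Linear(dim, 64), nn.ReLU(), nn.Linear(64, 1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.score(x).squeeze(-1)

def pairwise_loss(r_chosen: torch.Tensor, r_rejected: torch.Tensor) -> torch.Tensor:
    # Bradley-Terry style objective: the response the labeler preferred under
    # the neutrality rubric should score higher than the one they rejected.
    return -F.logsigmoid(r_chosen - r_rejected).mean()

# Toy data standing in for labeler comparisons collected under the rubric.
dim = 32
chosen_feats = torch.randn(256, dim)
rejected_feats = torch.randn(256, dim)

model = RewardModel(dim)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for _ in range(200):
    loss = pairwise_loss(model(chosen_feats), model(rejected_feats))
    opt.zero_grad()
    loss.backward()
    opt.step()

# The fitted reward model would then be the optimization target for the
# policy step (e.g. PPO) in the full RLHF pipeline.
```

Everything about “ideological neutrality” enters through the comparisons themselves, i.e. through the rubric the labelers apply when choosing between responses.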
> I think that should be possible with techniques like reinforcement learning from human feedback, for a given precise specification of “ideologically neutral”.
What kind of specification do you have in mind? Is it like a set of guidelines for the human providing feedback on how to do it in an ideologically neutral way?
> You’ll of course have a hard time convincing everyone that your specification is itself ideologically neutral, but projects like Wikipedia give me hope that we can achieve a reasonable amount of consensus.
I’m less optimistic about this, given that complaints about Wikipedia’s left-wing bias seem common and credible to me.
> What kind of specification do you have in mind? Is it like a set of guidelines for the human providing feedback on how to do it in an ideologically neutral way?
Yes.
The reason I said “precise specification” is that if your guidelines are ambiguous, then you’re implicitly optimizing something like “what labelers prefer on average, given the ambiguity”, but doing so in a less data-efficient way than if you had specified this target more precisely.
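As a toy illustration of that data-efficiency point (my own sketch, not part of the original exchange): if each labeler applies their own reading of an ambiguous guideline, the pairwise labels become noisier samples of the averaged criterion, and recovering that criterion to a given accuracy takes more comparisons. The synthetic features and the simple averaging estimator below are assumptions made purely for illustration.

```python
import numpy as np

# Toy illustration of how ambiguous guidelines cost data: each labeler
# applies their own reading of the spec, so pairwise labels are noisier
# samples of the *averaged* criterion and more of them are needed to
# recover it.

rng = np.random.default_rng(0)
dim = 16
target = rng.normal(size=dim)
target /= np.linalg.norm(target)  # the averaged labeling criterion

def estimate_criterion(n_comparisons: int, interpretation_noise: float) -> float:
    """Return cosine similarity between the recovered criterion and the target."""
    a = rng.normal(size=(n_comparisons, dim))
    b = rng.normal(size=(n_comparisons, dim))
    # Each labeler's reading of the guideline deviates from the shared target.
    readings = target + interpretation_noise * rng.normal(size=(n_comparisons, dim))
    labels = np.sign(np.einsum("nd,nd->n", a - b, readings))
    # Simple averaging estimator for the direction implied by the labels.
    estimate = (labels[:, None] * (a - b)).mean(axis=0)
    return float(estimate @ target / np.linalg.norm(estimate))

for noise in (0.0, 1.0):  # precise spec vs. ambiguous spec
    for n in (100, 1000, 10000):
        print(f"noise={noise:.1f}  n={n:5d}  alignment={estimate_criterion(n, noise):.3f}")
```

With the precise spec (noise 0.0) the recovered direction should align with the target at a smaller number of comparisons than with the ambiguous spec (noise 1.0), which is the sense in which ambiguity costs data.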