I think that should be possible with techniques like reinforcement learning from human feedback, for a given precise specification of “ideologically neutral”.
You'll of course have a hard time convincing everyone that your specification is itself ideologically neutral, but projects like Wikipedia give me hope that we can achieve a reasonable amount of consensus.

What kind of specification do you have in mind? Is it like a set of guidelines for the human providing feedback on how to do it in an ideologically neutral way?
I’m less optimistic about this, given that complaints about Wikipedia’s left-wing bias seem common and credible to me.
Yes. The reason I said "precise specification" is that if your guidelines are ambiguous, then you're implicitly optimizing for something like "what labelers prefer on average, given the ambiguity", but in a less data-efficient way than if you had specified that target more precisely.
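To make the data-efficiency point concrete, here is a toy sketch (every number, name, and modeling choice below is an illustrative assumption, not anything from this exchange): each labeler applies the shared criterion plus an idiosyncratic interpretation whose spread is set by an `ambiguity` knob, and a Bradley-Terry reward model is fit to their pairwise preference labels by logistic regression.

```python
# Toy sketch: fitting a Bradley-Terry reward model from pairwise labels,
# comparing precise vs. ambiguous labeling guidelines. All quantities
# here are made up for illustration.
import numpy as np

rng = np.random.default_rng(0)
DIM = 5                        # feature dimension of a response
TARGET = rng.normal(size=DIM)  # the intended "neutral" preference direction

def make_labelers(n, ambiguity):
    """Labeler weights = shared target + idiosyncratic interpretation.

    `ambiguity` scales how much room the guidelines leave for labelers
    to diverge; 0 means everyone applies exactly the same criterion.
    """
    return TARGET + ambiguity * rng.normal(size=(n, DIM))

def collect_labels(labelers, n_pairs):
    """Sample response pairs; a random labeler marks which one they prefer."""
    a = rng.normal(size=(n_pairs, DIM))
    b = rng.normal(size=(n_pairs, DIM))
    w = labelers[rng.integers(len(labelers), size=n_pairs)]
    # Bradley-Terry: P(a preferred over b) = sigmoid(w . (a - b))
    p = 1 / (1 + np.exp(-np.einsum("ij,ij->i", w, a - b)))
    y = (rng.random(n_pairs) < p).astype(float)
    return a - b, y

def fit_reward_model(x, y, steps=2000, lr=0.1):
    """Logistic regression on feature differences = Bradley-Terry MLE."""
    w = np.zeros(DIM)
    for _ in range(steps):
        p = 1 / (1 + np.exp(-x @ w))
        w += lr * x.T @ (y - p) / len(y)  # gradient ascent on log-likelihood
    return w

for ambiguity in (0.0, 2.0):
    labelers = make_labelers(50, ambiguity)
    for n_pairs in (200, 2000):
        x, y = collect_labels(labelers, n_pairs)
        w = fit_reward_model(x, y)
        # Compare learned vs. intended preference *direction* (magnitudes
        # are attenuated by label noise, so normalize both).
        err = np.linalg.norm(w / np.linalg.norm(w)
                             - TARGET / np.linalg.norm(TARGET))
        print(f"ambiguity={ambiguity} labels={n_pairs} direction error={err:.3f}")
```

In runs of this sketch, both conditions converge toward the average labeler preference as labels accumulate, but at any given number of labels the ambiguous condition sits at a larger error, which is the "less data-efficient" part of the argument.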