As you’re probably aware, the fine-tuning is done by humans rating the output of the LLM. I believe this was done by paid workers, who were probably given a list of criteria (e.g. the output should be helpful and friendly and definitely not use slurs), and who had probably not heard of Wittgenstein. How do you think they would rate LLM outputs that demonstrated “incorrect understanding of language”?
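For concreteness, the usual way those human ratings enter training is as pairwise preferences fitted with a Bradley–Terry-style loss on a learned reward model; the policy is then tuned against that reward. This is a minimal sketch of that preference step (the numeric rewards here are made-up illustrative values, not anything from a real rater dataset):

```python
import math

def preference_probability(reward_chosen: float, reward_rejected: float) -> float:
    # Bradley-Terry model: probability that a rater prefers the first
    # completion, given scalar reward estimates for each completion.
    return 1.0 / (1.0 + math.exp(reward_rejected - reward_chosen))

def reward_model_loss(reward_chosen: float, reward_rejected: float) -> float:
    # Negative log-likelihood of the recorded human preference; minimizing
    # this pushes the reward model to score preferred completions higher.
    return -math.log(preference_probability(reward_chosen, reward_rejected))

# A rater marks completion A as "more helpful and friendly" than B.
loss_agree = reward_model_loss(2.0, 0.5)     # model already agrees with rater
loss_disagree = reward_model_loss(0.5, 2.0)  # model disagrees with rater
```

The point relevant to the thread: whatever criteria the raters apply, subtle or not, is all the training signal carries; nothing in this loss distinguishes "helpful and friendly" from any deeper property of language use.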
I have (tried to) read Wittgenstein, but don’t know what outputs would or would not constitute an “incorrect understanding of language”. Could you give some examples? The question is whether the tuners would rate those examples positively or negatively, and whether examples like those would arise during fine-tuning.
who had probably not heard of Wittgenstein. How do you think they would rate LLM outputs that demonstrated “incorrect understanding of language”?
This is one of the bigger reasons why I really don’t like RLHF: inevitably you’re going to have to use a whole bunch of humans who know less-than-ideal amounts about philosophy as it pertains to AI alignment.
But, if it is the method used, I would have hoped that some minimum discussion of linguistic philosophy would’ve been had among those who are aligning this AI. It’s impossible for the utility function of the AI to be amenable to humans if it doesn’t use language the same way, ESPECIALLY if language is its way of conceiving the world (as it is for an LLM). Unfortunately, it looks like all this linguistic philosophy isn’t even discussed.
Hmm, the more I learn about this whole AI alignment situation, the more worried I get. Maybe I’ll have to stop doing moral philosophy and get involved.
I have (tried to) read Wittgenstein, but don’t know what outputs would or would not constitute an “incorrect understanding of language”. Could you give some examples?
Wittgenstein, especially his earlier work, is nearly illegible to me. Of course it isn’t actually illegible; it just takes a great many rereads of the same paragraphs to understand.
Luckily, Philosophical Investigations is much more approachable and sensible. That being said, it can still be difficult for people not immersed in the field to readily digest. For that I’d recommend https://plato.stanford.edu/entries/wittgenstein/
and my favorite lecturer who did a fantastic accessible 45 min lesson on Wittgenstein:
This is one of the bigger reasons why I really don’t like RLHF: inevitably you’re going to have to use a whole bunch of humans who know less-than-ideal amounts about philosophy as it pertains to AI alignment.
What would these humans do differently, if they knew about philosophy? Concretely, could you give a few examples of “Here’s a completion that should be positively reinforced because it demonstrates correct understanding of language, and here’s a completion of the same text that should be negatively reinforced because it demonstrates incorrect understanding of language”? (Bear in mind that the prompts shouldn’t be about language, as that would probably just teach the model what to say when it’s discussing language in particular.)
It’s impossible for the utility function of the AI to be amenable to humans if it doesn’t use language the same way
What makes you think that humans all use language the same way, if there’s more than one plausible option? People are extremely diverse in their perspectives.