I think it would be due to the LM in question using lots of language-neutral circuitry? See this paper.
RLHF mostly updates abstract/conceptual circuits, which (I assume) tend to be language-neutral; the language-specific circuits then just keep translating to and from the updated circuits. A crude way to probe this is sketched below.
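One very rough sanity check (not anything from the linked paper, just a sketch): compare per-layer weight deltas between a base checkpoint and its RLHF-tuned variant. If RLHF mostly touches abstract circuitry, one crude prediction is that updates concentrate in the middle layers, which interpretability work often associates with more language-neutral features. Weight deltas are a weak proxy for "which circuits changed" (activation-level analysis would be more direct), and the model names here are placeholders for whatever base/RLHF pair you have.

```python
# Sketch, assuming a matched base/RLHF checkpoint pair (placeholder names).
import torch
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("base-model")         # hypothetical name
tuned = AutoModelForCausalLM.from_pretrained("rlhf-tuned-model")  # hypothetical name

# Relative update size per parameter tensor; architectures must match,
# which holds for a model and its own fine-tune.
deltas = {}
for (name, p_base), (_, p_tuned) in zip(
    base.named_parameters(), tuned.named_parameters()
):
    with torch.no_grad():
        deltas[name] = ((p_tuned - p_base).norm() / (p_base.norm() + 1e-8)).item()

# Aggregate by transformer block index (assumes Llama-style names
# like "model.layers.12.self_attn.q_proj.weight").
per_layer = {}
for name, d in deltas.items():
    parts = name.split(".")
    if "layers" in parts:
        idx = int(parts[parts.index("layers") + 1])
        per_layer.setdefault(idx, []).append(d)

# If the hypothesis is right, middle layers should show the largest deltas.
for idx in sorted(per_layer):
    print(idx, sum(per_layer[idx]) / len(per_layer[idx]))
```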
Is it an established fact that RLHF updates abstract circuits? And if it does, why would it suffer from mode collapse?