Not a huge deal for the overall post, but I think your statement here isn’t actually known to be strictly true:
Literally the only thing Othello-GPT cares about is playing legal moves
I think it’s probably true in some rough sense, but I personally wouldn’t state it confidently like that. Even if the network is supervised-trained to predict legal moves, that doesn’t mean its internal goals or its generalization behavior mirror that objective.
Er, hmm. To me this feels like a pretty uncontroversial claim when discussing a small model on an algorithmic task like this. (Note that the model is literally trained on uniform random legal moves; it’s not trained on actual Othello game transcripts.) Though I would agree that eg “literally all that GPT-4 cares about is predicting the next token” is a dubious claim (even ignoring RLHF). It just seems like Othello-GPT is so small, and trained on such a clean and crisp task, that I can’t see it caring about anything else? Though the word “care” isn’t really well defined here.
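(To make “trained on uniform random legal moves” concrete, here’s a rough sketch of the data-generation setup being described, not the actual Othello-GPT code; `legal_moves` and `apply_move` are hypothetical stand-ins for an Othello rules implementation. Each training game is produced by repeatedly sampling uniformly from whatever moves are legal, rather than by copying human or engine games.)

```python
import random

def sample_random_game(initial_board, legal_moves, apply_move, max_moves=60):
    """Generate one synthetic game as a sequence of uniformly random legal moves."""
    board, moves = initial_board, []
    for _ in range(max_moves):
        legal = legal_moves(board)       # every move the rules currently allow
        if not legal:                    # no legal move left: stop (ignoring passes for simplicity)
            break
        move = random.choice(legal)      # uniform choice -- no notion of "good" play at all
        moves.append(move)
        board = apply_move(board, move)
    return moves                         # the token sequence the model is trained to predict
```

So the only structure in the data is legality itself; there’s no strategic signal for the model to pick up on.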
I’m open to the argument that I should say “Adam only cares about playing legal moves, and probably this is the only thing Othello-GPT is “trying” to do”.
To be clear, the relevant argument is “there are no other tasks to spend resources on apart from ‘predict the next move’, so it can afford a very expensive world model”.
I’m open to the argument that I should say “Adam only cares about playing legal moves, and probably this is the only thing Othello-GPT is “trying” to do”.
This statement seems fine, yeah!
(Rereading my initial comment, I regret that it has a confrontational tone where I didn’t intend one. I wanted to matter-of-factly state my concern, but I think I should have prefaced with something like “by the way, not a huge deal overall, but I think your statement here isn’t known to be strictly true.” Edited.)