An interesting example! A couple remarks:
1. A more human mistake might be guessing 0.6 and not 1.0?
2. After the mistake, it’s not clear what the “correct” answer is, from a text-prediction perspective. If I were trying to predict the output of my python interpreter, and it output 1.0, I’d predict that future outputs on the same input would also be “wrong”: either that I was using some kind of bugged interpreter, or that I was looking at some kind of human-guessed transcript of a python session.
Yeah, that one’s “the best example of the behavior that I was able to demonstrate from scratch with the OpenAI playground in 2 minutes”, not “the best example of the behavior I’ve ever seen”. Mostly the instances I’ve seen were chess-specific results on a model that I specifically fine-tuned on Python REPL transcripts that looked roughly like this:
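(The snippet below is an illustrative reconstruction assuming the python-chess library, not the literal training data; as noted below, the actual transcripts were mostly longer.)

```
>>> import chess
>>> board = chess.Board()
>>> board.push_san("e4")
Move.from_uci('e2e4')
>>> board.push_san("e5")
Move.from_uci('e7e5')
>>> board.push_san("Nf3")
Move.from_uci('g1f3')
>>> print(board)
r n b q k b n r
p p p p . p p p
. . . . . . . .
. . . . p . . .
. . . . P . . .
. . . . . N . .
P P P P . P P P
R N B Q K B . R
>>> print(board.piece_at(chess.G1))  # the knight has already moved off g1
None
```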
and it would print `N` instead of `None` (except that in the actual examples it mostly was a much longer transcript, and it was more like it would forget where the pieces were if the transcript contained an unusual move or just too many moves).

For context, I was trying to see if a small language model could be fine-tuned to play chess, and was working under the hypothesis of “a Python REPL will make the model behave as if statefulness holds”.
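(If it helps make that concrete, here is a minimal sketch of how one might generate that kind of REPL-style transcript with python-chess. It is an illustration of the idea, with made-up function names and parameters, not the actual pipeline from that project.)

```
# Illustrative only: build a fake Python REPL transcript of a random chess game.
import random
import chess

def make_transcript(n_moves=10, seed=0):
    """Play random legal moves and format the game as a REPL-style transcript."""
    rng = random.Random(seed)
    board = chess.Board()
    lines = [">>> import chess", ">>> board = chess.Board()"]
    for _ in range(n_moves):
        if board.is_game_over():
            break
        move = rng.choice(list(board.legal_moves))
        san = board.san(move)               # SAN must be computed before the move is pushed
        lines.append(f'>>> board.push_san("{san}")')
        lines.append(repr(move))            # what a real REPL echoes back from push_san()
        board.push(move)
    lines.append(">>> print(board)")
    lines.append(str(board))                # the board state the model has to keep track of
    return "\n".join(lines)

print(make_transcript())
```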
And then, of course, the Othello paper came out, and Bing Chat came out and could just flat-out play chess without having been explicitly trained on it, and the question of “can a language model play chess” became rather less compelling, because the answer was just “yes”.
But that project is where a lot of my “the mistakes tend to look like things a careless human does, not weird alien mistakes” intuitions ultimately come from.