Thanks for the link! I looked through the data, and there wasn't any clear correlation between the amount of time spent in a research area and p(Doom).
I computed a few averages, grouped both by time spent in the research area and by region of undergraduate study, here: https://docs.google.com/spreadsheets/d/1Kp0cWKJt7tmRtlXbPdpirQRwILO29xqAVcpmy30C9HQ/edit#gid=583622504
For the most part, the groups don't differ much, although, as might be expected, North Americans are more likely than respondents from other regions to have a high p(Doom) conditional on HLMI.
I don’t know how they did it, but I played a chess game against GPT4 by saying the following:
“I’m going to play a chess game. I’ll play white, and you play black. On each chat, I’ll post a move for white, and you follow with the best move for black. Does that make sense?”
I then went through the moves one by one in algebraic notation.
My experience largely matches GoteNoSente's. I played one full game that lasted 41 moves, and all of GPT4's moves were reasonable. It did make one invalid move when I forgot to include the move number before my move (e.g. Ne4 instead of 12. Ne4), but it corrected itself once I included the number. It also seemed stronger in the opening than in the endgame, which I suspect is because of the large number of similar openings in its training data.
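Since the invalid move went away once the move number was included, it may help to always send each white move with its number attached. Below is a minimal sketch of that formatting step; `format_white_move` and the sample move list are hypothetical illustrations, not part of any chess library or chat API.

```python
def format_white_move(move_number: int, san_move: str) -> str:
    """Prefix a white move in algebraic notation with its move number, e.g. '12. Ne4'."""
    if move_number < 1:
        raise ValueError("move numbers start at 1")
    return f"{move_number}. {san_move}"


# Hypothetical white moves, to be sent one chat message at a time.
white_moves = ["e4", "Nf3", "Bb5"]
messages = [format_white_move(i, move) for i, move in enumerate(white_moves, start=1)]
print(messages)  # ['1. e4', '2. Nf3', '3. Bb5']
```

Numbering each message from a counter, rather than typing the number by hand, avoids exactly the kind of slip (Ne4 instead of 12. Ne4) that preceded the invalid move.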