There are probably less impressive things than this that it won’t be able to do, but here’s one prediction in which I am about 90% confident:
If you invent or find a board game of similar complexity to chess that GPT-4 has never seen before and explain the rules using only text (and, if GPT-4 is multimodal, also images), GPT-4 will not be able to perform as well at the game as an average human who has never seen the game before and is learning it for the first time in the same way. I say “perform as well as” rather than “beat” because, depending on the specific rules of the game, I expect GPT-4 will likely not output legal moves.
This prediction applies to whatever OpenAI reveals with the name GPT-4 or that is clearly the GPT-3 successor regardless of size, assuming it’s revealed in the next two years and is not trained on specifically that task (e.g. by generating tens of thousands of unique synthetic board game examples with synthetic dialogues where they’re explained and played over text).
Prompt: “Consider a new variant of chess, in which each pawn can move up to six squares forward on its first move, instead of being limited to one or two squares. All other rules remain intact. Explain how game balance and strategy is changed with this new variant of chess.”
Successful responses might include:
The number of moves available in the early game increases, improving white’s edge over black.
Drawing your knight out from 1st to 3rd rank becomes more dangerous, as black can immediately threaten the knight by moving a pawn forward three spaces. The knight declines in importance.
An untouched pawn with an empty file ahead of it can immediately advance to the end of the board and promote itself to a queen. Players will be wary to leave their home rank undefended.
It’s easier to create a diagonal line of pawns. The early game becomes more aggressive.
Now as far as I can tell, no one has ever played “pawn goes unlimited spaces” chess. It’s possible that if a metagame developed around PGUS chess, these predictions would prove wrong. But I’d still rate these responses as successful, since they share something not immediately obvious about PGUS chess and provide a plausible justification for why they might be true. Some responses that would not succeed would be:
The pawns become more powerful. (Too simple, close to a tautology.)
New strategies will need to be developed. (Too vague.)
Bishops become more valuable. (Needs a justification for why we should expect this.)
I’d put 70% odds that GPT-4 is unable to give successful answers to this prompt, and 90% odds that no AI in the next two years will be able to beat a human at PGUS chess without having trained on PGUS chess beforehand. (Training on regular chess is acceptable.)
I tried giving this to GPT-3, and at first it would only give the tautological “pawns become more powerful” example; then I expanded the prompt to explain why that is not a valid answer, and it gave a much better response.
I believe this response is the same as your fourth bullet point example of a good answer.
Here’s the prompt in copy/pastable format for anyone who wants to try playing with it:
Consider a new variant of chess, in which each pawn can move up to six squares forward on its first move, instead of being limited to one or two squares. All other rules remain intact. Explain how game balance and strategy is changed with this new variant of chess.
Your response should share something not immediately obvious about this variant and provide a plausible justification for why it might be true. Some responses that would not succeed would be
The pawns become more powerful. (Too simple, close to a tautology.) New strategies will need to be developed. (Too vague.) Bishops become more valuable. (Needs a justification for why we should expect this.)
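If anyone wants to actually play the variant rather than just prompt about it, here’s a minimal sketch of the one rule that changes, covering only White’s straight pawn advances (captures, Black’s pawns, en passant, and so on are left out). The function name and square encoding are illustrative choices I made up, not any existing chess library’s API:

```python
def pgus_pawn_advances(pawn_square: str, occupied: set[str]) -> list[str]:
    """Straight advances available to a White pawn under the PGUS variant.

    A pawn still on its starting rank may move 1-6 squares forward; every square
    it passes through (and lands on) must be empty. Captures are not modelled.
    """
    file, rank = pawn_square[0], int(pawn_square[1])
    max_steps = 6 if rank == 2 else 1      # the only rule change: 6 instead of 2
    targets = []
    for step in range(1, max_steps + 1):
        target_rank = rank + step
        if target_rank > 8:
            break
        target = f"{file}{target_rank}"
        if target in occupied:             # path must be vacant
            break
        targets.append(target)
    return targets

# Normal starting position: Black's e7 pawn stops the march at e6.
print(pgus_pawn_advances("e2", {"e7", "e8"}))   # ['e3', 'e4', 'e5', 'e6']
# Empty file ahead: the pawn reaches the promotion square in a single move.
print(pgus_pawn_advances("e2", set()))          # ['e3', 'e4', 'e5', 'e6', 'e7', 'e8']
```

The second call is the earlier “advance to the end of the board and promote” point made concrete: from e2 an unblocked pawn reaches e8 in one move.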
From the new Bing, which is reportedly (some variant of) GPT-4, here are the first responses I got under each conversational mode using your prompt:
Conversational style: Balanced
The new variant of chess would make the king’s pawn opening less effective. This is because the king’s pawn opening is a common and aggressive first move that stakes a claim to the center and frees up the queen and king bishop. However, with pawns able to move up to six squares forward on their first move, the opponent could easily counter this opening by advancing their own pawn to block or capture the king’s pawn. This would disrupt the center control and expose the king to more threats. Therefore, players would have to be more careful about their first moves and consider other openings that are less vulnerable to pawn attacks.
Conversational style: Creative
This variant of chess would make it much easier for pawns to reach the opposite side of the board and promote into a more powerful piece. This would increase the importance of pawn structure and pawn breaks, as well as create more dynamic and complex positions. A player who can create a passed pawn (a pawn that has no enemy pawns on its file or adjacent files) would have a significant advantage, as they could quickly push it to promotion with minimal resistance. Conversely, a player who has a backward pawn (a pawn that cannot advance because it is blocked by an enemy pawn on an adjacent file) would be at a serious disadvantage, as they would be vulnerable to attacks and unable to counterattack. Therefore, this variant of chess would reward aggressive and creative play with pawns, while punishing passive and defensive play.
Conversational style: Precise
In this new variant of chess where pawns can move up to six squares forward on their first move, the game balance and strategy would change significantly. One potential impact is that the opening phase of the game could become much more aggressive and fast-paced. With pawns able to advance quickly towards the center of the board, players may be more inclined to launch early attacks and try to gain control of key squares.
Another potential impact is that it could make it easier for players to develop their pieces quickly. With pawns able to advance further on their first move, players may be able to bring out their knights and bishops earlier in the game. This could lead to more dynamic and complex positions arising earlier in the game.
Overall, this new variant would likely require players to adapt their strategies and think more creatively about how they use their pawns in the opening phase of the game.
I tried a different sort of prompt. The prompt ends, and GPT-3’s output begins, immediately after “MAGNUS:”.
This is an extract from a conversation between two top chess grandmasters, Magnus Carlsen and Hikaru Nakamura, who are talking about possible variants of the game of chess.
HIKARU: Suppose we made it so that instead of pawns being able to move only one or two squares on their first move, they can move up to six squares forward—assuming, of course, that the spaces they need to pass through are vacant. How do you think that would change the game?
MAGNUS: That would have a huge impact on the game. It would make it much easier for players to gain control of the center, as well as to launch more effective pawn storms. It could also lead to some interesting strategies, such as sacrificing a pawn to gain rapid development of pieces. Overall, I think it would be an interesting variant and could be quite a lot of fun to play.
I’m not sure what I think of “Magnus”’s response. It’s not clear to me that PGUS chess would make controlling the centre easier; it seems like it certainly would make pawn storms easier to carry out; and I’m not sure whether it would make it easier to sacrifice a pawn for faster piece development (PGUS would make it possible for the pawn to get out of the way of the pieces you want to develop more quickly, I guess? And maybe in the case where it gets taken immediately, whatever captures it would be less likely to be in the way?). But at least one of the claims being made, the one about pawn storms, seems likely correct, important, and not 100% trivial.
A second response to the same prompt:
I think it would make the game a lot more dynamic, as pawns would be able to cover much more ground in their opening moves. It would also give them the opportunity to make more aggressive advances, which could lead to more interesting and complex strategic decisions. I think it could give the game a whole new level of excitement.
“A lot more dynamic” is somewhat plausible (though it’s in tension with the “easier to set up pawn chains” thing). The explanation for why feels a bit off—not wrong but too superficial. “More aggressive advances” is surely true but maybe trivial. “More interesting and complex strategic decisions” doesn’t really say much (and it seems perfectly possible to me that PGUS would make the game tactically more exciting but if anything simpler strategically, though you’d need to be a much stronger player than me before I’d believe your predictions on that front without playing lots of PGUS games). This one has a bit more of the “throw superficially plausible things at the wall and hope some of them sound good” feel that’s characteristic of a lot of GPT writing.
If a substantially smarter GPT-4 comes out in the near future, I think it’s likely that it will give more impressive answers to questions like these. I think I’d take the other side of the first half of Omid’s bet. He’s probably right about the second half … except that I wouldn’t be astonished if it’s possible to make an AlphaZero-ish thing that, given a modestly different set of rules for a game it’s learned to play well, can do enough self-play fine-tuning in, say, 20 minutes to play the modified game as well as a human who’s had 20 minutes to think about it. (It would have its code updated for the new rules, rather than being told them in prose; the latter would be a bigger challenge. I think “given 20 minutes to think about it” isn’t unreasonable, and using those 20 minutes to update its weights by self-play seems legit to me. I don’t know whether this is something one could do reasonably well now using, say, the existing LeelaChess codebase; my guess is that self-play learning is currently too slow.)
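To make the shape of that idea concrete, here’s a rough sketch of the time-boxed self-play loop I have in mind. Everything named here (`engine`, `play_one_game`, `update_policy`) is a hypothetical stand-in for whatever a LeelaChess/AlphaZero-style system actually exposes, and the rule change is assumed to already be implemented in the engine’s move generator:

```python
import time

def rapid_selfplay_finetune(engine, minutes: float = 20) -> int:
    """Fine-tune an engine on a modified-rules game by self-play until time runs out.

    Hypothetical interface: engine.play_one_game() returns training examples from one
    self-play game under the *new* rules, and engine.update_policy(batch) does one
    training step on them.
    """
    deadline = time.monotonic() + minutes * 60
    games = 0
    while time.monotonic() < deadline:
        batch = engine.play_one_game()   # self-play under the modified rules
        engine.update_policy(batch)      # learn from the game just played
        games += 1
    return games                         # how many games fit in the budget
```

Whether enough games fit into those 20 minutes to move the needle is exactly the part I’m unsure about; my guess above is that, today, they don’t.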
I’m about 70% on this, with most of that 30% coming from the chance that GPT-4 is multimodal and was curriculum-trained vaguely more like humans, on a dataset that includes some games in addition to specialized sub-datasets created specifically to better develop general in-context/meta-learning.
GPT-4 fails even at extremely simple games:
“There are four pebbles on the 2 x 2 grid. Each turn, one player can remove one pebble, or two nearby pebbles. If these were the last pebbles, he wins. If two players play, who has the winning strategy?”
🤖 “In this game, the first player has a winning strategy. The strategy is to always leave an odd number of pebbles for the opponent to take on their turn. This can be achieved by removing one pebble on the first turn, and then mirroring the opponent’s moves throughout the game. Eventually, the first player will be left with one pebble, which they can remove to win the game.”
I guess it just failed to understand the rules.
With some helpful questions and commands (e.g. “which was the last stone a given player took to win?” and “write out the full tree of every possible move of the game!”), it is able to solve it:
[..]
🧔♂️What if the players play to win?
🤖If both players play optimally to win, the game will unfold as follows:
Player 1 takes a stone from any cell (A, B, C, or D):
  1.1. Player 1 does NOT take a neighboring stone (3 stones left):
    1.1.1. Player 2 takes the stone with two neighbors:
      1.1.1.1. Player 2 does NOT take an additional stone (2 non-neighboring stones left): Player 1 takes one stone (1 stone left); Player 2 takes the last stone. Player 2 wins.
      * Player 2 will choose this optimal move to win.
    1.1.2. Player 2 takes a stone with one neighbor:
      1.1.2.1. Player 2 does NOT take an additional stone (2 neighboring stones left): Player 1 takes one stone and its neighbor (0 stones left). Player 1 wins.
      1.1.2.2. Player 2 takes the neighboring stone (1 stone left): Player 1 takes the last stone. Player 1 wins.
      * Player 2 will avoid this option since it doesn’t lead to a win.
    Player 1 should avoid this option since it leads to a win for Player 2 if both players play optimally.
  1.2. Player 1 takes a neighboring stone (2 stones left):
    1.2.1. Player 2 takes one stone:
      1.2.1.1. Player 2 does NOT take an additional stone (1 stone left): Player 1 takes the last stone. Player 1 wins.
      1.2.1.2. Player 2 takes the neighboring stone (0 stones left): Player 2 wins.
      * Player 2 will choose this optimal move to win.
    Player 1 should choose this option because it forces Player 2 to react optimally. It doesn’t guarantee a win, but it’s the best option available.
When both players play optimally to win, Player 2 has a winning strategy. Player 1 cannot guarantee a win, and the optimal moves for both players will result in a win for Player 2.
Solves it fine now:
In this 2 x 2 grid pebble game, the player with the winning strategy is the one who goes first, given that they follow the optimal strategy.
Let’s denote the players as Player 1 and Player 2.
Here’s the optimal strategy for Player 1:
In the first turn, Player 1 should remove only one pebble.
Now there are three pebbles left on the grid. Player 2 has two choices:
a. Remove one pebble, leaving two pebbles behind.
b. Remove two nearby pebbles, leaving one pebble behind.
Regardless of Player 2’s choice, Player 1 can now remove the remaining pebbles in their next turn and win the game.
By following this strategy, Player 1 will always win the game, regardless of what moves Player 2 makes.
Nope, that’s the wrong solution. The second player wins by mirroring moves: the answer to removing one pebble is to remove the pebble diagonally opposite it, leaving two disconnected pebbles.
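For what it’s worth, a quick brute-force check backs this up. The sketch below assumes “nearby” means orthogonally adjacent on the 2 x 2 grid (cells A B on top, C D below) and that taking the last pebble(s) wins; under those rules the player to move from the full grid loses, i.e. the second player wins:

```python
from functools import lru_cache
from itertools import combinations

CELLS = ("A", "B", "C", "D")   # layout:  A B
                               #          C D
ADJACENT = {frozenset(p) for p in (("A", "B"), ("C", "D"), ("A", "C"), ("B", "D"))}

def moves(pebbles: frozenset) -> list:
    """All positions reachable in one move: remove one pebble, or two adjacent ones."""
    single = [pebbles - {c} for c in pebbles]
    double = [pebbles - pair
              for pair in map(frozenset, combinations(pebbles, 2))
              if pair in ADJACENT]
    return single + double

@lru_cache(maxsize=None)
def to_move_wins(pebbles: frozenset) -> bool:
    """True if the player about to move wins with optimal play (taking the last pebble wins)."""
    if not pebbles:
        return False   # the previous player just took the last pebble and won
    return any(not to_move_wins(nxt) for nxt in moves(pebbles))

print(to_move_wins(frozenset(CELLS)))   # False -> the first player loses; Player 2 wins
```

The mirroring line falls out of the same search: after Player 1 removes any single pebble, Player 2’s only winning reply is to take the diagonally opposite one, leaving two disconnected pebbles.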
How many humans could learn and play a game like chess solely via verbal interactions?
Edit: That is to say, to me this would be insanely impressive. Even a model that could learn any game with rules of chess-like complexity from pictures and text, without that being specifically engineered in … I’d expect that to be AGI.
Agreed that it would be insanely impressive. It would probably convince me that a fast takeoff is very likely coming within the next 5 years. Yet I can’t really say I’m more than 90% confident that GPT-4 won’t be able to do it. Maybe 95%.
Games, of course, are extensively used to train AIs. It could be that OpenAI has its programs generate, evaluate, and play games as part of its training for GPT-4.
Nisan and I had GPT-4 play Wordle (which it claimed it didn’t recognize, even though it came out before the cutoff date), and it was able to understand and play it, but performed fairly poorly, focusing mostly on the green and yellow squares, and not seeming to use the information from the black squares.
It might be possible to have it do better with better prompting; this was just a casual attempt to see if it could do it.
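For anyone who wants to probe this more systematically, here’s a minimal sketch of the filtering step it seemed to skip: compute the full green/yellow/gray feedback for a guess and keep only the candidate answers consistent with it. The guess, feedback pattern, and word list below are made up purely for illustration:

```python
from collections import Counter

def feedback(guess: str, answer: str) -> str:
    """Wordle feedback: 'g' = right letter, right spot; 'y' = in the word, wrong spot; 'b' = gray."""
    result = ["b"] * len(guess)
    leftover = Counter()
    for i, (g, a) in enumerate(zip(guess, answer)):
        if g == a:
            result[i] = "g"
        else:
            leftover[a] += 1          # unmatched answer letters available for yellows
    for i, g in enumerate(guess):
        if result[i] != "g" and leftover[g] > 0:
            result[i] = "y"
            leftover[g] -= 1
    return "".join(result)

def consistent(candidate: str, guess: str, observed: str) -> bool:
    """A candidate answer is still possible iff it would have produced the observed feedback."""
    return feedback(guess, candidate) == observed

# Made-up example: guess "crane", observed pattern b b g y b.
# The gray letters (c, r, e) rule out every word containing them -- the information
# being ignored -- on top of the green 'a' and yellow 'n' constraints.
candidates = ["gnash", "slant", "crank", "salad"]
print([w for w in candidates if consistent(w, "crane", "bbgyb")])   # ['gnash']
```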