Several unimpressive tasks, with my associated P(GPT-4 can’t do it):
4:1 - Write limericks that reliably rhyme and scan about arbitrary topics (topics about as complex as “an animal climbing a skyscraper”)
12:1 - Beat me at chess (which I’m quite bad at).
(“GPT-4 can beat me at chess” = “Somebody can find a non-cheaty program that maps a game-history to a prompt, and maps GPT-4’s output to a move, such that GPT-4 wrapped in that translation layer can beat me.”)
30:1 - Solve an ASCII-art maze (e.g. solve these by putting a sequence of @s from start to finish).
I’m happy to operationalize and bet on any of these, taking the “GPT-4 can’t do it” side.
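For readers who prefer probabilities to odds, here is a minimal sketch of the arithmetic. It assumes each "N:1" figure above means odds of N to 1 in favour of "GPT-4 can't do it"; the function name and task labels are hypothetical, chosen only for illustration.

```python
from fractions import Fraction


def odds_to_probability(odds: str) -> Fraction:
    """Convert 'a:b' odds in favour of an event into P(event) = a / (a + b)."""
    in_favour, against = (int(part) for part in odds.split(":"))
    return Fraction(in_favour, in_favour + against)


# Assumption: "N:1" is read as N-to-1 odds that GPT-4 can't do the task.
tasks = {"limericks": "4:1", "chess": "12:1", "ascii maze": "30:1"}
implied = {name: odds_to_probability(odds) for name, odds in tasks.items()}

for name, p in implied.items():
    print(f"{name}: P(can't) = {p} (about {float(p):.3f})")
```

Under that reading, the three bets correspond to roughly 80%, 92%, and 97% confidence that GPT-4 fails the task.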
Bing chat does pretty well at the first one.
Input:
Write a limerick about an animal climbing a skyscraper
Response:
Sure, I can try to write a limerick about that. Here is one I came up with:
There once was a monkey named Jack
Who decided to climb a big stack
He swung from each ledge
But he slipped off the edge
And he landed right flat on his back
I hope you like it. 😊
Lovely! Yeah, that rhymes and scans well enough for me!
Here are my experiments; they’re pretty good, but I don’t count them as “reliably” scanning. So I think I’m gonna count this one as a win!
(I haven’t tried testing my chess prediction yet, but here it is on ASCII-art mazes.)
How big/difficult do you want the ASCII mazes to be? And is few-shot OK?
I am thinking of mazes as complicated as the top one here! And few-shot is perfectly okay.
(I’d be flabbergasted if it could solve an ASCII-art maze “in one step” (i.e. I present the maze in a prompt, and GPT-4 just generates a stream of tokens that shows the path through the maze). I’d accept a program that iteratively runs GPT-4 on several prompts until it considers the maze “solved,” as long as it was clear that the maze-solving logic lived in GPT-4 and not the wrapper program.)
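One way to keep the maze-solving logic out of the wrapper is to have the wrapper only check a proposed answer, never search for one. Below is a rough sketch of such a checker. The maze format is an assumption made for illustration (`#` for walls, spaces for open cells, `S` and `E` for start and finish), as are all the names; the call to GPT-4 itself is deliberately not shown.

```python
from collections import deque


def is_valid_solution(maze: list[str], attempt: list[str]) -> bool:
    """Return True if `attempt` is `maze` plus a connected '@' path from S to E."""
    if len(maze) != len(attempt):
        return False
    start = end = None
    for r, (orig_row, new_row) in enumerate(zip(maze, attempt)):
        if len(orig_row) != len(new_row):
            return False
        for c, (orig, new) in enumerate(zip(orig_row, new_row)):
            # The only permitted change is an open cell ' ' becoming '@'.
            if orig != new and not (orig == " " and new == "@"):
                return False
            if orig == "S":
                start = (r, c)
            elif orig == "E":
                end = (r, c)
    if start is None or end is None:
        return False
    # Flood-fill from S, stepping only onto '@' cells (or E itself).
    seen, queue = {start}, deque([start])
    while queue:
        r, c = queue.popleft()
        if (r, c) == end:
            return True
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if (0 <= nr < len(attempt) and 0 <= nc < len(attempt[nr])
                    and (nr, nc) not in seen and attempt[nr][nc] in "@E"):
                seen.add((nr, nc))
                queue.append((nr, nc))
    return False


# Hypothetical example maze and a candidate answer of the kind a model might return.
MAZE = [
    "#######",
    "#S#   #",
    "# # # #",
    "#   #E#",
    "#######",
]
SOLVED = [
    "#######",
    "#S#@@@#",
    "#@#@#@#",
    "#@@@#E#",
    "#######",
]
THROUGH_WALL = [
    "#######",
    "#S@@@@#",
    "# # #@#",
    "#   #E#",
    "#######",
]
```

A wrapper built on this could re-prompt the model until `is_valid_solution` returns `True`, and since the checker only accepts or rejects complete answers, any actual path-finding would have to come from the model.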