I was just talking with Bing about how quickly transformer AI might surpass human intelligence, and it was a sensible conversation until it hallucinated a nonexistent study in which GPT-4 was tested on 100 scenarios and dilemmas and performed badly.
What these interactions have in common is that GPT-4 does well for a while, then goes off the rails. It makes me curious about the probability of going wrong: is there a constant risk per unit time, or does the risk per unit time actually increase with the length of the interaction, and if so, why?
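To make the two alternatives concrete, here is a toy sketch (my own illustration, not a measurement of GPT-4). It compares a constant per-step failure risk with one that grows linearly with interaction length. The function names and all the numbers (0.05, 0.01, 0.005) are made-up assumptions chosen only to show the shape of the curves.

```python
def survival_curve(hazards):
    """Return P(no failure through step k) for k = 1..len(hazards).

    hazards[k] is the assumed probability of going wrong at step k,
    given that nothing has gone wrong before it.
    """
    curve = []
    alive = 1.0
    for h in hazards:
        alive *= 1.0 - h
        curve.append(alive)
    return curve


def constant_hazard(p, steps):
    """Same risk at every step (memoryless case)."""
    return [p] * steps


def rising_hazard(p0, growth, steps, cap=1.0):
    """Risk that starts at p0 and grows linearly per step, capped at `cap`."""
    return [min(cap, p0 + growth * k) for k in range(steps)]


if __name__ == "__main__":
    steps = 30
    # Illustrative parameters only; not fitted to any real data.
    flat = survival_curve(constant_hazard(0.05, steps))
    rising = survival_curve(rising_hazard(0.01, 0.005, steps))
    for k in (5, 10, 20, 30):
        print(f"step {k:2d}: constant={flat[k - 1]:.3f}  rising={rising[k - 1]:.3f}")
```

With a constant risk, the chance of surviving each further step is always the same, so the curve decays geometrically. With a rising risk, the model looks more reliable early on and then falls off faster, which is closer to the "fine for a while, then off the rails" pattern. Telling the two apart would need actual logs of when interactions fail.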
The probability of going wrong increases as the novelty of the situation increases. As a chess game goes on, the probability that the game is completely novel, literally never played before, increases, and even more so at the amateur level. If a grandmaster played GPT-3/4, it would go much longer without going off the rails, simply because the first 20-something moves have likely been played many times before and were directly trained on.
Right, though 20 moves before reaching a new game is very rare afaik (assuming the regular way of counting, where 1 move means one from each side). But 15 is commonplace. According to chess.com (which I think only includes top games, though I'm not sure), this one was new from White's move 6 onward.