I agree that there are some impressive improvements from GPT-3 to GPT-4. But they seem to me a lot less impressive than the jump from GPT-2 producing barely coherent texts to GPT-3 (somewhat) figuring out how to play chess.
I disagree with your take on LLMs' math abilities. Wolfram Alpha helps with tasks like the SAT, and GPT-4 is doing well enough on them. But for some reason it (at least in its Bing incarnation) has trouble with simple logic puzzles like the one I mentioned in another comment.
Can you say more about the success with theoretical physics concepts? I don't think I've seen anybody try that.
Not coherently, no. My girlfriend is a professor of theoretical physics and theory of machine learning; my understanding of her work is extremely fuzzy. But she was stuck on something where I was acting as a rubber ducky, which is tricky insofar as I barely understand what she does, so I proposed talking to ChatGPT. She basically entered the problem she was stuck on (her suspicion that two different things were related somehow, though she couldn't quite pinpoint how). It took some tweaking: at first, it was extremely superficial, giving explanations more suited to Wikipedia or school homework than to the actual science, and she needed to push it over and over before it finally produced equations rather than superficial, unconnected explanations. And at the time, the internet plugin was not out, so the lack of access to recent papers was a problem. But she said that eventually it spat out some accurate equations (though also the occasional total nonsense), made a bunch of connections between concepts that were accurate (though it could not always correctly identify why), and made some proposals for connections that she at least found promising. She was very intrigued by its ability to spot those connections; in some ways, it seemed to replicate the intuition an advanced physicist eventually develops. She compared the experience to talking to an A-star Bachelor's student who has memorised all the concepts and is very well read, but who, if you start prodding, often has not truly understood them; and yet this student suddenly makes some connections that should be vastly beyond them, while being unable to properly explain why. She still found it helpful and interesting. That said, I am still under the impression it does much worse in this area than in e.g. biology or computer science.
With the logic puzzles, the GPT-4 technical report also seems confused by this. They had some logic puzzles which the model failed at, getting worse and worse with each iteration, only to suddenly master them with no warning. I haven't spotted the pattern yet, but I can say that it reminds me strongly of the mistakes you see in young children who lack advanced theory of mind and time perception. E.g. it has huge difficulties with the idea that it needs to judge a past situation without being biased by knowledge it only gained later, or that it needs to possess knowledge but withhold it from another person to win a game. As humans, we tend to forget that these are very advanced skills, because we excel at them so effortlessly.