Hi. I’m Gareth McCaughan. I’ve been a consistent reader and occasional commenter since the Overcoming Bias days. My LW username is “gjm” (not “Gjm” despite the wiki software’s preference for that capitalization). Elsewhere I generally go by one of “g”, “gjm”, or “gjm11”. The URL listed here is for my website and blog, neither of which has been substantially updated for several years. I live near Cambridge (UK) and work for Hewlett-Packard (who acquired the company that acquired what remained of the small company I used to work for, after it had first been acquired by someone else). My business cards say “mathematician” but in practice my work is a mixture of simulation, data analysis, algorithm design, software development, problem-solving, and whatever random engineering no one else is doing. I am married and have a daughter born in mid-2006. The best way to contact me is by email: firstname dot lastname at pobox dot com. I am happy to be emailed out of the blue by interesting people. If you are an LW regular you are probably an interesting person in the relevant sense even if you think you aren’t.
If you’re wondering why some of my very old posts and comments are at surprisingly negative scores, it’s because for some time I was the favourite target of old-LW’s resident neoreactionary troll, sockpuppeteer and mass-downvoter.
It’s pretty good. I tried it on a few mathematical questions.
First of all, a version of the standard AIW problem from the recent “Alice in Wonderland” paper. It got this right (not very surprisingly, as other leading models also do, at least much of the time). Then a version of the “AIW+” problem, which is much more confusing. Its answer was wrong, but its method (which it explained) was pretty much OK, and I am not sure it was any wronger than I would be, on average, trying to answer that question in real time.
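(For reference, if I remember the paper’s phrasing correctly, the AIW problem is of the form “Alice has N brothers and she also has M sisters. How many sisters does Alice’s brother have?” The intended answer is M + 1: each brother has all of Alice’s sisters, plus Alice herself.)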
Then some more conceptual mathematical puzzles. I took them from recent videos on Michael Penn’s YouTube channel. (His videos are commonly about undergraduate or easyish-olympiad-style pure mathematics. They seem unlikely to be in Claude’s training data, though of course other things containing the same problems might be.)
One pretty straightforward one: how many distinct factorials can you find that all end in the same number of zeros? It wrote down the correct formula for the number of zeros, but then started enumerating particular numbers and got some things wrong, tried to do pattern-spotting, and gave a hilariously wrong answer. When gently nudged, it corrected itself kinda-adequately and gave an almost-correct answer (which it corrected properly when nudged again), but I didn’t get much feeling of real understanding.
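For anyone who wants to check the intended answer, here’s a quick brute-force sketch (mine, not Claude’s). The number of trailing zeros of n! is the exponent of 5 in its factorization, since factors of 2 are always more plentiful than factors of 5:

```python
from collections import Counter

def trailing_zeros_of_factorial(n: int) -> int:
    """Exponent of 5 in n!, which equals the number of trailing zeros of n!."""
    count, power = 0, 5
    while power <= n:
        count += n // power
        power *= 5
    return count

# How many values of n share each trailing-zero count?
runs = Counter(trailing_zeros_of_factorial(n) for n in range(10_000))
print(max(runs.values()))  # 5: the count is constant on each block 5k..5k+4
                           # and strictly increases at every multiple of 5
```

This agrees with the answer I had in mind: five (e.g. 5! through 9! all end in exactly one zero).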
Another (an exercise from Knuth’s TAOCP; he rates its difficulty HM22, meaning that it needs higher mathematics and should take you 25 minutes or so; it’s about the relationship between two functions whose Taylor series coefficients differ by a factor of H(n), the n’th harmonic number) it solved straight off and quite neatly.
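For reference, one clean way to state the relationship (my formulation, using the standard harmonic-number identity; I’m not claiming it’s the exact form Knuth asks for or the one Claude gave): if

$$F(z)=\sum_{n\ge 0} a_n z^n \qquad\text{and}\qquad G(z)=\sum_{n\ge 1} H_n\,a_n z^n,$$

then

$$G(z)=\int_0^1 \frac{F(z)-F(wz)}{1-w}\,dw,$$

which follows term by term from $\int_0^1 \frac{1-w^n}{1-w}\,dw = \int_0^1 (1+w+\cdots+w^{n-1})\,dw = H_n$.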
Another (find all functions f with (f(x)-f(y))/(x-y) = f’((x+y)/2) for all distinct x,y) it initially “solved” with a solution containing a completely invalid step. When I said I couldn’t follow that step, it gave a fairly neat solution that works if you assume f is real-analytic (i.e., has a Taylor series expansion everywhere). This is also the first thing that occurred to me when I thought about the problem. When asked for a solution that doesn’t make that assumption, it unfortunately gave another invalid solution, and when prodded about that it gave yet another invalid one. Further prompting, even giving it a pretty big hint in the direction of a nice neat solution (better than Penn’s :-)), didn’t produce a genuinely correct solution.
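For the record, here is one analyticity-free route to the answer (my sketch; I don’t claim it’s identical to the hint I gave Claude, and the last step leans on a classical theorem). Writing $x=u+h$, $y=u-h$, the equation becomes

$$f(u+h)-f(u-h)=2h\,f'(u)\quad\text{for all }u,h.$$

Since $f'$ exists everywhere (it appears in the original equation), both sides are differentiable in $h$, and differentiating gives

$$f'(u+h)+f'(u-h)=2f'(u),$$

so $f'$ satisfies Jensen’s midpoint equation. A derivative is Borel-measurable, and measurable solutions of Jensen’s equation are affine (a classical result), so $f'(x)=2ax+b$ and hence $f(x)=ax^2+bx+c$; conversely, every quadratic is easily checked to satisfy the original equation.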
I rate it “not terribly good undergraduate at a good university”, I think, but—as with all these models to date—with tragically little “self-awareness”, in the sense that it’ll give a wrong answer, and you’ll poke it, and it’ll apologize effusively and give another wrong answer, and you can repeat this several times without making it change its approach or say “sorry, it seems I’m just not smart enough to solve this one” or anything.
On the one hand, the fact that we have AI systems that can do mathematics about as well as a not-very-good undergraduate (and quite a bit faster) is fantastically impressive. On the other hand, it really does feel as if something fairly fundamental is missing. If I were teaching an actual undergraduate whose answers were like Claude’s, I’d worry that something was wrong with their brain that had somehow left them kinda-able to do mathematics anyway. I wouldn’t bet heavily that just continuing down the current path will fail to get us to “genuinely smart people really thinking hard with actual world models” levels of intelligence in the nearish future, but I think that’s still the way I’d bet.
(Of course a system that’s at the “not very good undergraduate” level in everything, which I’m guessing is roughly what this is, is substantially superhuman in some important respects. And I don’t intend to imply that it doesn’t matter whether Anthropic are lax about what they release just because the latest thing happens not to be smart enough to be particularly dangerous.)