I happen to be a doctor with an interest in LW and associated concerns, who discovered a love for ML far too late for me to reskill and embrace it.
My younger cousin is a mathematician currently doing an integrated Masters and PhD. About a year back, I’d been trying to demonstrate to him the ever-increasing capability of SOTA LLMs at maths, and asked him to pose questions that they couldn’t trivially answer.
He chose “is the one-point compactification of a Hausdorff space itself Hausdorff?”.
At the time, all the models invariably insisted that the answer was no; I ran the prompt multiple times on the best models available then. My cousin said this was incorrect, and proceeded to sketch out a proof (which was quite simple once I finally understood that much of the jargon represented rather simple ideas at its core).
I ran into him again when we were both visiting home, and I decided to run the same question through the latest models to gauge their improvements.
I tried Gemini 1206, Gemini Flash Thinking Experimental, Claude 3.5 Sonnet (New) and GPT-4o.
Other than reinforcing the fact that AI companies have abysmal naming schemes, the exercise surprised me: almost all of them gave the correct answer. The exception was Claude, which was hampered by Anthropic being cheapskates and turning on concise-responses mode.
I showed him how the extended reasoning worked for Gemini Flash (it doesn’t hide its thinking tokens unlike o1) and I could tell that he was shocked/impressed, and couldn’t fault the reasoning process it and the other models went through.
To further shake him up, I had him find some recent homework problems he’d been assigned in his course (he’s in a top-3 maths program in India) and used Gemini’s multimodality to simply take a picture of an extended question and ask it to solve it.* It did so, again, flawlessly.
*So I wouldn’t have to go through the headache of reproducing it in LaTeX or Markdown.
He then demanded we try another; this time he expressed doubts that the model could handle a problem that was compactly stated yet vague without context it hadn’t been given. No surprises again.
He admitted that this was the first time he took my concerns seriously, though he got a rib in by saying doctors would be off the job market before mathematicians. I conjectured that was unlikely, given that maths and CS performance are more immediately beneficial to AI companies: they are easier to drop in and automate, and they have direct benefits for ML, with the goal of replacing human programmers and having the models recursively self-improve. Not to mention that performance in those domains is easier to make superhuman with the use of RL and automated theorem provers for ground truth. Oh well, I reassured him, we’re probably all screwed, and in short order, to the point where there’s not much benefit in quibbling about whose layoffs come a few months later.
I similarly felt in the past that by the time computers were Pareto-better than I am at math, there would already have been mass layoffs. I no longer believe this to be the case at all, and have been thinking about how I should orient myself in the future. I was very fortunate to land an offer for an applied-math research job starting in the next few months, but my plan is to devote a lot more energy to networking and building people skills while I’m there, instead of just hyperfocusing on learning the relevant fields.
o1 (standard, not pro) is still not the best at math reasoning, though. I occasionally give it linear algebra lemmas that I suspect it should be able to help with, but it always makes major errors. Here are some examples:
I have a finite-dimensional real vector space V equipped with a symmetric bilinear form (⋅,⋅) which is not necessarily non-degenerate. Let n be the dimension of V, let K be the radical (the subspace of vectors v with (v,V)=0), and let k be the dimension of K. Let W1 and W2 be (n+k)-dimensional real vector spaces that contain V and are equipped with symmetric non-degenerate bilinear forms extending (⋅,⋅). Show that there exists an isometry from W1 to W2. To its credit, it gave me some references that helped me prove this, but its own argument was completely bogus.
Let V be a real finite-dimensional vector space equipped with a symmetric non-degenerate bilinear form (⋅,⋅) and let σ be an isometry of V. Prove or disprove that the restriction of (⋅,⋅) to the fixed-point subspace of σ on V is non-degenerate. (Here it sort of had the right idea, but its counterexamples were never right; a sanity check of one candidate counterexample is sketched after these examples.)
Does there exist a symmetric irreducible square matrix with diagonal entries 2 and non-positive integer off-diagonal entries whose corank is more than 1? Here it gave a completely wrong proof of “no” and kept gaslighting me into believing that the general idea must work and that it’s a standard result in the field, following from a book that I happened to have actually read. It kept insisting on this, no matter how many times I corrected its errors, until I presented it with an example of a corank-1 matrix that made it clear its idea was unfixable.
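For the second lemma, here is a minimal sympy sanity check of one candidate counterexample. This is my own construction, not anything from the exchange with o1, so treat it as a sketch: a unipotent isometry of a signature-(2,1) form whose fixed-point subspace is spanned by a null vector, so the restricted form vanishes identically and is therefore degenerate.

```python
import sympy as sp

# Gram matrix of a non-degenerate form of signature (2,1): e1 and e3 are
# null vectors pairing to 1, and e2 is a unit vector orthogonal to both.
B = sp.Matrix([[0, 0, 1],
               [0, 1, 0],
               [1, 0, 0]])

t = sp.Integer(1)                       # any nonzero t works
s = sp.Matrix([[1, t, -t**2/2],
               [0, 1, -t],
               [0, 0, 1]])              # candidate unipotent isometry

assert s.T * B * s == B                 # s really does preserve the form
fixed = (s - sp.eye(3)).nullspace()     # basis of the fixed-point subspace
assert fixed == [sp.Matrix([1, 0, 0])]  # fixed subspace is span(e1)
v = fixed[0]
print(v.T * B * v)                      # Matrix([[0]]): the restriction of
                                        # the form to span(e1) is zero
```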
I have a strong suspicion that o3 will be much better than o1 though.
Thank you for your insight. Out of idle curiosity, I tried putting your last query into Gemini 2 Flash Thinking Experimental and it told me yes first-shot.
Here’s the final output; it’s absolutely beyond my ability to evaluate, so I’m curious whether you think it went about it correctly. I can also share the full CoT if you’d like, but it’s lengthy:
https://ibb.co/album/rx5Dy1
(Image, since even copying the markdown renders it ugly here.)
The corank has to be more than 1, not equal to 1. I’m not sure whether such a matrix exists; the reason I was able to change its mind by supplying a corank-1 matrix was that its kernel behaved in a way that significantly violated its intuition.
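For what it’s worth, here is a minimal sympy sketch (mine, purely illustrative) of how one might poke at this question numerically: it confirms a corank-1 example such as [[2, −2], [−2, 2]], and brute-forces 3×3 symmetric candidates with small non-positive off-diagonal entries, which of course says nothing about larger matrices or entries outside the searched range.

```python
import itertools
import sympy as sp

def corank(M):
    return M.shape[0] - M.rank()

def is_irreducible(M):
    # Connectedness of the graph on indices, with an edge i-j whenever M[i, j] != 0.
    n = M.shape[0]
    seen, stack = {0}, [0]
    while stack:
        i = stack.pop()
        for j in range(n):
            if j not in seen and M[i, j] != 0:
                seen.add(j)
                stack.append(j)
    return len(seen) == n

A = sp.Matrix([[2, -2], [-2, 2]])       # symmetric, irreducible, diagonal 2
print(corank(A))                        # 1

# Brute-force 3x3 candidates with off-diagonal entries in {0, -1, ..., -5}.
hits = []
for a, b, c in itertools.product(range(0, -6, -1), repeat=3):
    M = sp.Matrix([[2, a, b], [a, 2, c], [b, c, 2]])
    if is_irreducible(M) and corank(M) >= 2:
        hits.append(M)
print(hits)                             # [] in this range
```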
“I decided to run the same question through the latest models to gauge their improvements.”
I’m not exactly sure there is much advantage in your having done this, but I feel inclined to say thank you for persisting in persuading your cousin to at least consider concerns regarding AI, even if he filters those concerns mostly through job automation rather than other risks, such as a global catastrophe.
In my own life, over the last several years, I have found it difficult to persuade those close to me to really consider concerns about AI.
I thought that watching capabilities advance before their eyes might prompt them to think more about their own futures and about how they might behave or live differently conditional on different AI capabilities, but this has been to little avail.
Expanding capabilities seem to dissolve skepticism best, but conversations seem not to have had as large an effect as I would have expected. I’ve not thought or acted as much as I want to on how to coordinate more of humanity around decision-making regarding AI (or its consequences), partly because I do not have a concrete notion of where to steer humanity, or a justification for where to steer, even if I knew it was highly likely that my actions were actually contributing to the steering.