If you add 1 and 2 do you get 2, 4, or 6?
Humans often give answers that aren’t on a list if they think the list is wrong.
If you cannot interpret the question as having one of those 4 answers, I accuse you of being willfully dense.
Regardless, let me just concede the question is bad and move on. I already won the Turing test with the cow question, and I’ve subsequently found chatGPT fails on even much easier geometry questions (in 2d, not 3d). I can give you examples if you wish, but only if you say “I am debating in good faith and truly don’t think there are simple geometry problems chatGPT cannot solve”.
(See, I don’t think you disagree that chatGPT is bad at geometric reasoning, I think you’re just trying to nitpick.)
“I already won the Turing test with the cow question”
I would not be surprised if ChatGPT could come up with a more human-sounding question than your cow and ice cube. You might not pass, comparatively.
Huh? I’m the tester, not the testee. I’m not trying to pass for human, I’m trying to discern if the person I’m chatting with is human.
What’s with people saying LLMs pass the Turing test? They are not close, you guys, come on.
If you cannot interpret the question as having one of those 4 answers, I accuse you of being willfully dense.
Giving the right answer to the best of your ability even when it is not one the questioner anticipates is how I answer questions, and how I think people should generally answer these kinds of questions.
I can give you examples if you wish, but only if you say “I am debating in good faith and truly don’t think there are simple geometry problems chatGPT cannot solve”.
I’m debating in good faith, yes. I don’t think it’s as meaningful as you think that you can find simple geometry problems that GPT cannot solve, however, because I’d predict a lot of people would also get the question wrong.
Unless you’ve tried giving “simple” questions to typical adults, it’s easy to overestimate how good human responses would be: you end up comparing the AI’s answers to an “ideal” human instead of a “real” one.
“What’s the maximum possible number of intersection points between a circle and a triangle?”
(chatGPT says 3.) OK, your turn, tell me all about how normal humans cannot solve it, or how you personally interpret the question in a weird way so that the answer is 17.
The number that immediately came to mind was ‘three’. After thinking harder, and seeing that you had said chatGPT says ‘three’, I realized it’s ‘six’.
My prediction, if you asked random adults, is that ‘three’ would be the most common answer:
Many of them won’t be picturing anything concrete or thinking about it hard, and will intuitively say a number. A lot of these will say ‘three’, because triangles are very three.
Some will imagine a circumscribed or inscribed triangle and say ‘three’.
Some will imagine a case where the correct answer is ‘six’ but will still think of it as three intersections. (This is where I was until I thought harder.)
Do you disagree? If you do, maybe we could run a Mechanical Turk survey to check?
EDIT: one of my housemates said ‘six’, and my 8yo said ‘three’.
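(Why six: a straight segment can cross a circle at most twice, and a triangle has three sides, so the ceiling is 3 × 2 = 6, achieved whenever each side passes through the circle’s interior with both endpoints outside it. Here is a minimal numeric check; the particular triangle is my own choice for illustration, and only the Python standard library is assumed.)

```python
import math

def segment_circle_intersections(p, q, center, r):
    # Parametrize the segment as p + t*(q - p) for t in [0, 1], substitute
    # into |x - center|^2 = r^2, and count roots of the quadratic in [0, 1].
    dx, dy = q[0] - p[0], q[1] - p[1]
    fx, fy = p[0] - center[0], p[1] - center[1]
    a = dx * dx + dy * dy
    b = 2 * (fx * dx + fy * dy)
    c = fx * fx + fy * fy - r * r
    disc = b * b - 4 * a * c
    if disc < 0:
        return 0
    sq = math.sqrt(disc)
    # A set handles the tangent case (disc == 0) without double-counting.
    roots = {(-b - sq) / (2 * a), (-b + sq) / (2 * a)}
    return sum(1 for t in roots if 0 <= t <= 1)

# Equilateral triangle centered on a unit circle, chosen so each side sits
# at distance 0.9 from the center (inside the circle) while every vertex
# sits at distance 1.8 (outside it): each side crosses the circle twice.
R = 1.8  # circumradius; an equilateral triangle's inradius is R / 2
angles = [math.pi / 2 + k * 2 * math.pi / 3 for k in range(3)]
verts = [(R * math.cos(a), R * math.sin(a)) for a in angles]
total = sum(
    segment_circle_intersections(verts[i], verts[(i + 1) % 3], (0.0, 0.0), 1.0)
    for i in range(3)
)
print(total)  # prints 6
```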
Many won’t think about it very hard, but the interesting case of the Turing test is when you compare to a human who is trying. If you opened up a chat with random strangers, the most common answer to my question would be “lol”. That’s easy for a computer to simulate: just answer “lol” to everything.
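(That strategy really is a one-liner; a deliberately silly sketch:)

```python
def lol_bot(message: str) -> str:
    # A maximally disengaged chat participant: ignores the input entirely.
    return "lol"

print(lol_bot("What's the maximum number of intersection points?"))  # lol
```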
The whole point here is that chatGPT cannot reason like a human. I don’t care that survey-fillers on MTurk answer questions as fast as possible with no regard for whether their answers are correct; I care about the capabilities of humans, not their capabilities when they are not trying and don’t feel like thinking about the problem.
How about this: suppose I put this question as a bonus question next time I give an in-person exam to my undergraduates. How many do you think will get it wrong?
I think undergraduates are better at reasoning than typical humans. Whether they get it right probably depends on the subject: what kind of classes do you teach?
(My guess here is that a lot of humans wouldn’t meet your requirements for ability to reason like a human)
I’m concerned that when the AI is at the level of an undergraduate, can get 95% of things right, can run 100x faster than a human, and can be scaled out with more servers, it’s going to be too late.
I don’t really like the attempts to convince me that chatGPT is impressive by telling me how dumb people are. You should aspire to tell me how smart chatGPT is, not how dumb people are.
The argumentative move “well, I could solve the problem, but the problem is still bad because the average person can’t” is grating. It is grating even if you end up being right (I’m not sure). It’s grating because you hold humanity in such low esteem, yet at the same time you try to impress me with how chatGPT can match those humans you think so little of. You are trying to convince me of BOTH “most humans are idiots” AND “it is super impressive and scary that chatGPT can match those idiots” at the same time.
Anyway, perhaps we are nearing the point where no simple one-prompt IQ-type question can distinguish an average human from an AI. Even then, an interactive 5-minute conversation will still do so. The AI failed even the cow question, remember? The one your kids succeeded at? Now, perhaps that was a fluke, but if you give me 5 minutes of conversation time I’ll be able to generate more such flukes.
Also, in specific subject areas, it once again becomes easy to distinguish chatGPT from a human expert (or usually even from an undergraduate student). It’s harder in the humanities, granted, but it’s trivial in the sciences, and even in the humanities, the arguments of LLMs have this not-quite-making-sense property I observed when I asked Charlotte if she’s sentient.
I don’t really like the attempts to convince me that chatGPT is impressive by telling me how dumb people are.
Thanks for flagging this! I’m not trying to convince you that chatGPT is impressive, I’m only trying to convince you that you’re overestimating how smart people are.
OK, fair enough. I think LWers underestimate how smart average people are (that is, they overestimate their own relative intelligence), and I try to be mindful of that cognitive bias, but it’s possible I’m overcorrecting for this.