Why isn’t the correct answer an ellipse? (Ignoring the rotation of the earth)
Oops, @jefftk just casually failed @LGS’s Turing test, regardless of what the correct answer is :)
Look, if anyone here truly thinks I cannot tell a human from an AI, I’ll happily take your money. Name your terms. I can stake up to $1000 on this if you wish. We’d need a way to ensure the human subject isn’t trying to pass for an AI to steal my money, though (I have no doubt humans can pretend to be machines, it’s the other way around that’s in question).
It’s not even gonna be close, and I’m tired of you guys pretending otherwise. For instance, Jefftk’s explanation below clearly makes sense, while every explanation I got out of chatGPT made no sense. So Jefftk would in fact pass my Turing test, even if he said “ellipse”, which he probably wouldn’t have as it wasn’t one of the 4 answers I asked for.
Actually trying to answer: “I set the string swinging like a pendulum” to me reads like the person pulls the ice cube back and then either lets go or gives it a little push. I expect it’s quite hard to do either of these while ensuring that the net momentum of the ice cube is exactly along a line that runs directly below the point at which the ice cube is attached to the branch. If it starts off with any momentum perpendicular to that line, you get an ellipse and not a line. As it loses energy and traverses a smaller ellipse it fills in the ellipse. If this happens quickly enough the final shape would be less of an ellipse than a splattering of drips in a vaguely elliptical pattern, with a strong concentration in the center. The cooler the day the more that happens, and possibly the day needs to be improbably hot before you get anything other than a few dots and a point?
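A minimal numerical sketch of that picture, treating the small-angle swing as two independent damped harmonic oscillators (one along the pull-back direction, one perpendicular to it) and assuming drips fall straight down from wherever the cube happens to be; the damping rate, drip rate, and initial sideways push are invented numbers, purely for illustration:

```python
# Sketch: drip pattern left by an ice cube swinging on a string.
# Small-angle assumption: horizontal motion = two independent damped oscillators.
# All parameters are made up; only the qualitative shape of the pattern matters.
import math
import random

g, L = 9.8, 1.0                  # gravity (m/s^2), string length (m)
omega = math.sqrt(g / L)         # small-angle angular frequency
gamma = 0.02                     # damping rate (1/s)
A, B = 0.30, 0.05                # pull-back amplitude and sideways push (m)

drips = []
t = 0.0
while t < 600.0:                 # ten minutes of melting
    decay = math.exp(-gamma * t)
    x = A * decay * math.cos(omega * t)   # along the pull-back direction
    y = B * decay * math.sin(omega * t)   # perpendicular component -> ellipse
    drips.append((x, y))
    t += random.expovariate(1.0)          # roughly one drip per second

# Early drips trace a large ellipse and cluster near the turning points
# (where the cube moves slowest); later drips pile up near the centre as
# the swing dies down, giving a filled-in, splattery ellipse.
xs = [x for x, _ in drips]
print(f"{len(drips)} drips, spanning x = {min(xs):.2f} m to {max(xs):.2f} m")
```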
Slight adjustment to your scenario:
the ice-cube’s residence-times are maximized at the extrema, so your drips would concentrate toward the two extremes.
Also, from the mechanical, historical perspective: a drop that landed at the dead center beneath the pendulum’s contact with the branch would have had to leave the cube during a brief window just before passing over the center, with exactly enough forward velocity at the moment of release to hit the center by the time it reached the ground (depending on how far up it’s hung)… which is a tiny portion of the total drips, I assume?
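Roughly quantifying that: treating a released drop as a projectile, one that leaves the cube at horizontal offset x with horizontal velocity v lands at about x + v·sqrt(2h/g), where h is the drop height, so landing dead centre requires v = -x·sqrt(g/(2h)). A quick count, with made-up swing and height numbers, of how many uniformly-timed drips land within a centimetre of centre:

```python
# Rough check: what fraction of drips land within 1 cm of the dead centre?
# Each drip is treated as a projectile released from the swinging cube.
# String length, drop height, and swing amplitude are invented numbers.
import math

g, L, h, A = 9.8, 1.0, 1.5, 0.3      # gravity, string length, drop height, amplitude
omega = math.sqrt(g / L)             # small-angle angular frequency
t_fall = math.sqrt(2 * h / g)        # time for a drop to fall to the ground

N, hits = 100_000, 0
for i in range(N):
    t = (i / N) * (2 * math.pi / omega)     # release times spread over one swing
    x = A * math.cos(omega * t)             # horizontal position at release
    v = -A * omega * math.sin(omega * t)    # horizontal velocity at release
    if abs(x + v * t_fall) < 0.01:          # lands within 1 cm of centre?
        hits += 1

print(f"{hits / N:.1%} of drips land within 1 cm of the centre")
```

With these particular (made-up) numbers that comes out to roughly 1%, which backs up the “tiny portion” intuition.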
Because that’s not one of the 4 options.
(Technically a line segment is a special case of an ellipse)
If you add 1 and 2 do you get 2, 4, or 6?
Humans often give answers that aren’t on a list if they think the list is wrong.
If you cannot interpret the question as having one of those 4 answers, I accuse you of being willfully dense.
Regardless, let me just concede the question is bad and move on. I already won the Turing test with the cow question, and I’ve subsequently found chatGPT fails on even much easier geometry questions (in 2d, not 3d). I can give you examples if you wish, but only if you say “I am debating in good faith and truly don’t think there are simple geometry problems chatGPT cannot solve”.
(See, I don’t think you disagree that chatGPT is bad at geometric reasoning, I think you’re just trying to nitpick.)
“I already won the Turing test with the cow question”
I would not be surprised if ChatGPT could come up with a more human-sounding question than your cow and ice cube. You might not pass, comparatively.
Huh? I’m the tester, not the testee. I’m not trying to pass for human, I’m trying to discern if the person I’m chatting with is human.
What’s with people saying LLMs pass the Turing test? They are not close you guys, come on.
“If you cannot interpret the question as having one of those 4 answers, I accuse you of being willfully dense.”
Giving the right answer to the best of your ability even when it is not one the questioner anticipates is how I answer questions, and how I think people should generally answer these kinds of questions.
“I can give you examples if you wish, but only if you say ‘I am debating in good faith and truly don’t think there are simple geometry problems chatGPT cannot solve’.”
I’m debating in good faith, yes. I don’t think it’s as meaningful as you think that you can find simple geometry problems that GPT cannot solve, however, because I’d predict a lot of people would also get the question wrong.
Unless you’ve tried giving “simple” questions to typical adults, it’s easy to overestimate how good human responses would be, comparing the AI answers to “ideal” instead of “real”.
“What’s the maximum possible number of intersection points between a circle and a triangle?”
(chatGPT says 3.) OK, your turn, tell me all about how normal humans cannot solve it, or how you personally interpret the question in a weird way so that the answer is 17.
The number that immediately came to mind was ‘three’. After thinking harder, and seeing that you had said chatGPT says ‘three’, I realized it’s ‘six’.
My prediction, if you asked random adults, is that ‘three’ would be the most common answer:
Many of them won’t be picturing something concrete or thinking about it hard, and will intuitively say a number. A lot of these will say ‘three’, because triangles are very three.
Some will imagine a circumscribed or inscribed triangle and say ‘three’.
Some will imagine a case where the correct answer is ‘six’ but will still think of it as three intersections. (This is where I was until I thought harder.)
Do you disagree? If you do, maybe we could run a Mechanical Turk survey to check?
EDIT: one of my housemates said ‘six’, and my 8yo said ‘three’.
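For what it’s worth, six is achievable: each side of a triangle is a line segment, a segment can cross a circle in at most two points, and a triangle whose vertices sit just outside the circle (with every side cutting through it) gets two crossings per side. A quick numerical check of one such configuration, with an arbitrarily chosen unit circle and equilateral triangle:

```python
# Verify that a circle and a triangle can intersect in six points:
# unit circle at the origin, equilateral triangle whose vertices lie just
# outside the circle so that every side cuts through it.
import math

def segment_circle_intersections(p, q, r=1.0):
    """Count intersections of segment p-q with the circle x^2 + y^2 = r^2."""
    (px, py), (qx, qy) = p, q
    dx, dy = qx - px, qy - py
    a = dx * dx + dy * dy
    b = 2 * (px * dx + py * dy)
    c = px * px + py * py - r * r
    disc = b * b - 4 * a * c
    if disc < 0:
        return 0
    roots = (0.0,) if disc == 0 else (-math.sqrt(disc), math.sqrt(disc))
    # An intersection counts only if it falls within the segment (0 <= t <= 1).
    return sum(1 for s in roots if 0 <= (-b + s) / (2 * a) <= 1)

# Equilateral triangle with circumradius 1.1 (vertices just outside the unit circle).
verts = [(1.1 * math.cos(2 * math.pi * k / 3), 1.1 * math.sin(2 * math.pi * k / 3))
         for k in range(3)]
sides = [(verts[k], verts[(k + 1) % 3]) for k in range(3)]

print(sum(segment_circle_intersections(p, q) for p, q in sides))   # prints 6
```

(With the vertices exactly on the circle you get the inscribed picture, and the count drops to three.)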
Many won’t think about it very hard, but the interesting case of the Turing test is when you compare to a human who is trying. If you opened up a chat with random strangers, the most common answer to my question would be “lol”. That’s easy for a computer to simulate: just answer “lol” to everything.
The whole point here is that chatGPT cannot reason like a human. I don’t care that survey-fillers on MTurk are answering questions as fast as possible with no regard for whether their answers are correct; I care about the capabilities of humans, not their capabilities when they’re not trying and don’t feel like thinking about the problem.
How about this: suppose I put this question as a bonus question next time I give an in-person exam to my undergraduates. How many do you think will get it wrong?
I think undergraduates are better at reasoning than typical humans. Whether they get it right probably depends on the subject: what kind of classes do you teach?
(My guess here is that a lot of humans wouldn’t meet your requirements for ability to reason like a human)
I’m concerned that when the AI is at the level of an undergraduate and can get 95% of things right, and can run 100x faster than a human and be scaled up with more servers, it’s going to be too late.
I don’t really like the attempts to convince me that chatGPT is impressive by telling me how dumb people are. You should aspire to tell me how smart chatGPT is, not how dumb people are.
The argumentative move “well, I could solve the problem, but the problem is still bad because the average person can’t” is grating. It is grating even if you end up being right (I’m not sure). It’s grating because you have such a low esteem for humanity, but at the same time you try to impress me with how chatGPT can match those humans you think so little of. You are trying to convince me of BOTH “most humans are idiots” AND “it is super impressive and scary that chatGPT can match those idiots” at the same time.
Anyway, perhaps we will soon reach the point where no simple 1-prompt IQ-type question can distinguish an average human from an AI. Even then, an interactive 5-minute conversation will still do so. The AI failed even the cow question, remember? The one your kids succeeded at? Now, perhaps that was a fluke, but if you give me 5 minutes of conversation time I’ll be able to generate more such flukes.
Also, in specific subject matters, it once again becomes easy to distinguish chatGPT from a human expert (or even an undergraduate student, usually). It’s harder in the humanities, granted, but it’s trivial in the sciences, and even in the humanities, the arguments of LLMs have this not-quite-making-sense property I observed when I asked Charlotte if she’s sentient.
“I don’t really like the attempts to convince me that chatGPT is impressive by telling me how dumb people are.”
Thanks for flagging this! I’m not trying to convince you that chatGPT is impressive, I’m only trying to convince you that you’re overestimating how smart people are.
OK, fair enough. I think LWers underestimate how smart average people are (that is, they overestimate their own relative intelligence), and I try to be mindful of that cognitive bias, but it’s possible I’m overcorrecting for this.