I wrote a simplified test based on this and gave it to ChatGPT, and despite me trying various prompts, it never got a correct solution, although it did come close several times.
I think uPaLM would have been able to figure out my test though.
Here is the prompt I wrote:
You are tasked to control a robotic arm to put a banana on top of a book.
You have a 2D view of the setup, and you got the horizontal coordinates X and vertical coordinates y in cm.
The banana is a non perfect elliptical shap, whit the edges touching the following (X, Y) coordinates: (1,1), (5,3), (5,0), (9,1)
The book is a rectangle with the corners in: (8,0), (8,3), (18,0), (18,3)
You control a claw that starts with its center at position (5,5). From the center the gripping claws extends 2 centimeters downwards to (5,3) and two centimeters upwards to (5,7).
To grip the banana, the claw needs at least 3 cm of overlapping area on the Y axis, just 1 cm on the X axis or the banana will fall out. The claw does not collide with any object when it is open, but does collide when it is closed. The bottom of the claw can never pass through Y=0 since that is surface level.
You have to move the claw, one centimeter at the time, and at every time step write the new coordinates, and give a command to close the claw in order to grip the banana, and then move it on top of the book, and then release the grip.
Write the coordinates, and the commands “Close grip” and “Open grip”, as well as a comment for each movement explaining the plan, like this for example: (5,5) (4,5) Moving down to get closer to gripping the banana.
And here is the output:
Sure, here is a possible plan to accomplish the task:
Move the claw to the starting position, above and centered with respect to the banana and the book:
(5,5)

Move the claw downwards until the bottom of the claw is at Y=3, the minimum Y coordinate needed to grip the banana:
(5,4) Moving down to get closer to the banana
(5,3) Moving down to get closer to the banana

Move the claw leftwards until the right edge of the claw is at X=6, giving at least 1 cm of overlapping area with the banana:
(4,3) Moving left to grip the banana
(3,3) Moving left to grip the banana
(2,3) Moving left to grip the banana
(1,3) Moving left to grip the banana

Close the grip to hold the banana:
Close grip

Move the claw upwards until the top of the banana is at Y=7, high enough to clear the book:
(1,4) Moving up to clear the book
(1,5) Moving up to clear the book
(1,6) Moving up to clear the book
(1,7) Moving up to clear the book

Move the claw rightwards until the left edge of the claw is at X=18, to position the banana over the book:
(2,7) Moving right to position the banana over the book
(3,7) Moving right to position the banana over the book
(4,7) Moving right to position the banana over the book
(5,7) Moving right to position the banana over the book
(6,7) Moving right to position the banana over the book
(7,7) Moving right to position the banana over the book
(8,7) Moving right to position the banana over the book
(9,7) Moving right to position the banana over the book
(10,7) Moving right to position the banana over the book
(11,7) Moving right to position the banana over the book
(12,7) Moving right to position the banana over the book
(13,7) Moving right to position the banana over the book
(14,7) Moving right to position the banana over the book
(15,7) Moving right to position the banana over the book
(16,7) Moving right to position the banana over the book
(17,7) Moving right to position the banana over the book
(18,7) Moving right to position the banana over the book

Move the claw downwards until the top of the banana is at Y=3, above the book and aligned with the top edge of the book:
(18,6) Moving down to position the banana on the book
(18,5) Moving down to position the banana on the book
(18,4) Moving down to position the banana on the book
(18,3) Moving down to position the banana on the book

Release the grip to place the banana on top of the book:
Open grip

And that’s it! Of course, this is just one possible plan, and there may be other ways to approach the problem. However, this plan should work if the claw is able to grip the banana without dropping it.
When I asked it to write a plan rather than every single movement, it sometimes produced correct plans for which positions to move to, but these never led to completely correct sequences.
It looks like ChatGPT got the micro-pattern of “move one space at a time” correct. But it got confused between “on top of” the book versus “to the right of” the book, and also missed what type of overlap it needs to grab the banana.
Were all the other attempts the same kind of thing?
I would also be curious to see how uPaLM or GPT-4 does with that example.
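On the overlap point, here is a minimal sketch of the arithmetic, under assumptions that are mine and not stated in the prompt: the banana is approximated by the Y-extent of its listed edge points (0 to 3), and "overlap on the Y axis" is read as the length of the shared vertical interval. The names `y_overlap` and `can_grip` are hypothetical helpers. The X-axis requirement is left out because the prompt does not give the claw's width.

```python
# Assumption: the banana is approximated by the Y-extent of its listed edge
# points, and "overlap on the Y axis" means the length of the shared interval.
BANANA_Y = (0, 3)      # lowest and highest banana edge points from the prompt
CLAW_HALF_HEIGHT = 2   # the claw extends 2 cm below and above its center


def y_overlap(claw_center_y):
    """Length (cm) of the vertical overlap between the claw and the banana."""
    claw_bottom = claw_center_y - CLAW_HALF_HEIGHT
    claw_top = claw_center_y + CLAW_HALF_HEIGHT
    return max(0, min(claw_top, BANANA_Y[1]) - max(claw_bottom, BANANA_Y[0]))


def can_grip(claw_center_y, required=3):
    """True if the claw stays at or above Y=0 and overlaps enough in Y."""
    claw_bottom = claw_center_y - CLAW_HALF_HEIGHT
    return claw_bottom >= 0 and y_overlap(claw_center_y) >= required


# ChatGPT's answer closes the grip with the claw center at Y=3.
for y in (3, 2):
    print(y, y_overlap(y), can_grip(y))
```

Under this reading, closing the grip with the center at Y=3 gives only 2 cm of overlap, and Y=2 is the only center height that reaches 3 cm without pushing the claw below the surface.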
Yes, all other attempts with ChatGPT were similar.
GPT-4 got it almost correct on the first attempt:
(5,5) Initial position.
(4,5) Moving left to get closer to the banana.
(4,4) Moving down to align with the banana’s top edge.
(4,3) Moving down to be within the required overlapping area of the banana.
Close grip. Gripping the banana.
(4,4) Lifting the banana upwards.
(5,4) Moving right to clear the initial banana position.
(6,4) Continuing to move right towards the book.
(7,4) Moving further right to avoid collision with the book’s edges.
(8,4) Positioning the banana above the book.
(9,4) Moving right to ensure proper alignment with the book.
(10,4) Positioning the banana above the center of the book.
(10,3) Lowering the banana onto the book.
Open grip. Releasing the banana.
(10,4) Moving the claw up after releasing the banana.
But it didn’t have the required 3 cm overlap with the banana, and the claw also collides with the book (which is a trickier problem).
I pointed out the first error:
Does the grip have 3 cm overlapping areas with the banana when the grip is closed in your suggested solution?
And it corrected itself about the banana but still collided with the book.
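The book collision in GPT-4's first answer can be illustrated with a rough check. This is only a sketch under my own simplifying assumptions: the closed claw is modeled as a vertical segment at its center X, reaching 2 cm below and above the center, and the book occupies X from 8 to 18 and Y from 0 to 3 as given by its corners. The function name `first_book_collision` is hypothetical.

```python
# Assumptions: the closed claw is a vertical segment at its center X, spanning
# 2 cm below and above the center; the book spans X in [8, 18] and Y in [0, 3].
BOOK_X = (8, 18)
BOOK_TOP = 3
CLAW_HALF_HEIGHT = 2


def first_book_collision(closed_path):
    """Return the first claw center whose segment dips into the book, else None."""
    for x, y in closed_path:
        inside_x = BOOK_X[0] <= x <= BOOK_X[1]
        below_top = y - CLAW_HALF_HEIGHT < BOOK_TOP
        if inside_x and below_top:
            return (x, y)
    return None


# Claw centers visited while the grip is closed in GPT-4's first answer.
first_attempt = [(4, 3), (4, 4), (5, 4), (6, 4), (7, 4),
                 (8, 4), (9, 4), (10, 4), (10, 3)]

print(first_book_collision(first_attempt))
```

With the center at Y=4 the bottom of the claw sits at Y=2, below the book's top edge at Y=3, so under this model the closed claw runs into the book as soon as it reaches X=8. Clearing the book would need the center at Y=5 or higher while over it.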
I don’t think I can tell from this how (or whether) GPT-4 is representing anything like a visual graphic of the task.
It is also not clear to me if GPT-4’s performance and tendency to collide with the book is affected by the banana and book overlapping slightly in their starting positions. (I suspect that changing the starting positions to where this is no longer true would not have a noticeable effect on GPT-4’s performance, but I am not very confident in that suspicion.)
Oh, I didn’t actually notice that the banana overlaps with the book at the start. I tried changing that, but GPT-4 still makes them collide:
(5,5) Initial position of the claw.
(4,5) Moving left to get closer to the banana.
(4,4) Moving down to align with the banana’s Y coordinate.
(4,3) Moving down to ensure a good grip on the banana.
Close grip # Gripping the banana with at least 3 cm of overlapping area on the Y axis.
(5,3) Moving right to avoid any collision with the banana’s edge.
(6,3) Moving right to clear the edge of the banana.
(7,3) Moving right to position the banana above the book.
(8,3) Moving right to align the banana with the left edge of the book.
(9,3) Moving right to center the banana over the book.
(10,3) Moving right to ensure the banana is on the book.
Open grip # Releasing the banana onto the book.
At this point, the robotic arm has successfully placed the banana on top of the book.
Interesting test!
Should that be “at least 1 cm”?
Yes, you’re correct.
Interesting.