I don’t think I can tell from this how (or whether) GPT-4 is representing anything like a visual graphic of the task.
It is also not clear to me if GPT-4′s performance and tendency to collide with the book is affected by the banana and book overlapping slightly in their starting positions. (I suspect that changing the starting positions to where this is no longer true would not have a noticeable effect on GPT-4′s performance, but I am not very confident in that suspicion.)
Oh, I didnt actually notice that the banana overlaps with the book at the start, I tried changing that but still gpt-4 makes them collide:
(5,5) Initial position of the claw. (4,5) Moving left to get closer to the banana. (4,4) Moving down to align with the banana’s Y coordinate. (4,3) Moving down to ensure a good grip on the banana. Close grip # Gripping the banana with at least 3 cm of overlapping area on the Y axis. (5,3) Moving right to avoid any collision with the banana’s edge. (6,3) Moving right to clear the edge of the banana. (7,3) Moving right to position the banana above the book. (8,3) Moving right to align the banana with the left edge of the book. (9,3) Moving right to center the banana over the book. (10,3) Moving right to ensure the banana is on the book. Open grip # Releasing the banana onto the book.
At this point, the robotic arm has successfully placed the banana on top of the book.
Interesting.
I don’t think I can tell from this how (or whether) GPT-4 is representing anything like a visual graphic of the task.
It is also not clear to me if GPT-4′s performance and tendency to collide with the book is affected by the banana and book overlapping slightly in their starting positions. (I suspect that changing the starting positions to where this is no longer true would not have a noticeable effect on GPT-4′s performance, but I am not very confident in that suspicion.)
Oh, I didnt actually notice that the banana overlaps with the book at the start, I tried changing that but still gpt-4 makes them collide: