Bruce G comments on A Proposed Test to Determine the Extent to Which Large Language Models Understand the Real World

Bruce G 20 Mar 2023 1:08 UTC
1 point
It looks like ChatGPT got the micro-pattern of “move one space at a time” correct. But it got confused between “on top of” the book versus “to the right of” the book, and also missed what type of overlap it needs to grab the banana.
Were all the other attempts the same kind of thing?
I would also be curious to see how uPaLM or GPT-4 does with that example.
- MrThink 20 Mar 2023 11:29 UTC
  2 points
  Yes, all other attempts with ChatGPT were similar.
  GPT-4 got it almost correct on the first attempt:
  (5,5) Initial position.
  (4,5) Moving left to get closer to the banana.
  (4,4) Moving down to align with the banana’s top edge.
  (4,3) Moving down to be within the required overlapping area of the banana.
  Close grip. Gripping the banana.
  (4,4) Lifting the banana upwards.
  (5,4) Moving right to clear the initial banana position.
  (6,4) Continuing to move right towards the book.
  (7,4) Moving further right to avoid collision with the book’s edges.
  (8,4) Positioning the banana above the book.
  (9,4) Moving right to ensure proper alignment with the book.
  (10,4) Positioning the banana above the center of the book.
  (10,3) Lowering the banana onto the book.
  Open grip. Releasing the banana.
  (10,4) Moving the claw up after releasing the banana.
  But the claw didn’t have the required overlap with the banana when it gripped, and it also collides with the book (which is a trickier problem).
  
  I pointed out the first error:
  
  Does the grip have 3 cm overlapping areas with the banana when the grip is closed in your suggested solution?
  And it corrected itself about the banana but still collided with the book.
  - Bruce G 21 Mar 2023 5:05 UTC
    1 point
    Interesting.
    
    I don’t think I can tell from this how (or whether) GPT-4 is representing anything like a visual graphic of the task.
    
    It is also not clear to me if GPT-4’s performance and tendency to collide with the book is affected by the banana and book overlapping slightly in their starting positions. (I suspect that changing the starting positions to where this is no longer true would not have a noticeable effect on GPT-4’s performance, but I am not very confident in that suspicion.)
    - MrThink 21 Mar 2023 8:44 UTC
      2 points
      Oh, I didn’t actually notice that the banana overlaps with the book at the start. I tried changing that, but GPT-4 still makes them collide:
      (5,5) Initial position of the claw.
      (4,5) Moving left to get closer to the banana.
      (4,4) Moving down to align with the banana’s Y coordinate.
      (4,3) Moving down to ensure a good grip on the banana.
      Close grip # Gripping the banana with at least 3 cm of overlapping area on the Y axis.
      (5,3) Moving right to avoid any collision with the banana’s edge.
      (6,3) Moving right to clear the edge of the banana.
      (7,3) Moving right to position the banana above the book.
      (8,3) Moving right to align the banana with the left edge of the book.
      (9,3) Moving right to center the banana over the book.
      (10,3) Moving right to ensure the banana is on the book.
      Open grip # Releasing the banana onto the book.
      At this point, the robotic arm has successfully placed the banana on top of the book.
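
The two properties discussed in this thread (moving one space at a time, and overlap or collision between objects) can be checked mechanically. Below is a minimal Python sketch, not the original test harness. The claw coordinates are copied from the two GPT-4 answers quoted above. Everything else is an assumption made for illustration: that "one space at a time" means a single horizontal or vertical step with no diagonals, and that objects can be modelled as axis-aligned rectangles given as (x_min, y_min, x_max, y_max). The function names are hypothetical, and the thread does not give the sizes of the claw, banana, or book, so no real dimensions are used.

```python
from typing import List, Tuple

Point = Tuple[int, int]
Rect = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max), assumed layout

# Claw positions copied from the two GPT-4 answers quoted in the thread.
FIRST_PATH: List[Point] = [
    (5, 5), (4, 5), (4, 4), (4, 3), (4, 4), (5, 4), (6, 4),
    (7, 4), (8, 4), (9, 4), (10, 4), (10, 3), (10, 4),
]
SECOND_PATH: List[Point] = [
    (5, 5), (4, 5), (4, 4), (4, 3), (5, 3), (6, 3),
    (7, 3), (8, 3), (9, 3), (10, 3),
]


def moves_one_step_at_a_time(path: List[Point]) -> bool:
    """True if every move changes exactly one coordinate by exactly 1.

    Assumption: diagonal moves are not allowed.
    """
    return all(
        abs(x1 - x0) + abs(y1 - y0) == 1
        for (x0, y0), (x1, y1) in zip(path, path[1:])
    )


def overlap_1d(a_min: float, a_max: float, b_min: float, b_max: float) -> float:
    """Length of the overlap between intervals [a_min, a_max] and [b_min, b_max]."""
    return max(0.0, min(a_max, b_max) - max(a_min, b_min))


def rects_overlap(a: Rect, b: Rect) -> bool:
    """True if two axis-aligned rectangles share a region of positive area."""
    return (
        overlap_1d(a[0], a[2], b[0], b[2]) > 0
        and overlap_1d(a[1], a[3], b[1], b[3]) > 0
    )
```

Both quoted paths pass `moves_one_step_at_a_time`, which matches the observation that the step-by-step pattern is the part the models get right. Checking the grip overlap or the book collision would additionally need the object dimensions from the original task, which would be passed to `overlap_1d` and `rects_overlap`.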