So physics understanding.
How do you think it would perform on a simpler question closer to its training dataset, like “we throw a ball from a 500m building with no wind, and the same ball but with wind, which one hits the floor earlier” (on average, after 1000 questions)? If this still does not seem plausible, what is something you would bet $100 at 2:1 but not 1:1 that it would not be able to do?
What do you mean by “on average after 1000 questions”? Because that is the crux of my answer: GPT-4 won’t be able to QA its own work for accuracy, or even relevance.
well if we’re doing a bet then at some point we need to “resolve” the prediction. so we ask GPT-4 the same physics question 1000 times and then some human judges count how many it got right; if it gets it right more than let’s say 95% of the time (or any confidence interval), then we would resolve this positively. of course you could do more than 1000, and by the law of large numbers it should converge to the true probability of giving the right answer?
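to make the resolution mechanism concrete, here is a rough sketch of what i mean (`ask_gpt4` and `judged_correct` are made-up placeholders for whatever interface GPT-4 ends up having and for the human judges):

```python
N_TRIALS = 1000    # ask the same question this many times
THRESHOLD = 0.95   # resolve positively if it is right at least this often

def ask_gpt4(question: str) -> str:
    raise NotImplementedError  # placeholder: the eventual GPT-4 interface

def judged_correct(answer: str) -> bool:
    raise NotImplementedError  # placeholder: a human judge marks the answer right/wrong

def resolve_bet(question: str) -> bool:
    correct = sum(judged_correct(ask_gpt4(question)) for _ in range(N_TRIALS))
    accuracy = correct / N_TRIALS
    # by the law of large numbers, this estimate converges to the true
    # probability of a correct answer as N_TRIALS grows
    return accuracy >= THRESHOLD
```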
That wouldn’t be useful, though.
My assertion is more like: After getting the content of elementary school science textbooks (or high school physics, or whatever other school science content makes sense), but not including the end-of-chapter questions (and especially not the answers), GPT-4 will be unable to provide the correct answer to more than 50% of the questions from the end of the chapters, constrained by having to take the first response that looks like a solution as its “answer” and not throwing away more than 3 obviously gibberish or bullshit responses per question.
And that 50% number is based on giving it every question without discrimination. If we only count the synthesis questions (as opposed to the memory/definition questions), I predict 1%, but would bet on < 10%.
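To pin down the scoring rule, here is a rough sketch of what I have in mind (`ask_gpt4`, `looks_like_a_solution`, and `is_correct` are placeholders for the eventual model interface and for human judgment calls):

```python
MAX_DISCARDS = 3  # at most 3 obviously gibberish responses may be thrown away per question

def ask_gpt4(question: str) -> str:
    raise NotImplementedError  # placeholder: the eventual GPT-4 interface

def looks_like_a_solution(response: str) -> bool:
    raise NotImplementedError  # human judgment: is this even shaped like an answer?

def is_correct(question: str, response: str) -> bool:
    raise NotImplementedError  # human judgment against the textbook answer key

def score(questions):
    """questions: list of (question_text, is_synthesis) pairs from the chapter ends."""
    correct_all = correct_synth = total_synth = 0
    for text, is_synthesis in questions:
        total_synth += is_synthesis
        # take the first response that looks like a solution, discarding
        # at most MAX_DISCARDS gibberish responses along the way
        answer = None
        for _ in range(MAX_DISCARDS + 1):
            response = ask_gpt4(text)
            if looks_like_a_solution(response):
                answer = response
                break
        got_it = answer is not None and is_correct(text, answer)
        correct_all += got_it
        correct_synth += got_it and is_synthesis
    return correct_all / len(questions), correct_synth / max(total_synth, 1)
```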
let’s say by concatenating your textbooks you get plenty of examples of f=m⋅a with “blablabla object sky blablabla gravity a=9.8m/s2 blablabla m=12kg blabla f=12∗9.8≈118N”. And then the exercise is: “blablabla object of mass blablabla thrown from the sky, what’s the force? a) f≈118 b) … c) … d) …”. then what you need to do is just do some prompt programming at the beginning by “for looping answer” and teaching it to return either a, b, c or d. Now, I don’t see any reason why a neural net couldn’t approximate linear functions of two variables. It just needs to map words like “derivative of speed”, “acceleration”, “d2z/dt2” to the same concept and then look at it with attention & multiply two numbers.
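to illustrate the “for looping answer” idea, a rough sketch (`gpt4_complete` is a made-up placeholder for whatever completion API we would get; the few-shot examples are invented):

```python
import re

def gpt4_complete(prompt: str) -> str:
    raise NotImplementedError  # placeholder: the eventual completion API

# a couple of worked f = m*a examples to prime the answer format
FEW_SHOT = """\
Q: An object of mass m=12kg falls with gravity a=9.8m/s2. What is the force?
   a) 118N  b) 60N  c) 9.8N  d) 12N
A: a

Q: An object of mass m=3kg accelerates at a=2m/s2. What is the force?
   a) 1.5N  b) 6N  c) 5N  d) 9N
A: b
"""

def answer_multiple_choice(question: str, max_tries: int = 10) -> str:
    # loop over completions until one is a clean a/b/c/d answer
    prompt = FEW_SHOT + "\nQ: " + question + "\nA:"
    for _ in range(max_tries):
        completion = gpt4_complete(prompt).strip().lower()
        match = re.match(r"[abcd]\b", completion)
        if match:
            return match.group(0)
    return ""  # gave up: no parseable answer
```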
Generally the answers aren’t multiple choice. Here are a couple of example questions from a 5th grade science textbook I found on Google:
How would you state your address in space? Explain your answer.
Would you weigh the same on the Sun as you do on Earth? Explain your answer.
Why is it so difficult to design a real-scale model of the solar system?
If it’s about explaining your answer with 5th grade gibberish, then GPT-4 is THE solution for you! ;)