My assertion is more like:
After getting the content of elementary school science textbooks (or high school physics, or whatever other school science content makes sense), but not including the end-of-chapter questions (and especially not the answers), GPT-4 will be unable to provide the correct answer to more than 50% of the questions from the end of the chapters, constrained by having to take the first response that looks like a solution as its “answer” and not throwing away more than 3 obviously gibberish or bullshit responses per question.
And that 50% number is based on giving it every question without discrimination. If we only count the synthesis questions (as opposed to the memory/definition questions), I predict 1%, but would bet on <10%.
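To make the grading rule concrete, here's a minimal sketch of how I'd score it (the model call and the two judgment functions are placeholders I'm making up, not part of the bet):

```python
MAX_DISCARDS = 3  # at most 3 obviously-gibberish responses thrown away per question

def grade(questions, ask_model, looks_like_solution, is_correct):
    # ask_model, looks_like_solution, and is_correct are hypothetical stand-ins
    # for the GPT-4 call and the human judgment calls described in the bet.
    correct = 0
    for question in questions:
        answer = None
        for _ in range(1 + MAX_DISCARDS):  # first try plus up to 3 retries
            response = ask_model(question)
            if looks_like_solution(response):
                answer = response  # binding: the first solution-looking response counts
                break
            # otherwise the response was gibberish/bullshit; discard it and retry
        if answer is not None and is_correct(question, answer):
            correct += 1
    return correct / len(questions)  # I'm predicting this lands under 0.5
```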
Let’s say by concatenating your textbooks you get plenty of examples of f=m⋅a, along the lines of “blablabla object sky blablabla gravity a=9.8m/s² blablabla m=12kg blabla f=12⋅9.8=117.6N”. And then the exercise is: “blablabla object of mass blablabla thrown from the sky, what’s the force? a) f=117.6 b) … c) … d) …”. Then what you need to do is just some prompt programming at the beginning, “for looping” the answer and teaching it to return either a, b, c, or d. Now, I don’t see any reason why a neural net couldn’t approximate linear functions of two variables. It just needs to map words like “derivative of velocity”, “acceleration”, and “d²z/dt²” to the same concept and then look at it with attention & multiply two digits.
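Something like this is what I mean by “for looping” the answer — a rough sketch where `complete` stands in for whatever completion API GPT-4 ends up exposing, and the few-shot examples are ones I made up:

```python
# Hypothetical few-shot wrapper that teaches the model to answer a/b/c/d.
FEW_SHOT = """\
Q: An object of mass m=12kg falls under gravity, a=9.8m/s². What's the force?
a) 117.6N  b) 12N  c) 9.8N  d) 1176N
Answer: a

Q: A car of mass m=1000kg accelerates at a=2m/s². What's the force?
a) 500N  b) 2000N  c) 1000N  d) 2N
Answer: b
"""

def answer_multiple_choice(complete, question, retries=5):
    # `complete` is a placeholder for the model's text-completion call.
    prompt = FEW_SHOT + "\nQ: " + question + "\nAnswer:"
    for _ in range(retries):  # loop until the model emits a well-formed letter
        out = complete(prompt).strip().lower()
        if out and out[0] in "abcd":
            return out[0]
    return None  # model never produced a valid option letter
```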
That wouldn’t be useful, though.
Generally the answers aren’t multiple choice. Here are a couple of examples of questions from a 5th grade science textbook I found on Google:
How would you state your address in space? Explain your answer.
Would you weigh the same on the sun as you do on Earth? Explain your answer.
Why is it so difficult to design a real-scale model of the solar system?
If it’s about explaining your answer with 5th-grade gibberish, then GPT-4 is THE solution for you! ;)