What if you had assigned less than 0.01% to “RSI is so trivial that the first kludged loop to GPT-4 by an external user without access to the code or weights would successfully self-improve”?
I would think you were massively overconfident in that. I don’t think you could make 10,000 predictions like that and be wrong only once (for a sense of scale, that’s like making one prediction per hour, 8 hours per day, 5 days a week, for 5 years, and being wrong just once).
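For concreteness, here is the arithmetic behind that comparison, as a quick back-of-the-envelope check (assuming 52 working weeks per year):

```python
# One prediction per hour, 8 hours per day, 5 days per week,
# 52 weeks per year (an assumption), for 5 years.
predictions = 1 * 8 * 5 * 52 * 5
print(predictions)  # 10400 -- roughly the 10,000 predictions in question
```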
Unless you mean “recursively self-improve all the way to godhood” instead of “recursively self-improve to the point where it would discover things as hard as the first improvement it found in like 10% as much time as it took originally”.
For reference, the reason I gave at least 10% to “the dumbest possible approach will work to get meaningful improvement”: humans spent many thousands of years not developing much technology at all, and then, a few thousand years ago, suddenly started doing agriculture, building cities, and inventing tools. The difference between humans who do agriculture and humans who don’t isn’t pure genetics. Humans came to the Americas over 20,000 years ago, agriculture has only existed for roughly the most recent 10,000 of those years, and yet fairly advanced agricultural civilizations arose independently in the Americas thousands of years ago. Which says to me that, for humans at least, most of our ability to do impressive things comes from our ability to accumulate a bunch of tricks that work over time and to communicate those tricks to others.
So if it turned out that “the core of effectiveness for a language model is a dumb wrapper script plus the ability to invoke copies of itself under a different wrapper script, and that’s enough for it to close the gap between the capabilities of the base language model and the capabilities of something as smart as the base model but as coherent as a human”, I would have been slightly surprised, but not so surprised that I could have made 10 predictions like that and been wrong about only one of them. Certainly not 100 or 10,000 predictions like that.
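To make the kind of “dumb wrapper script” under discussion concrete, here is a minimal sketch of one. Everything in it is an illustrative assumption: `call_model` is a stand-in for whatever LLM API you’d actually call, and the prompts and loop structure are just one plausible shape for such a wrapper, not anything described in the thread.

```python
# A deliberately naive self-invoking wrapper: the model answers a task, then a
# fresh copy of the model, run under a different wrapper prompt, rewrites the
# instructions for the next attempt.

def call_model(prompt: str) -> str:
    # Hypothetical stand-in for a real LLM API call.
    raise NotImplementedError("plug in an actual model call here")

def dumb_wrapper(task: str, rounds: int = 3) -> str:
    instructions = "Solve the task as well as you can."
    answer = ""
    for _ in range(rounds):
        answer = call_model(f"{instructions}\n\nTask: {task}")
        # Invoke a copy of the model with a *different* wrapper prompt:
        # critique the previous attempt and improve the instructions.
        instructions = call_model(
            "A model was given these instructions and produced this answer.\n\n"
            f"Instructions: {instructions}\n"
            f"Answer: {answer}\n\n"
            "Rewrite the instructions so the next attempt is better."
        )
    return answer
```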
Edit: Keep in mind that the dumbest possible approach of “define a JSON file that describes the tool and ensure that that JSON file has a link to detailed API docs” does work for teaching GPT-4 how to use tools.
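For illustration, here is a minimal sketch of what such a tool-describing JSON file might contain. The schema, field names, and URL are assumptions made for the example; the comment above only specifies the general idea of a tool description plus a link to detailed API docs.

```python
import json

# Hypothetical tool descriptor; the field names and URL are illustrative only.
weather_tool = {
    "name": "get_weather",
    "description": "Look up the current weather for a given city.",
    "api_docs": "https://example.com/weather-api/docs",  # link to detailed docs
    "parameters": {
        "city": "string, e.g. 'Berlin'",
        "units": "'metric' or 'imperial'",
    },
}

# The JSON file the edit describes would just be this structure serialized:
print(json.dumps(weather_tool, indent=2))
```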
My estimate is based on the structure of the problem and the entity trying to solve it. I’m not treating it as some black-box instance of “the dumbest thing can work”. I agree that the latter types of problem should be assigned more than 0.01%.
I already knew quite a lot about GPT-4’s strengths and weaknesses, and about the problem domain it needs to operate in for self-improvement to take place. If I were a completely uneducated layman from 1900 (or even from 2000, probably), then a probability of 10% or more might be reasonable.