My experience over the past few years has been one of being surprised by latent capacities in existing models. A lot of techniques like prompt engineering, fine-tuning, chain of thought, and OpenAI-style “alignment” can be seen as not so much creating new capacities as revealing/refining latent ones. Back when GPT-3 was new, Connor Leahy said something like “GPT-3 is already general intelligence”, which sounded like hyperbole to me at the time and seems less so now.
Though RSI (recursive self-improvement) still seems very plausible to me, one scenario I’ve started thinking about is a massive effective capabilities gain caused not by RSI or any non-trivial algorithmic improvement, but simply by the dissolution of a much-larger-than-anticipated “latent capacities overhang”.
Possibly an absurd and confused scenario, but is it that implausible that some day we will get a model that still seems kinda dumb but is in fact one prompt away from super-criticality?
You don’t need to change anything in the underlying machine learning algorithms to make a model like ChatGPT generate new training data that could be used for recursive self-improvement.
Especially if you give it access to a console so that it can reliably run code, it could create its own training data and get into recursive self-improvement.
If, for example, you want it to learn to reliably multiply two 4-digit numbers, you can randomly generate pairs of 4-digit numbers. Then you let it generate a text answer with individual steps. You let a second model create Python code to validate all the individual calculations in those steps. If the Python code confirms that all the calculations are correct, you have a new piece of training data on how to multiply two 4-digit numbers (a sketch of this loop is below).
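A minimal sketch of that generate-and-validate loop, under two simplifying assumptions: `fake_model_solution` is a hypothetical stand-in for the model producing a step-by-step answer (with occasional simulated mistakes), and the checker is written directly in Python here rather than being generated by a second model.

```python
import random

def fake_model_solution(a: int, b: int) -> list[tuple[int, int, int]]:
    """Hypothetical stand-in for the model's step-by-step answer: the partial
    products of long multiplication, occasionally with a simulated mistake."""
    steps = []
    for i, ch in enumerate(reversed(str(b))):
        digit, shifted = int(ch), a * 10 ** i
        claimed = digit * shifted
        if random.random() < 0.2:            # simulate the model getting a step wrong
            claimed += random.randint(1, 9)
        steps.append((digit, shifted, claimed))
    return steps

def validate_steps(a: int, b: int, steps: list[tuple[int, int, int]]) -> bool:
    """Re-check every partial product and the final sum with ordinary arithmetic."""
    return all(d * s == c for d, s, c in steps) and sum(c for _, _, c in steps) == a * b

training_data = []
for _ in range(1000):
    a, b = random.randint(1000, 9999), random.randint(1000, 9999)
    steps = fake_model_solution(a, b)        # in the real setup: a model call
    if validate_steps(a, b, steps):          # keep only fully verified worked solutions
        training_data.append((a, b, steps))

print(f"kept {len(training_data)} verified examples out of 1000 attempts")
```

Only the attempts that pass validation become training examples; everything else is discarded.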
Based on ChatGPT user data, it might be possible to create an automated system that finds the problems where ChatGPT currently gives a wrong answer most of the time, and then figures out how to write code that checks newly generated examples for correctness.
I’ll just note here that “ability to automate the validation” is only possible when we already know the answer. Since the automated loom, computers have been a device for doing the same thing, over and over, very fast.
You don’t necessarily need to know the correct answer beforehand to be able to validate whether an answer is correct. If we take Eliezer’s problem of generating text that matches a given hash value, it’s easy to check whether a proposed answer is correct even if you don’t know the answer beforehand.
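To illustrate the asymmetry, a verifier for the hash case is only a few lines of Python; the target below is just the SHA-256 of an arbitrary example string, chosen for illustration:

```python
import hashlib

# SHA-256 of the example string "hello" (chosen arbitrarily for illustration)
TARGET = "2cf24dba5fb0a30e26e83b2ac5b9e29e1b161e5c1fa7425e73043362938b9824"

def is_valid(candidate: str) -> bool:
    """Cheap to check, even though finding a matching string is hard."""
    return hashlib.sha256(candidate.encode()).hexdigest() == TARGET

print(is_valid("goodbye"))  # False
print(is_valid("hello"))    # True
```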
What’s important is that the AI is sometimes able to generate correct answers. If the criteria for a correct answer are well-defined enough, it can go from solving a problem correctly 1% of the time to solving it correctly 100% of the time.
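In other words, a reliable checker lets you amplify a low per-attempt success rate just by resampling. A sketch, assuming a hypothetical `model_attempt(problem)` call and a problem-specific `check(problem, answer)` verifier:

```python
def amplify(problem, model_attempt, check, max_tries: int = 1000):
    """Resample the model until the verifier accepts an answer.
    With a 1% per-attempt success rate, ~500 independent tries give
    better-than-99% odds of at least one verified success."""
    for _ in range(max_tries):
        answer = model_attempt(problem)   # hypothetical model call
        if check(problem, answer):        # well-defined correctness criterion
            return answer                 # verified answer, usable as training data
    return None                           # give up after max_tries
```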
ChatGPT is used by millions of people, and a good portion of them will click the feedback button, especially if the UI is optimized for that. It’s possible to build automated processes that look at the problems where it currently makes frequent mistakes and learn to avoid them, and to build a self-improving system around that.
If you let it do that for 10,000 different problems, I would expect it to learn some reasoning habits that generalize and are useful for solving other problems as well.