I’m not sure if this has been made public, but I would be surprised if this was achieved by (substantial) retraining of the underlying foundation model. My guess is that it was achieved mainly by various filters put on top, though it is possible that fine-tuning was used. Regardless, catastrophic forgetting remains a fundamental issue. There are various benchmarks you can take a look at if you want.
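To make the term concrete, here is a minimal toy sketch of catastrophic forgetting: a small network trained on one synthetic task and then naively fine-tuned on a second one typically loses most of its accuracy on the first. Nothing here is from ChatGPT or from any published benchmark; the tasks, model size, and hyperparameters are all invented for illustration.

```python
# Toy illustration of catastrophic forgetting (synthetic setup, not any real system).
import torch
import torch.nn as nn

torch.manual_seed(0)

def make_task(hyperplane):
    # A linearly separable binary task defined by a fixed random hyperplane.
    x = torch.randn(2000, 20)
    y = (x @ hyperplane > 0).long()
    return x, y

task_a = make_task(torch.randn(20))
task_b = make_task(torch.randn(20))

model = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 2))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

def train(x, y, steps=500):
    for _ in range(steps):
        opt.zero_grad()
        loss_fn(model(x), y).backward()
        opt.step()

def accuracy(x, y):
    with torch.no_grad():
        return (model(x).argmax(dim=1) == y).float().mean().item()

train(*task_a)
print("task A accuracy after training on A:", accuracy(*task_a))  # typically high
train(*task_b)  # naive sequential fine-tuning on a new task
print("task A accuracy after training on B:", accuracy(*task_a))  # typically much lower
```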
The benchmarks tell you about what the existing systems do. They don’t tell you about what’s possible.
One of OpenAI’s current projects is to figure out how to extract valuable fine-tuning data from the conversations that ChatGPT has.
There’s no fundamental reason why it can’t extract all the relevant information from the conversations it has and use fine-tuning to add that information to its long-term memory.
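As a purely hypothetical sketch of what such a pipeline could look like (OpenAI’s actual process is not public, and every field name and feedback signal below is an assumption), one simple version keeps assistant turns that received positive user feedback and emits them as prompt/completion pairs for fine-tuning:

```python
# Hypothetical sketch: turn conversation logs into fine-tuning examples by keeping
# assistant turns with positive user feedback. Field names are invented.
import json

def conversations_to_finetuning_data(conversations):
    examples = []
    for convo in conversations:
        for i, turn in enumerate(convo["turns"]):
            if turn["role"] == "assistant" and turn.get("user_feedback") == "thumbs_up":
                prompt = "\n".join(t["text"] for t in convo["turns"][:i])
                examples.append({"prompt": prompt, "completion": turn["text"]})
    return examples

logs = [{
    "turns": [
        {"role": "user", "text": "What is the capital of France?"},
        {"role": "assistant", "text": "Paris.", "user_feedback": "thumbs_up"},
    ]
}]
print(json.dumps(conversations_to_finetuning_data(logs), indent=2))
```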
When it comes to ToS violations, it seems evident from my interactions with it that such a system is already working. ChatGPT has basically three ways to answer: with normal text, with red text, and with custom answers that explain why it won’t answer your query.
Both the red-text answers and the custom answers have increased over time for a variety of different prompts. When it gives a red-text answer, there’s a feedback button to tell them if you think it made a mistake.
To me, it seems obvious that those red-text answers get used as training material for fine-tuning and that this helps with detecting similar cases in the future.
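If that is right, the loop could be as simple as the following sketch: flagged answers, corrected by the feedback button, become labelled data for a policy-violation classifier. This is hypothetical; the real moderation system is not public, and the example texts and labels are invented.

```python
# Hypothetical sketch of a moderation feedback loop: flagged model outputs plus user
# feedback become labelled training data for a simple policy-violation classifier.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Each record: a model output and its final label (1 = violation) after taking the
# "this was a mistake" feedback into account.
training_data = [
    ("step-by-step instructions for picking a lock", 1),
    ("a summary of today's weather forecast", 0),
    ("detailed advice on evading law enforcement", 1),
    ("a recipe for vegetable soup", 0),
]
texts, labels = zip(*training_data)

classifier = make_pipeline(TfidfVectorizer(), LogisticRegression())
classifier.fit(texts, labels)

print(classifier.predict(["instructions for picking a lock quietly"]))  # likely [1]
```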
I would consider InstructGPT, ChatGPT, GATO, and similar systems to all be in the general reference class of systems that are “mostly big transformers, trained in a self-supervised way, with some comparatively minor things added on top”.
You could summarize InstructGPT’s lesson as “You can get huge capability gains from comparatively minor things added on top”.
You can talk about how they are minor at a technical level, but that doesn’t change the fact that these minor things produce huge capability gains.
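To give a concrete sense of how small the added-on-top part can be, here is a hedged sketch of the first stage of the InstructGPT-style recipe: a supervised fine-tuning pass on instruction/response demonstrations. The public gpt2 checkpoint is used purely as a stand-in, the demonstrations are invented, and the full recipe also adds reward modelling and RLHF on top of this step.

```python
# Hedged sketch: supervised fine-tuning on a couple of instruction demonstrations,
# on top of an existing pretrained model (gpt2 as a stand-in).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

demonstrations = [
    ("Instruction: Explain photosynthesis in one sentence.\nResponse:",
     " Plants convert sunlight, water and CO2 into sugar and oxygen."),
    ("Instruction: Give one tip for writing clear emails.\nResponse:",
     " Put the main request in the first sentence."),
]

model.train()
for prompt, response in demonstrations:
    inputs = tokenizer(prompt + response, return_tensors="pt")
    # Standard causal-LM loss on the demonstration; a fuller pipeline would mask the
    # prompt tokens out of the loss and then add reward modelling and RLHF on top.
    loss = model(**inputs, labels=inputs["input_ids"]).loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    print(f"loss: {loss.item():.3f}")
```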
In the future, there’s also a lot of additional room to get more clever about providing training data.
The benchmarks tell you about what the existing systems do. They don’t tell you about what’s possible.
Of course. It is almost certainly possible to solve the problem of catastrophic forgetting, and the solution might not be that complicated either. My point is that it is a fairly significant problem that has not yet been solved, and that solving it probably requires some insight or idea that does not yet exist. You can achieve some degree of lifelong learning through regularised fine-tuning, but you cannot get anywhere near what would be required for human-level cognition.
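For readers wondering what “regularised fine-tuning” refers to here, a minimal sketch in the spirit of elastic weight consolidation (EWC) is below. The model, data, and importance weights are placeholders; a real implementation would estimate per-parameter importance from Fisher information on the old task rather than using all-ones.

```python
# Minimal sketch of regularised fine-tuning (EWC-style): while training on a new task,
# penalise movement away from the old parameters, weighted by an importance estimate.
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 2))

# Snapshot of the parameters after training on the old task(s).
old_params = {n: p.detach().clone() for n, p in model.named_parameters()}
# Crude stand-in importance weights; EWC proper would use a Fisher-information estimate.
importance = {n: torch.ones_like(p) for n, p in model.named_parameters()}

def ewc_penalty(strength=100.0):
    penalty = torch.zeros(())
    for n, p in model.named_parameters():
        penalty = penalty + (importance[n] * (p - old_params[n]) ** 2).sum()
    return strength * penalty

# One fine-tuning step on new-task data: task loss plus the regulariser.
x, y = torch.randn(256, 20), torch.randint(0, 2, (256,))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss = nn.CrossEntropyLoss()(model(x), y) + ewc_penalty()
opt.zero_grad()
loss.backward()
opt.step()
print("combined loss:", loss.item())
```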
You could summarize InstructGPT’s lesson as “You can get huge capability gains from comparatively minor things added on top”.
Yes, I think that lesson has been proven quite conclusively now. I also found systems like PaLM-SayCan very convincing on this point. But the question is not whether you can get huge capability gains—this is evidently true—the question is whether you can get close to AGI without new theoretical breakthroughs. I want to know whether we are now on the critical path (and close to the end of it), or whether we should expect unforeseeable breakthroughs to throw us off course a few more times before then.