One of the obvious ways to improve ChatGPT is to have it solve coding problems, math problems, and other problems where grading can reuse existing frameworks: "go and solve this practice GRE; solve all of the problems on LeetCode and CodeSignal".
It would get a feedback score based on how correct its solution to each problem was.
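A minimal sketch of what that automated grading could look like, assuming each problem ships with hidden test cases (the `solve` entry point and the sandboxing shortcuts here are my own hypothetical choices, not any lab's actual setup):

```python
# Sketch: score a model-generated solution by running it against test cases.
# A real RL pipeline would execute this in a sandbox with time/memory limits;
# bare exec() is only for illustration.

def grade_solution(solution_src: str, test_cases: list[tuple[tuple, object]]) -> float:
    """Return the fraction of test cases the generated `solve` function passes."""
    namespace: dict = {}
    try:
        exec(solution_src, namespace)   # run the candidate source
        solve = namespace["solve"]
    except Exception:
        return 0.0                      # unrunnable code gets zero reward
    passed = 0
    for args, expected in test_cases:
        try:
            if solve(*args) == expected:
                passed += 1
        except Exception:
            pass                        # runtime errors count as failures
    return passed / len(test_cases)

# A correct solution gets full reward:
src = "def solve(x):\n    return x * x"
reward = grade_solution(src, [((2,), 4), ((3,), 9), ((-1,), 1)])
# reward == 1.0
```

The scalar reward is exactly the kind of signal an RL loop can optimize against, with no human labeler in the loop.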
To make it a better coder specifically, you could add tools that adversarially search for inputs that trigger a security bug in the generated code, and add other tasks like "translate this program into these other languages". A translation is very easy to validate for correctness.
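One cheap way to validate a translation is differential testing: run the original and the translated program on the same random inputs and compare outputs. In this toy sketch both "programs" are Python callables; a real harness would invoke compiled binaries in the target languages, and the input generator here is an assumption of mine:

```python
# Sketch: differential testing of a translation. If the two implementations
# ever disagree on an input, the translation is wrong (the converse only
# holds probabilistically).
import random

def translation_agrees(original, translated, trials: int = 1000) -> bool:
    """Compare the two implementations on random integer inputs."""
    for _ in range(trials):
        x = random.randint(-10**6, 10**6)
        if original(x) != translated(x):
            return False                # found a counterexample
    return True

# An identity-function "translation" of abs() is rejected, since the two
# disagree on (roughly half of) the sampled negative inputs:
# translation_agrees(abs, abs)          -> True
# translation_agrees(abs, lambda x: x)  -> False, with overwhelming probability
```

The same harness doubles as the adversarial bug-finder: swap the random generator for a fuzzer and flag any input that crashes the generated code.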
Presumably this would use the same API used for RLHF, so it's a more conventional RL-based AI built with GPT as the starting point.
This would, I assume, trivially become superhuman at the tasks that can be refined this way.
Do you mean something like this? https://arxiv.org/abs/2207.14502