I saw someone use OpenAI’s new Operator model today. It couldn’t order a pizza by itself. Why is AI in the bottom percentile of humans at using a computer, and top percentile at solving maths problems? I don’t think maths problems are shorter horizon than ordering a pizza, nor easier to verify.
Your answer was helpful but I’m still very confused by what I’m seeing.
I think it’s much easier to do RL on huge numbers of math problems, partly because answers are easier to verify and partly because it’s easier to collect many problems. Also, for somewhat incidental engineering reasons, single-turn RL is substantially less complex, and maybe faster, than multi-turn RL on agentic tasks (which involve a variable number of steps and variable delays from environments).
OpenAI probably hasn’t gotten around to doing as much computer-use RL, partly because of prioritization.
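To make the single-turn vs multi-turn contrast concrete, here is a toy Python sketch. Every name in it (`math_reward`, `agent_episode`, `make_toy_env`) is hypothetical and made up for illustration; it is not anyone's actual training setup, just a picture of why one rollout shape is simpler than the other.

```python
def math_reward(model_answer: str, reference: str) -> float:
    """Single-turn case: one completion, one cheap exact-match check."""
    return 1.0 if model_answer.strip() == reference.strip() else 0.0


def make_toy_env(steps_needed: int, delay_per_step: float):
    """Hypothetical stand-in for a browser task that finishes after
    `steps_needed` actions, each costing `delay_per_step` seconds."""
    def env_step(action: str, step_index: int):
        done = step_index + 1 >= steps_needed
        reward = 1.0 if done else 0.0
        return f"page {step_index + 1}", reward, done, delay_per_step
    return env_step


def agent_episode(policy, env_step, max_steps: int = 20):
    """Multi-turn case: episode length and accumulated delay both vary,
    and the reward only arrives once the episode ends or is cut off."""
    observation, reward, done = "start", 0.0, False
    steps, total_delay = 0, 0.0
    while not done and steps < max_steps:
        action = policy(observation)
        observation, reward, done, delay = env_step(action, steps)
        total_delay += delay
        steps += 1
    return reward, steps, total_delay
```

In the math case every rollout has the same shape, so batching is trivial. In the agent case, `agent_episode(lambda obs: "click", make_toy_env(3, 0.5))` and the same call with `make_toy_env(50, 0.5)` take very different amounts of wall-clock time, and the second one hits `max_steps` with no reward at all.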