How much faster do you think we are already? I would say 2x.
I’d guess that xAI, Anthropic, and GDM are more like 5-20% faster all around (with much greater acceleration on some subtasks). It seems plausible to me that the acceleration at OpenAI is already much greater than this (e.g. more like 1.5x or 2x), or will be after some adaptation, due to OpenAI having substantially better internal agents than what they’ve released. (I think this because of updates from o3 and general vibes.)
I was saying 2x because I’ve memorised the results from this study. Do we have better numbers today? R&D is harder, so this is an upper bound. However, since this was from one year ago, perhaps the two factors cancel each other out?
This case seems extremely cherry-picked for cases where uplift is especially high. (Note that this is in Copilot’s interest.) Now, this task could probably be solved autonomously by an AI in like 10 minutes with good scaffolding.
I think you have to consider the full, diverse range of tasks to get a reasonable sense, or at least consider harder tasks. RE-bench seems much closer, but I still expect uplift on RE-bench to probably (but not certainly!) considerably overstate real-world speedup.
Yeah, fair enough. I think someone should try to do a more representative experiment and we could then monitor this metric.
btw, something that bothers me a little bit with this metric is the fact that a very simple AI that just asks me periodically “Hey, do you endorse what you are doing right now? Are you timeboxing? Are you following your plan?” makes me (I think) significantly more strategic and productive. Similar to “I hired 5 people to sit behind me and make me productive for a month”. But this is maybe off topic.
Yes, but I don’t see a clear reason why people (working in AI R&D) will in practice get this productivity boost (or other very low-hanging things) if they don’t get around to getting the boost from hiring humans.