A few things to note:
GPT-4's release was delayed by ~8 months because they wanted to do safety testing before releasing it. If you take this into account, your graph looks much less steep.
The employees at OpenAI know about prediction markets.
They also have incentives to manipulate them to make it look like GPT-5 will come out later than it actually will, since they don't want to set off an AI arms race.
“GPT-4's release was delayed by ~8 months because they wanted to do safety testing”
I have heard this claim before (with 6 months instead of ~8). It could be understood as “GPT-4 was ready to go 6 months earlier; they simply did a lot of testing to go the extra mile.”
Alternatively, this is just how long it took to make the foundation model useful, and while they did spend extra resources on red teaming etc. in parallel, that work didn't push the release back by much.
Are we sure they didn't just count the time to RLHF it? It seems plausible to me that RLHF always takes ~20% of a model's development time. (epistemic status: spitballing)
Some months before release they had an RLHF-ed model that was significantly worse on most dimensions than the model they finally released. This early RLHF-ed model was mentioned in, e.g., Sparks of AGI.