Gonna pull out one bit from the technical report, section 2.12:

2.12 Acceleration

OpenAI has been concerned with how development and deployment of state-of-the-art systems like GPT-4 could affect the broader AI research and development ecosystem. One concern of particular importance to OpenAI is the risk of racing dynamics leading to a decline in safety standards, the diffusion of bad norms, and accelerated AI timelines, each of which heighten societal risks associated with AI. We refer to these here as “acceleration risk.” This was one of the reasons we spent eight months on safety research, risk assessment, and iteration prior to launching GPT-4. In order to specifically better understand acceleration risk from the deployment of GPT-4, we recruited expert forecasters to predict how tweaking various features of the GPT-4 deployment (e.g., timing, communication strategy, and method of commercialization) might affect (concrete indicators of) acceleration risk. Forecasters predicted several things would reduce acceleration, including delaying deployment of GPT-4 by a further six months and taking a quieter communications strategy around the GPT-4 deployment (as compared to the GPT-3 deployment). We also learned from recent deployments that the effectiveness of quiet communications strategy in mitigating acceleration risk can be limited, in particular when novel accessible capabilities are concerned.
We also conducted an evaluation to measure GPT-4’s impact on international stability and to identify the structural factors that intensify AI acceleration. We found that GPT-4’s international impact is most likely to materialize through an increase in demand for competitor products in other countries. Our analysis identified a lengthy list of structural factors that can be accelerants, including government innovation policies, informal state alliances, tacit knowledge transfer between scientists, and existing formal export control agreements.
Our approach to forecasting acceleration is still experimental and we are working on researching and developing more reliable acceleration estimates.
My analysis:
They’re very aware of arms races conceptually, and say they dislike arms races for all the right reasons (“One concern of particular importance to OpenAI is the risk of racing dynamics leading to a decline in safety standards, the diffusion of bad norms, and accelerated AI timelines, each of which heighten societal risks associated with AI.”)
They considered two mitigations to race dynamics with respect to releasing GPT-4:
1. “Quiet communications”, which they didn’t pursue because that didn’t work for ChatGPT (“We also learned from recent deployments that the effectiveness of quiet communications strategy in mitigating acceleration risk can be limited, in particular when novel accessible capabilities are concerned.”)
2. “Delaying deployment of GPT-4 by a further six months”, which they didn’t pursue because ???? [edit: I mean to say they don’t explain why this option wasn’t chosen, unlike the justification given for not pursuing the “quiet communications” strategy. If I had to guess, it was reasoning like “well, we already waited 8 months; waiting another 6 offers only a small benefit, and the marginal returns to delaying are small.”]
There’s a very obvious gap here between what they are saying they are concerned about in terms of accelerating potentially-dangerous AI capabilities, and what they are actually doing.
IMO it’s not clear from the text whether or how long they delayed the release on account of the forecasters’ recommendations.

On page 2 of the system card it says:

Since it [GPT-4] finished training in August of 2022, we have been evaluating, adversarially testing, and iteratively improving the model and the system-level mitigations around it.
(Emphasis added.) This coincides with the “eight months” of safety research they mention. I wasn’t aware of this when I made my original post, so I’ll edit it to be fairer.
But this itself is surprising: GPT-4 was “finished training” in August 2022, before ChatGPT was even released! I am unsure of what “finished training” means here—is the released model weight-for-weight identical to the 2022 version? Did they do RLHF since then?
Yeah, but it’s not clear to me that they needed 8 months of safety research. If they had released it after 12 months, they could’ve still written that they’d been “evaluating, adversarially testing, and iteratively improving” it for 12 months. So it’s still not clear to me how much they delayed because they had to, versus how much (if at all) they did due to the forecasters and/or acceleration considerations.
But this itself is surprising: GPT-4 was “finished training” in August 2022, before ChatGPT was even released! I am unsure of what “finished training” means here—is the released model weight-for-weight identical to the 2022 version? Did they do RLHF since then?
I think “finished training” is the next-token prediction pre-training, and what they did since August is the fine-tuning and the RLHF + other stuff.
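To make that concrete, here is a toy sketch of the pipeline I have in mind (purely illustrative; the stage names and structure are my assumptions about standard practice, not anything OpenAI has said about how GPT-4 specifically was trained):

```python
from dataclasses import dataclass, field

@dataclass
class Model:
    stages_applied: list = field(default_factory=list)

def pretrain() -> Model:
    # Stage 1: next-token-prediction pre-training on a large text corpus.
    # On my reading, this is what "finished training in August of 2022" refers to.
    return Model(stages_applied=["next-token-prediction pre-training"])

def supervised_finetune(model: Model) -> Model:
    # Stage 2: supervised fine-tuning on human-written demonstrations.
    model.stages_applied.append("supervised fine-tuning")
    return model

def rlhf(model: Model) -> Model:
    # Stage 3: fit a reward model on human preference comparisons, then
    # optimize the policy against it (commonly with PPO).
    model.stages_applied.append("RLHF")
    return model

# Only stage 1 would have been done by August 2022; stages 2-3 (plus the
# evals, red-teaming, and mitigations the card describes) fill the months after.
gpt4_like = rlhf(supervised_finetune(pretrain()))
print(gpt4_like.stages_applied)
```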
So it’s still not clear to me how much they delayed because they had to, versus how much (if at all) they did due to the forecasters and/or acceleration considerations.
Yeah, completely agree.
I think “finished training” is the next-token prediction pre-training, and what they did since August is the fine-tuning and the RLHF + other stuff.
This seems most likely? But if so, I wish OpenAI had used a different phrase: fine-tuning/RLHF/other stuff is also part of training (unless I’m badly mistaken), and we have this lovely phrase “pre-training” that they could have used instead.
Ah yeah, that does seem needlessly ambiguous.