We haven’t asked specific individuals if they’re comfortable being named publicly yet, but if advisors are comfortable being named, I’ll announce that soon. We’re also in the process of having conversations with academics, AI ethics folks, AI developers at small companies, and other civil society groups to discuss policy ideas with them.
So far, I’m confident that our proposals will not impede the vast majority of AI developers, but if we end up receiving feedback that this isn’t true, we’ll either rethink our proposals or remove this claim from our advocacy efforts. Also, as stated in a comment below:
I’ve changed the wording to “Only a few technical labs (OpenAI, DeepMind, Meta, etc) and people working with their models would be regulated currently.” The point of this sentence is to emphasize that this definition still wouldn’t apply to the vast majority of AI development—most AI development uses small systems, e.g. image classifiers, self-driving cars, audio models, weather forecasting, the majority of AI used in health care, etc.
Already, there are dozens of fine-tuned Llama2 models scoring above 70 on MMLU. They are laughably far from threats. This does seem like an exceptionally low bar. GPT-4, given the right prompt crafting and adjusting for errors in MMLU, has just been shown to be capable of scoring 89 on MMLU. It would not be surprising for Llama models to reach >80 on MMLU in the next 6 months.
I think focusing on a benchmark like MMLU is not the right approach, and it will be very quickly outmoded. Looking at the other criteria (any one of which, as you propose it now, is a tripwire for regulation), parameter count also sticks out as a somewhat arbitrary and overly limiting metric. There are many academic models with >80B parameters which are far less performant and agentic than e.g. Llama 70B.
Of the proposed tripwires, cost of training seems the most salient. I would focus on that, and possibly only that, for the time being. A >$10M model training cost seems like a reasonable metric. If your concern is that the cost of reaching a given capability will fall over time, build an annual scaling-down of the threshold into the proposal.
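To make that concrete, here is a minimal sketch of what an annually decreasing cost tripwire could look like. The $10M starting point matches the figure above, but the 20% annual reduction is purely an illustrative assumption, not a proposed number.

```python
# Minimal sketch of a training-cost tripwire that scales down each year.
# The 20% annual reduction is an illustrative assumption, not a proposed figure.

BASE_THRESHOLD_USD = 10_000_000  # starting threshold in the year of enactment
ANNUAL_DECAY = 0.8               # threshold shrinks to 80% of its value each year

def cost_threshold(years_since_enactment: int) -> float:
    """Regulatory cost threshold in a given year after enactment."""
    return BASE_THRESHOLD_USD * (ANNUAL_DECAY ** years_since_enactment)

def is_regulated(training_cost_usd: float, years_since_enactment: int) -> bool:
    """A training run trips the regulation if its cost meets or exceeds the threshold."""
    return training_cost_usd >= cost_threshold(years_since_enactment)

for year in range(6):
    print(f"year {year}: threshold = ${cost_threshold(year):,.0f}")
```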
On further reflection, I’d tentatively propose something along these lines as an additional measure:
As I’ve now seen others suggest, trigger limits determined only as a percentage of the state of the art’s performance.
This could be implemented as a proposal to give a government agency the power to act as the overseer and final arbiter, deciding once per year for the following year (and ad hoc on an emergency basis) the metrics and threshold percentages used to index what counts as state of the art.

This would be done in consultation with representatives from each of the big AI labs (as determined by, e.g., having invested >$100M in AI compute), with broader public, academic, and open-source AI community feedback included, but ultimately decided by the agency.
The agency could also be given the reserved power to determine that specific model capabilities, if well defined and clearly measurable, automatically trigger regulation.
This very clearly makes the regulation target the true “frontier AI” while leaving others out of the collateral crosshairs.
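As a rough illustration of how such an indexed trigger could work in practice: in the sketch below, the benchmark names, SOTA scores, and 90% trigger fraction are all placeholder assumptions that, under this proposal, the agency would set each year.

```python
# Sketch of a "percentage of state of the art" trigger. The indexed benchmarks,
# SOTA scores, and trigger fraction are placeholders, not proposed values;
# under the proposal the agency would publish these annually.

SOTA_INDEX = {"mmlu": 89.0, "agentic_eval": 62.0}  # hypothetical yearly SOTA index
TRIGGER_FRACTION = 0.90  # regulated if within 90% of SOTA on any indexed metric

def trips_regulation(model_scores: dict) -> bool:
    """True if the model reaches TRIGGER_FRACTION of SOTA on any indexed benchmark."""
    return any(
        model_scores.get(name, 0.0) >= TRIGGER_FRACTION * sota
        for name, sota in SOTA_INDEX.items()
    )

print(trips_regulation({"mmlu": 72.0}))  # False: well below 90% of the indexed SOTA
print(trips_regulation({"mmlu": 85.0}))  # True: within 10% of the indexed SOTA
```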
I say tentatively because, with the current model architectures (autoregressive LLMs), an immediate need for any sort of specific model-capability-level regulation to prevent existential risk is not remotely apparent. I see the potential for risk in the future, but only pending major breakthroughs in architecture.
Existing models, and the immediately coming generation, are trivially knowable as non-threatening at an existential level. Why? They are incapable of objective-driven action and planning. The worst that can be done lies within the narrow span of agent-like actions achievable via extensive and deliberate programmatic connection of LLMs into heavily engineered systems. Any harms that might result would at worst fall within a narrow scope that is either tangential to the intended actions or the product of deliberate human intent, which is likely covered by existing criminal frameworks. The worst impacts would be narrowly scoped and economic, with a significant human-intent element.
These systems, as they exist and are currently being developed, have no ability to be made objective-driven and autonomous in any real sense. Reaching that point would be a major and obvious technological turning point, requiring a new model paradigm from the outset.
There are key capabilities which we would have to intentionally design in and test for that should be the focus of future regulations:
1) Learning to represent the world in a more generalized way. Autoregressive LLMs build a fragile tree of hopefully-correct-next-tokens, that’s just been molded into the shape we like via absurd amounts of pre-compute, and hardly much more. A more generalized hierarchical predictive model would be what we’d need to explicitly engineer in.
2) A modularized cognitive environment which allows for System 2 thinking, with an actively engaged interplay of a cost/reward system with perceptual input, providing a persistent engineered mechanism for planning complex actions in an objective-oriented way, and feeding them into its own persistent learning.
Without these foundations, which are major active fields of study with no obvious immediate solutions, there’s no real potential for building accelerative intelligences or anything that can act as its own force multiplier in a general sense.
So any regulations which targeted existing autoregressive LLMs—regardless of compute scale—would be “out of an abundance of caution”, with no clear indication of a significant potential for existential risk. They would likely serve mostly to set the regulatory framework and the industry/public/academic feedback systems in motion, beginning to establish the standards for evaluating potential future regulations. Those future regulations would be predicated on advances in objective-oriented architectures.
I agree that benchmarks might not be the right criteria, but training cost isn’t the right metric either IMO, since improvements in hardware and algorithms will bring these costs down every year. Instead, I would propose an effective compute threshold, i.e. number of FLOP while accounting for algorithmic improvements.
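To spell out what an effective compute threshold could mean, here is a rough sketch in which raw training FLOP is scaled by an assumed algorithmic-progress multiplier that compounds each year; the 2x/year rate and the 1e25 threshold are illustrative assumptions only.

```python
# Sketch of an "effective compute" threshold: raw training FLOP expressed in
# base-year-equivalent FLOP via an assumed algorithmic-progress multiplier.
# The 2x/year rate and the 1e25 threshold are illustrative assumptions.

ALGORITHMIC_PROGRESS_PER_YEAR = 2.0
EFFECTIVE_FLOP_THRESHOLD = 1e25  # hypothetical threshold, indexed to the base year

def effective_flop(raw_flop: float, years_since_base: float) -> float:
    """Raw FLOP expressed in base-year-equivalent ('effective') FLOP."""
    return raw_flop * (ALGORITHMIC_PROGRESS_PER_YEAR ** years_since_base)

def exceeds_threshold(raw_flop: float, years_since_base: float) -> bool:
    return effective_flop(raw_flop, years_since_base) >= EFFECTIVE_FLOP_THRESHOLD

print(exceeds_threshold(1e24, 0))  # False: 1e24 effective FLOP in the base year
print(exceeds_threshold(1e24, 4))  # True: ~1.6e25 effective FLOP four years later
```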
So far, I’m confident that our proposals will not impede the vast majority of AI developers, but if we end up receiving feedback that this isn’t true, we’ll either rethink our proposals or remove this claim from our advocacy efforts. Also, as stated in a comment below:
It seems to me that for AI regulation to have important effects, it probably has to affect many AI developers around the point where training more powerful AIs would be dangerous.
So, if AI regulation is aiming to be useful in short timelines and AI is dangerous, it will probably have to affect most AI developers.
And if policy requires a specific flop threshold or similar, then due to our vast uncertainty, that flop threshold probably will have to soon affect many AI developers. My guess is that the criteria you establish would in fact affect a large number of AI developers soon (perhaps most people interested in working with SOTA open-source LLMs).
In general, safe flop and performance thresholds unavoidably have to be pretty low to actually be sufficient even slightly longer term. For instance, suppose that 10^27 flop is a dangerous amount of effective compute (relative to the performance of the GPT-4 training run). Then, if algorithmic progress is 2x per year, 10^24 real flop becomes 10^27 effective flop in just 10 years.
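As a quick sanity check of that arithmetic (assuming the 2x-per-year rate above):

```python
# How many years of 2x/year algorithmic progress until 1e24 real FLOP
# corresponds to 1e27 effective FLOP?
import math

real_flop = 1e24
target_effective_flop = 1e27
progress_per_year = 2.0

years = math.log(target_effective_flop / real_flop, progress_per_year)
print(f"{years:.1f} years")  # ~10.0 years, since 2**10 is about 1000
```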
I think you probably should note that this proposal is likely to affect the majority of people working with generative AI in the next 5-10 years. This seems basically unavoidable.
I’d guess that the best would be to define a specific flop or dollar threshold and have this steadily decrease over time at a conservative rate (e.g. 2x lower threshold each year).
Presumably, your hope for avoiding this flop threshold becoming burdensome soon is:
As AI advances and dangerous systems become increasingly easy to develop at a fraction of the current cost, the definition of frontier AI will need to change. This is why we need an expert-led administration that can adapt the criteria for frontier AI to address the evolving nature of this technology.
So far, I’m confident that our proposals will not impede the vast majority of AI developers, but if we end up receiving feedback that this isn’t true, we’ll either rethink our proposals or remove this claim from our advocacy efforts.
It seems to me like you’ve received this feedback already in this very thread. The fact that you’re going to edit the claim to basically say “this doesn’t affect most people because most people don’t work on LLMs” completely dodges the actual issue here, which is that there’s a large non-profit and independent open-source LLM community that this would heavily impact.
I applaud your honesty in admitting that one approach you might take is to “remove this claim from our advocacy efforts,” but am quite sad to see that you don’t seem to care about limiting the impact of your regulation to potentially dangerous models.
No, your proposal will affect nearly every LLM that has come out in the last 6 months. Llama, MPT, Falcon, RedPajama, OpenLlama, Qwen, and StarCoder have all been trained on 1T tokens or more. Did you do so little research that you had no idea about this when you made the original statement?