On further reflection, I’d tentatively propose something along these lines as an additional measure:
As I’ve now seen others suggest: trigger limits determined only as a percentage of the state of the art’s performance.
This could be implemented by giving a government agency the power to act as overseer and final arbiter, deciding once per year for the following year (and ad hoc on an emergency basis) which metrics define the state of the art and what threshold percentages are indexed against it.
This would be done in consultation with representatives from each of the big AI labs (defined by, e.g., having invested >$100M in AI compute), and with broader public, academic, and open-source AI community feedback, but it would ultimately be decided by the agency.
The agency could also reserve the power to list specific model capabilities, provided they are well defined and clearly measurable, as automatically triggering regulation.
This very clearly targets the regulation at the true “frontier AI” while keeping everyone else out of the collateral crosshairs.
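To make the mechanics concrete, here is a minimal sketch in Python of how such a trigger check might work. The benchmark names, scores, the 0.90 threshold, and the capability list are hypothetical placeholders of my own, not proposed values; the agency would publish the real metrics, percentages, and capability list annually.

```python
# Illustrative sketch of a percentage-of-state-of-the-art trigger check.
# All concrete values below are hypothetical placeholders.

from dataclasses import dataclass, field


@dataclass
class AnnualRule:
    # Agency-published state-of-the-art scores for the chosen metrics.
    sota_scores: dict[str, float]
    # Fraction of state-of-the-art performance that triggers regulation.
    threshold_fraction: float
    # Well-defined, measurable capabilities that trigger regulation on their own.
    auto_trigger_capabilities: set[str] = field(default_factory=set)


def is_regulated(model_scores: dict[str, float],
                 model_capabilities: set[str],
                 rule: AnnualRule) -> bool:
    # Automatic trigger: the model exhibits any listed capability.
    if model_capabilities & rule.auto_trigger_capabilities:
        return True
    # Percentage-of-SOTA trigger: the model reaches the threshold on any indexed metric.
    for metric, sota in rule.sota_scores.items():
        score = model_scores.get(metric)
        if score is not None and score >= rule.threshold_fraction * sota:
            return True
    return False


# Hypothetical example: benchmark names and numbers are placeholders.
rule_2025 = AnnualRule(
    sota_scores={"benchmark_A": 90.0, "benchmark_B": 75.0},
    threshold_fraction=0.90,
    auto_trigger_capabilities={"autonomous_replication"},
)
print(is_regulated({"benchmark_A": 84.0}, set(), rule_2025))  # True: 84 >= 0.9 * 90
```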
I say tentatively because an immediate need for any sort of capability-level regulation to prevent existential risk is not remotely apparent with the current model architectures (autoregressive LLMs). I see the potential for such risk in the future, but only pending major breakthroughs in architecture.
Existing models, and the generation immediately following them, can trivially be seen to pose no threat at an existential level. Why? They are incapable of objective-driven action and planning. The worst that can be done lies within the narrow span of agent-like behavior achievable by extensively and deliberately wiring LLMs into heavily engineered systems. Any harms that result would at worst be narrow in scope, either tangential to the intended actions or the product of deliberate human intent that is likely already covered by existing criminal frameworks. The worst impacts would be narrowly scoped and economic, with a significant element of human intent.
These systems, as they exist and are currently being developed, cannot be made objective-driven and autonomous in any real sense. Getting there would be a major and obvious technological turning point, requiring a new model paradigm from the outset.
There are key capabilities that we would have to intentionally design in and test for, and these should be the focus of future regulations:
1) Learning to represent the world in a more generalized way. Autoregressive LLMs build a fragile tree of hopefully-correct next tokens, molded into the shape we like by absurd amounts of pretraining compute, and hardly much more (a toy sketch of this token-by-token loop follows the list). A more generalized hierarchical predictive model is what we would need to explicitly engineer in.
2) A modularized cognitive environment that allows for System 2 thinking: an actively engaged interplay between a cost/reward system and perceptual input, providing a persistent, engineered mechanism for planning complex actions in an objective-oriented way and feeding those plans back into its own persistent learning.
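To illustrate what I mean by a tree of hopefully-correct next tokens, here is a toy sketch of the autoregressive loop. The hard-coded `next_token_distribution` table is a made-up stand-in for a trained LLM; the point is only that each token is sampled conditioned on what has already been emitted, and nothing in the loop holds an objective, a plan, or a persistent world model.

```python
# Toy illustration of autoregressive generation: pick one next token at a time,
# conditioned only on the tokens already emitted. An early mistake becomes part
# of the context that conditions everything after it.

import random


def next_token_distribution(context: tuple[str, ...]) -> dict[str, float]:
    # Stand-in for a trained LLM: a real model returns probabilities over a huge
    # vocabulary, learned from pretraining; here it is a tiny hard-coded table.
    table = {
        (): {"the": 0.6, "a": 0.4},
        ("the",): {"cat": 0.5, "dog": 0.5},
        ("the", "cat"): {"sat": 0.9, "<eos>": 0.1},
        ("the", "dog"): {"ran": 0.9, "<eos>": 0.1},
        ("the", "cat", "sat"): {"<eos>": 1.0},
        ("the", "dog", "ran"): {"<eos>": 1.0},
    }
    return table.get(context, {"<eos>": 1.0})


def generate(max_tokens: int = 10) -> list[str]:
    context: tuple[str, ...] = ()
    for _ in range(max_tokens):
        dist = next_token_distribution(context)
        token = random.choices(list(dist), weights=list(dist.values()))[0]
        if token == "<eos>":
            break
        context += (token,)  # the sampled token becomes part of the conditioning
    return list(context)


print(generate())  # e.g. ['the', 'cat', 'sat']
```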
Without these foundations, which are major active fields of study with no obvious immediate solutions, there’s no real potential for building accelerative intelligences or anything that can act as its own force multiplier in a general sense.
So any regulations targeting existing autoregressive LLMs, regardless of compute scale, would be “out of an abundance of caution”, with no clear indication of a significant potential for existential risk. They would mostly serve to set the regulatory framework and the industry/public/academic feedback systems in motion, beginning to establish the standards by which potential future regulations would be evaluated. Those future regulations would be predicated on advances in objective-oriented architectures.