I feel as if there is some unstated idea here that I am not quite inferring. What is the safety approach supposed to be? If there were an organization devoted to this path to AI safety, what activities would that organization be engaged in?
The summary I posted here was just a teaser to the full paper (linked in pgph. 1). That said, your comments show you reasoned pretty closely to points I tried to make therein. Almost no need to read it. :)
The first part is just “regulation”. The second part, “instilling law-abiding values in AIs and humans”, seems like a significant departure. It seems like the proposal involves both (a) designing and enacting a set of appropriate laws, and (b) finding and deploying a way of instilling law-abiding values (in AIs and humans). Possibly (a) includes a law requiring (b): AIs (and AI-producing organizations) must be designed so as to have law-abiding values within some acceptable tolerances.
The main message of the paper is along the lines of “a.” That is, per the claim in the 4th pgph, “Effective legal systems are the best way to address AI safety.” I’m arguing that having effective legal systems and laws is the critical thing. How laws/values get instilled in AIs (and humans) is mostly left as an exercise for the reader. Your point about “simply outlawing designs not compatible” is reasonable.
The way I put it in the paper (sect. 3, pgph. 2): “Many of the proposed non-law-based solutions may be worth pursuing to help assure AI systems are law abiding. However, they are secondary to having a robust, well-managed, readily available corpus of codified law—and complementary legal systems—as the foundation and ultimate arbiter of acceptable behaviors for all intelligent systems, both biological and mechanical.”
Later I write, “Suggested improvements to law and legal process are mostly beyond the scope of this brief. It is possible, however, that significant technological advances will not be needed for implementing some key capabilities. For example, current Large Language Models are nearly capable of understanding vast legal corpora and making appropriate legal decisions for humans and AI systems (Katz et al., 2023). Thus, a wholesale switch to novel legal encodings (e.g., computational and smart contracts) may not be necessary.”
I suspect some kind of direct specification approach (per Bostrom classification) could work where AIs confirm that (non-trivial) actions they are considering comply with legal corpora appropriate to current contexts before taking action. I presume techniques used by the self-driving-car people will be up to the task for their application.
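To make the "confirm before acting" idea concrete, here is a minimal sketch of such a gate. Everything in it is an assumption for illustration: the `Action` type, the `CORPORA` mapping, the function names, and the 50 km/h rule are all invented, and a real system would need actual legal reasoning where this toy uses a hand-written rule.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Action:
    name: str
    context: str                      # which legal corpus applies, e.g. "road_traffic"
    params: Dict[str, float] = field(default_factory=dict)
    trivial: bool = False             # trivial actions skip the check


# A rule returns None when the action complies, or a reason string when it does not.
Rule = Callable[[Action], Optional[str]]


def speed_limit_rule(action: Action) -> Optional[str]:
    # Invented example rule: a 50 km/h cap on "drive" actions.
    if action.name == "drive" and action.params.get("speed_kph", 0.0) > 50.0:
        return "exceeds the 50 km/h limit"
    return None


# Hypothetical mapping from context to the rules that govern it.
CORPORA: Dict[str, List[Rule]] = {"road_traffic": [speed_limit_rule]}


def check_compliance(action: Action, corpora: Dict[str, List[Rule]]) -> List[str]:
    """Return the list of violations; an empty list means the action may proceed."""
    if action.trivial:
        return []
    rules = corpora.get(action.context)
    if rules is None:
        # Fail closed: no known corpus for this context counts as a violation.
        return [f"no legal corpus known for context {action.context!r}"]
    violations = []
    for rule in rules:
        reason = rule(action)
        if reason is not None:
            violations.append(reason)
    return violations


def act_if_lawful(action: Action, corpora: Dict[str, List[Rule]],
                  execute: Callable[[Action], None]) -> bool:
    """Run execute(action) only when no rule objects; report whether it ran."""
    if check_compliance(action, corpora):
        return False
    execute(action)
    return True
```

The one design choice worth noting is that the sketch fails closed: an action in a context with no known corpus is blocked rather than allowed. Whether that is the right default is itself a judgment call.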
“AI safety via law can address the full range of safety risks” seems to over-sell the whole section, a major point of which is to claim that AISVL does not apply to the strongest instrumental-convergence concerns. (And why not, exactly? It seems like, if the value-instilling tech existed, it would indeed avert the strongest instrumental-convergence concerns.)
I struggled with what to say about AISVL wrt superintelligence and instrumental convergence. Probably should have let the argument ride without hedging, i.e., superintelligences will have to comply with laws and the demands of legal systems. They will be full partners with humans in enacting and enforcing laws. It’s hard to just shrug off the concerns of the Yudkowskys, Bostroms, and Russells of the world.
Perhaps the most important and (hopefully) actionable recommendation of the proposal is in the conclusion:
“For the future safety and wellbeing of all sentient systems, work should occur in earnest to improve legal processes and laws so they are more robust, fair, nimble, efficient, consistent, understandable, accepted, and complied with.”
The main message of the paper is along the lines of “a.” That is, per the claim in the 4th pgph, “Effective legal systems are the best way to address AI safety.” I’m arguing that having effective legal systems and laws is the critical thing. How laws/values get instilled in AIs (and humans) is mostly left as an exercise for the reader. Your point about “simply outlawing designs not compatible” is reasonable.
The way I put it in the paper (sect. 3, pgph. 2): “Many of the proposed non-law-based solutions may be worth pursuing to help assure AI systems are law abiding. However, they are secondary to having a robust, well-managed, readily available corpus of codified law—and complementary legal systems—as the foundation and ultimate arbiter of acceptable behaviors for all intelligent systems, both biological and mechanical.”
In that case, I agree with Seth Herd that this approach is not being neglected. Of course it could be done better. I’m not sure exactly how many people are working on it, but I have the impression that it is more than a dozen, since I’ve met some of them without trying.
I suspect some kind of direct specification approach (per Bostrom classification) could work where AIs confirm that (non-trivial) actions they are considering comply with legal corpora appropriate to current contexts before taking action. I presume techniques used by the self-driving-car people will be up to the task for their application.
I think this underestimates the difficulty of self-driving cars. In the application of self-driving airplanes (on runways, not in the air), it is indeed possible to make an adequate model of the environment, such that neural networks can be verified to follow a formally specified set of regulations (and self-correct from undesired states to desired states). With self-driving cars, the environment is far too complex to formally model in that way. You get to a point where you are trusting one AI model (of the complex environment) to verify another. And you can’t explore the whole space effectively, so you still can’t provide really strong guarantees (and this translates to errors in practice).
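A toy illustration of the gap, not the actual method used for aircraft taxiing: when the environment model is small, every start state can be checked exhaustively, but the number of states grows exponentially with the number of variables. The grid, the controller, and the property below are all invented for the sketch.

```python
from itertools import product

OFFSETS = range(-5, 6)     # lateral offset from the centerline, in cells (hypothetical)
HEADINGS = (-1, 0, 1)      # drifting left, straight, drifting right (hypothetical)


def controller(offset, heading):
    """Toy policy: always steer toward the centerline."""
    if offset > 0:
        return -1
    if offset < 0:
        return 1
    return 0


def step(offset, heading, steer):
    """Toy dynamics: steering nudges the heading, the heading moves the offset."""
    heading = max(-1, min(1, heading + steer))
    return offset + heading, heading


def verify_exhaustively(max_steps=10):
    """Check every start state: stay in bounds and reach the centerline in time."""
    for offset, heading in product(OFFSETS, HEADINGS):
        o, h = offset, heading
        for _ in range(max_steps):
            if o == 0:
                break
            o, h = step(o, h, controller(o, h))
            if o not in OFFSETS:
                return False      # left the modeled region
        if o != 0:
            return False          # did not self-correct within max_steps
    return True


def state_count(values_per_variable, variables):
    """Size of a discretized state space with the given number of variables."""
    return values_per_variable ** variables
```

Here the whole space is 11 × 3 = 33 states, so checking all of them is trivial. A model with, say, 20 variables at 10 values each would already have `state_count(10, 20)` = 10^20 states, which is the point at which one falls back on sampling or on a second model and loses the strong guarantee.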
I struggled with what to say about AISVL wrt superintelligence and instrumental convergence. Probably should have let the argument ride without hedging, i.e., superintelligences will have to comply with laws and the demands of legal systems. They will be full partners with humans in enacting and enforcing laws. It’s hard to just shrug off the concerns of the Yudkowskys, Bostroms, and Russells of the world.
It seems to me like you are somewhat shrugging off those concerns, since the technological interventions (e.g., smart contracts, LLMs understanding laws, whatever self-driving-car people get up to) are very “light” in the face of those “heavy” concerns. But a legal approach need not shrug off those concerns. For example, law could require that the kind of verification we can now apply to airplane autopilots also be applied to self-driving cars. This would make self-driving illegal in effect until a large breakthrough in ML verification takes place, but it would work!
I’m not sure exactly how many people are working on it, but I have the impression that it is more than a dozen, since I’ve met some of them without trying.
Glad to hear it. I hope to find and follow such work. The people I’m aware of are listed on pp. 3-5 of the paper. Was happy to see O’Keefe, Bai et al. (Anthropic), and Nay leaning this way.
It seems to me like you are somewhat shrugging off those concerns, since the technological interventions (e.g., smart contracts, LLMs understanding laws, whatever self-driving-car people get up to) are very “light” in the face of those “heavy” concerns. But a legal approach need not shrug off those concerns. For example, law could require that the kind of verification we can now apply to airplane autopilots also be applied to self-driving cars. This would make self-driving illegal in effect until a large breakthrough in ML verification takes place, but it would work!
Yes. I’m definitely being glib about implementation details. First things first. :)
I agree with you that if self-driving cars can’t be “programmed” (instilled) to be adequately law-abiding, their future isn’t bright. Per above, I’m heartened by Anthropic’s Constitutional AI (priming LLMs with basic “laws”) having some success getting AIs to behave. Ditto for anecdotes I’ve heard about “asking an LLM to come up with a money-making plan that doesn’t violate any laws.” Seems too easy, right?
One final comment about implementation details. In the appendix I note:
“We suspect emergence of instrumental values is not inevitable for any ‘sufficiently advanced AI system.’ Rather, whether such values emerge depends on what cognitive architecture and environmental conditions (training regimens) are used.”
Broadly speaking, implementing AIs using safe architectures (ones not prone to law-breaking) is another implementation direction. Drexler’s CAIS may be an example.
Sure. Getting appropriate new laws enacted is an important element. From the paper:
“Initially, in addition to adopting existing bodies of law to implement AISVL, existing processes for how laws are drafted, enacted, enforced, litigated, and maintained would be preserved.
“Thereafter, new laws and improvements to existing laws and processes must continually be introduced to make the systems more robust, fair, nimble, efficient, consistent, understandable, accepted, complied with, and enforced.”
I’d say the EU AI Act (and similar) work addresses the “new laws” imperative. (I won’t comment (much) on pros and cons of its content. In general, it seems pretty good. I wonder if they considered adding Etzioni’s first law to the mix, “An AI system must be subject to the full gamut of laws that apply to humans”? That is what I meant by “adopting existing bodies of law to implement AISVL.” The item in the EU AI Act about designing generative AIs to not generate illegal content is related.)
The more interesting work will be on improving legal processes along the dimensions listed above. And really interesting will be, as AIs get more autonomous and agentic, the “instilling” part where AIs must dynamically recognize and comply with the legal-moral corpora appropriate to the contexts they find themselves in.
fwiw, I did skim the doc, very briefly.
Would you count all the people who worked on the EU AI Act?