Perhaps the most important and (hopefully) actionable recommendation of the proposal is in the conclusion:
“For the future safety and wellbeing of all sentient systems, work should occur in earnest to improve legal processes and laws so they are more robust, fair, nimble, efficient, consistent, understandable, accepted, and complied with.” (comment)
Sorry for sounding harsh. But to say something meaningful, I believe you have to argue two things:
Laws are distinct enough from human values (1), but following laws / caring about laws / reporting about predicted law violations prevents the violation of human values (2).
I think the post fails to argue both points. I see no argument that instilling laws is distinct enough from instilling values/corrigibility/human semantics in general (1) and that laws actually prevent misalignment (2).
Later I write, “Suggested improvements to law and legal process are mostly beyond the scope of this brief. It is possible, however, that significant technological advances will not be needed for implementing some key capabilities. For example, current Large Language Models are nearly capable of understanding vast legal corpora and making appropriate legal decisions for humans and AI systems (Katz et al., 2023). Thus, a wholesale switch to novel legal encodings (e.g., computational and smart contracts) may not be necessary.”
If AI can be just asked to follow your clever idea, then AI is already safe enough without your clever idea. “Asking AI to follow something” is not what Bostrom means by direct specification, as far as I understand.
Laws are distinct enough from human values (1), but following laws / caring about laws / reporting about predicted law violations prevents the violation of human values (2).
I think the post fails to argue both points. I see no argument that instilling laws is distinct enough from instilling values/corrigibility/human semantics in general (1) and that laws actually prevent misalignment (2).
My argument goes in a different direction. I reject premise (1) and claim there is an “essential equivalence and intimate link between consensus ethics and democratic law [that] provide a philosophical and practical basis for legal systems that marry values and norms (“virtue cores”) with rules that address real world situations (“consequentialist shells”).”
In the body of the paper I characterize democratic law and consensus ethics as follows:
Both are human inventions intended to facilitate the wellbeing of individuals and the collective. They represent shared values culturally determined through rational consideration and negotiation. To be effective, democratic law and consensus ethics should reflect sufficient agreement of a significant majority of those affected. Democratic law and consensus ethics are not inviolate physical laws, instinctive truths, or commandments from deities, kings, or autocrats. They do not represent individual values, which vary from person to person and are often based on emotion, irrational ideologies, confusion, or psychopathy.
That is, democratic law corresponds to the common definition of Law. Consensus ethics is essentially equivalent to human values when understood in the standard philosophical sense as “shared values culturally determined through rational consideration and negotiation.” In short, I’m of the opinion that “Law = Ethics.”
Regarding your premise (2): See my reply to Abram’s comment. I’m mostly ducking the “instilling” aspects. I’m arguing for open, external, effective legal systems as the key to AI alignment and safety. I see the implementation/instilling details as secondary.
If AI can be just asked to follow your clever idea, then AI is already safe enough without your clever idea. “Asking AI to follow something” is not what Bostrom means by direct specification, as far as I understand.
My reference to Bostrom’s direct specification was not intended to match his use, i.e., hard-coding (instilling) human values in AIs. My usage refers to specifying rules/laws/ethics externally so they are available to and usable by all intelligent systems. Of the various alignment approaches Bostrom mentioned (and deprecated), I thought direct specification came closest to AISVL.
Maybe there’s a misunderstanding. Premise (1) makes sure that your proposal is different from any other proposal. It’s impossible to reject premise (1) without losing the proposal’s meaning.
Premise (1) is possible to reject only if you’re not solving Alignment but solving some other problem.
I’m arguing for open, external, effective legal systems as the key to AI alignment and safety. I see the implementation/instilling details as secondary.
My usage refers to specifying rules/laws/ethics externally so they are available and usable by all intelligent systems.
If an AI can be Aligned externally, then it’s already safe enough. It feels like...
You’re not talking about solving Alignment, but talking about some different problem. And I’m not sure what that problem is.
For your proposal to work, the problem needs to be already solved. All the hard/interesting parts need to be already solved.
If an AI can be Aligned externally, then it’s already safe enough. It feels like...
You’re not talking about solving Alignment, but talking about some different problem. And I’m not sure what that problem is.
For your proposal to work, the problem needs to be already solved. All the hard/interesting parts need to be already solved.
I’m talking about the need for all AIs (and humans) to be bound by legal systems that include key consensus laws/ethics/values. It may seem obvious, but I think this position is under-appreciated and not universally accepted.
By focusing on the external legal system, many key problems associated with alignment (as recited in the Summary of Argument) are addressed. One worth highlighting is 4.4, which suggests AISVL can assure alignment in perpetuity despite changes in values, environmental conditions, and technologies, i.e., a practical implementation of Yudkowsky’s CEV.
Maybe you should edit the post to add something like this:
My proposal is not about the hardest parts of the Alignment problem. My proposal is not trying to solve theoretical problems with Inner Alignment or Outer Alignment (Goodhart, loopholes). I’m just assuming those problems won’t be relevant enough. Or humanity simply won’t create anything AGI-like (see CAIS).
Instead of discussing the usual problems in Alignment theory, I merely argue X. X is not a universally accepted claim, here’s evidence that it’s not universally accepted: [write the evidence here].
...
By focusing on the external legal system, many key problems associated with alignment (as recited in the Summary of Argument) are addressed. One worth highlighting is 4.4, which suggests AISVL can assure alignment in perpetuity despite changes in values, environmental conditions, and technologies, i.e., a practical implementation of Yudkowsky’s CEV.
I think the key problems are not “addressed”; you just assume they won’t exist. And laws are not a “practical implementation of CEV”.