I appreciate your engagement! But I think your position is mistaken for a few reasons:
First, I explicitly define LFAI to be about compliance with “some defined set of human-originating rules (‘laws’).” I do not argue that AI should follow all laws, which does indeed seem both hard and unnecessary. But I should have been clearer about this. (I did have some clarification in an earlier draft, which I guess I accidentally excised.) So I agree that there should be careful thought about which laws an LFAI should be trained to follow, for the reasons you cite. That question itself could be answered ethically or legally, and the answer could vary with the system. But to make this a compelling objection against LFAI, you would have to make, I think, a stronger claim: that the set of laws worth having AI follow is so small or unimportant as to be not worth trying to follow. That seems unlikely.
Second, you point to many cases where the law would be underdetermined with respect to some action the AI wanted to take that is out-of-distribution (relative to the caselaw and the motivations of the law), and say that:
I don’t know about you, but I want such a decision made by humans seriously considering the issue, or an AI’s view of our best interests. I don’t want it made by some pedantic letter of the law interpretation of some act written 100’s of years ago. Where the decision comes down to arbitrary phrasing decisions and linguistic quirks.
But I think LFAI would actually facilitate the result you want, not hinder it:
As I say, the pseudocode would first ask whether the act X being contemplated is clearly illegal with reference to the set of laws the LFAI is bound to follow. If it is, then that seems to be some decent (but not conclusive) evidence that there has been a deliberative process that prohibited X.
The pseudocode then asks whether X is maybe-illegal. If there has not been deliberation about analogous actions, that would suggest uncertainty, which would weigh in favor of not-X. If the uncertainty is substantial, that might be decisive against X.
If the AI’s estimation in either direction makes a mistake as to what humans’ “true” preferences regarding X are, then the humans can decide to change the rules. The law is dynamic, and therefore the deliberative processes that shape it would/could shape an LFAI’s constraints.
Furthermore, all of this has to be considered against the backdrop of a non-LFAI system. LFAI seems much more likely to facilitate the deliberative result than an AI that is totally ignorant of the law.
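The two checks described above can be sketched in code. This is only an illustrative reading under stated assumptions: the function name `lfai_check`, the `Verdict` labels, and both numeric thresholds are hypothetical and do not come from the post's pseudocode, and estimating `p_illegal` (the probability that X violates the chosen rule set) is assumed to happen elsewhere.

```python
from enum import Enum


class Verdict(Enum):
    """Hypothetical outcomes of the legal check for a contemplated act X."""
    REFRAIN = "refrain"
    SEEK_CLARIFICATION = "seek clarification"
    PROCEED = "proceed"


def lfai_check(p_illegal: float,
               clearly_illegal: float = 0.9,
               substantial_doubt: float = 0.2) -> Verdict:
    """Map an estimated probability that X breaks the chosen rules to a verdict.

    Both thresholds are illustrative assumptions, not values from the post.
    """
    if not 0.0 <= p_illegal <= 1.0:
        raise ValueError("p_illegal must be a probability in [0, 1]")
    if p_illegal >= clearly_illegal:
        # First check: X is clearly illegal under the rules the LFAI is bound to follow.
        return Verdict.REFRAIN
    if p_illegal >= substantial_doubt:
        # Second check: X is maybe-illegal and the doubt is substantial, so it
        # counts as decisive against X until humans clarify or change the rules.
        return Verdict.SEEK_CLARIFICATION
    # Little legal doubt: this check alone does not block X. A fuller version
    # might still weigh the residual doubt as a cost rather than ignore it.
    return Verdict.PROCEED
```

For instance, under these assumed thresholds `lfai_check(0.5)` returns `Verdict.SEEK_CLARIFICATION`. If humans then change or clarify the rules so that the estimate drops to 0.01, the same call returns `Verdict.PROCEED`, which is the sense in which the deliberative process can reshape the constraint.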
Your point about the laws being imperfect is well taken but, I believe, overstated. Certainly many laws are substantively bad or shaped by bad processes. But I would bet that most people, probably including you, would rather live among agents that scrupulously followed the law than agents who paid it no heed and simply pursued their objective functions.
First, I explicitly define LFAI to be about compliance with “some defined set of human-originating rules (‘laws’).” I do not argue that AI should follow all laws, which does indeed seem both hard and unnecessary.
Sure, some of the failure modes mentioned at the bottom disappear when you do that.
I think, a stronger claim: that the set of laws worth having AI follow is so small or unimportant as to be not worth trying to follow. That seems unlikely.
If some law is so obviously a good idea in all possible circumstances, the AI will do it whether it is law following or human preference following.
The question isn’t whether there are laws that are better than nothing. It’s whether we are better off encoding what we want the AI to do into laws or into terms of a utility function: which format (or maybe some other format) is best for encoding our preferences?
If their objective function is something like the CEV of humanity, any extra laws imposed on top of that are entropic.
But I would bet that most people, probably including you, would rather live among agents that scrupulously followed the law than agents who paid it no heed and simply pursued their objective functions.
If the AIs’ objectives have no correlation with human wellbeing, the weak correlation given by law following may be better than nothing. If the AI is already strongly correlated with human wellbeing, then any laws imposed make the AI worse.
If the AI’s estimation in either direction makes a mistake as to what humans’ “true” preferences regarding X are, then the humans can decide to change the rules. The law is dynamic, and therefore the deliberative processes that shape it would/could shape an LFAI’s constraints.
If the human has never imagined mind uploading, does A go up to the human and explain what it is, asking if maybe that law should be changed?
If some law is so obviously a good idea in all possible circumstances, the AI will do it whether it is law following or human preference following.
As explained in the second post, I don’t agree that that’s implied if the AI is intent-aligned but not aligned with some deeper moral framework like CEV.
The question isn’t whether there are laws that are better than nothing. It’s whether we are better off encoding what we want the AI to do into laws or into terms of a utility function: which format (or maybe some other format) is best for encoding our preferences?
I agree that that is an important question. I think we have a very long track record of embedding our values into law. The point of this sequence is to argue that we should therefore at a minimum explore pointing to (some subset of) laws, which has a number of benefits relative to trying to integrate values into the utility function objectively. I will defend that idea more fully in a later post, but to briefly motivate the idea, law (as compared to something like the values that would come from CEV) is more or less completely written down, much more agreed-upon, much more formalized, and has built-in processes for resolving ambiguities and contradictions.
If the human has never imagined mind uploading, does A go up to the human and explain what it is, asking if maybe that law should be changed?
A cartoon version of this may be that A says “It’s not clear whether that’s legal, and if it’s not legal it would be very bad (murder), so I can’t proceed until there’s clarification.” If the human still wants to proceed, they can try to:
Change the law.
Get a declaratory judgment that it’s not in fact against the law.
I think we have a very long track record of embedding our values into law.
I mean you could say that if we haven’t figured out how to do it well in the last 10,000 years, maybe don’t plan on doing it in the next 10. That’s kind of being mean though.
If you have a functioning arbitration process, can’t you just say “don’t do bad things” and leave everything down to the arbitration?
I also kind of feel that adding laws is going in the direction of more complexity. And we really want something as simple as possible (i.e., the minimal AI that can sit in a MIRI basement and help them figure out the rest of AI theory, or something).
If the human still wants to proceed, they can try to:
I was talking about a scenario where the human has never imagined the possibility, and asking whether the AI mentions the possibility to the human (knowing the human may change the law to get it).
The human says “cure my cancer”. The AI reasons that it can:
1. Tell the human of a drug that cures their cancer in the conventional sense.
2. Tell the human about mind uploading, never mentioning the chemical cure.
If the AI picks 2, the human will change the “law” (which isn’t the actual law, it’s just some text file the AI wants to obey). Then the AI can upload the human and the human will have a life the AI judges as overall better for them.
You don’t want the AI to never mention a really good idea because it happens to be illegal on a technicality. You also don’t want all the plans to be “persuade humans to make everything legal, then …”
A cartoon version of this may be that A says “It’s not clear whether that’s legal, and if it’s not legal it would be very bad (murder), so I can’t proceed until there’s clarification.” If the human still wants to proceed, they can try to:
Change the law.
Get a declaratory judgment that it’s not in fact against the law.