When I first read the title of this post, I assumed you meant “law” in the sense of Asimov’s Three Laws (i.e. a set of rules intended for AI), but it seems like you mean The Law (presumably US law). These are two very different things, and I don’t think it’s useful to conflate them.
We probably don’t want to define this as minimizing legal noncompliance, since this would make the system extremely risk-averse to the point of being useless. More likely, one would attempt to weight legal downside risks heavily in the agent’s objective function, such that it would keep legal risk to an acceptable level.
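To make that concrete, here is a minimal Python sketch of the “weight legal downside risks heavily” idea as I read it; the names (`utility`, `p_illegal`) and the weight are placeholders I made up, not anything from the post:

```python
# A minimal sketch (mine, not the post's) of treating legal risk as a large
# but finite cost in the objective, traded off against utility.
# `utility`, `p_illegal`, and LEGAL_RISK_WEIGHT are hypothetical placeholders.

LEGAL_RISK_WEIGHT = 100.0  # assumed: legal downside weighted heavily

def score_soft(action, utility, p_illegal):
    """Soft approach: heavily penalize expected legal downside."""
    return utility(action) - LEGAL_RISK_WEIGHT * p_illegal(action)

def choose_action_soft(actions, utility, p_illegal):
    """Pick the action with the best utility-minus-legal-risk score."""
    return max(actions, key=lambda a: score_soft(a, utility, p_illegal))
```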
It seems to me that all of the problems in alignment start with the objective function. The standard alignment-risk argument goes: “Extremely powerful optimizers optimize for things humans don’t want, because what humans want cannot be described numerically, and what can be described numerically is usually/always disastrous.” I anticipate that EY would pick LFAI apart by arguing that such an AI would inevitably circumvent and corrupt any complex legal process in an extreme way. A big part of The Law is, of course, dealing with people trying to corrupt justice in their favor, and even after hundreds of years it’s still far from perfect.
Rather than just applying a −10 score to anything that looks illegal, I think LFAI should work more like a chess AI, where moves that are “illegal” are not even considered, and the AI only picks from legal moves. You didn’t choose that approach because The Law (especially in the US) is so complicated and overbearing that you can’t actually fully comply with it, so you have to see your actions as on a spectrum from illegal to legal, compare that to your spectrum of utility, and choose actions in the sweet spot of legal and useful. But this is still utilitarianism, not deontology, and if you want the benefits of deontology, you should go all the way.
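Here is roughly what I mean by the chess-AI approach, in toy Python; `is_legal` and `utility` are hypothetical stand-ins, not anything proposed in the post:

```python
# Deontological filtering: illegal actions are removed from consideration
# before any utility comparison, rather than penalized in the score.

def choose_action_deontological(actions, is_legal, utility):
    """Pick the best action from the legal moves only."""
    legal_actions = [a for a in actions if is_legal(a)]
    if not legal_actions:
        return None  # refuse to act rather than pick an illegal move
    return max(legal_actions, key=utility)
```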
For example, robots should never kill people. (Let’s put aside questions of combat and law enforcement robots for the moment.) Of course, any action in the world is likely to carry some risk of a fatal accident or some impact on the probability of death, and we want robots to be able to take some actions. So, an LFAI should have the rule “any action which has a 1-in-a-million (or whatever) chance of killing someone should be ruled out and not even considered.” There should be a corollary that the AI doesn’t have to do a huge amount of processing to get to its estimate of the probability of someone dying, since a reasonable amount of certainty will do. This is not “negative 2 billion points if anyone dies”, since then you have all the problems of extremely powerful optimizers. This is “if anyone dies (with some probability), don’t even consider the action.”
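In toy code, the rule I have in mind looks something like this (the threshold value and the estimator `estimate_p_death` are placeholders; a cheap, bounded estimate is all that’s needed):

```python
# Hard threshold on estimated probability of death: actions above the
# threshold are ruled out entirely, never traded off against utility.

DEATH_PROB_THRESHOLD = 1e-6  # "1-in-a-million (or whatever)"

def permissible(action, estimate_p_death):
    """A rough, cheap estimate is fine; we only need to clear the threshold."""
    return estimate_p_death(action) < DEATH_PROB_THRESHOLD

def choose_action_with_threshold(actions, estimate_p_death, utility):
    """Only actions that pass the hard threshold are ever compared on utility."""
    candidates = [a for a in actions if permissible(a, estimate_p_death)]
    return max(candidates, key=utility, default=None)
```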