TCB comments on Human errors, human values

TCB 8 Apr 2011 14:55 UTC
7 points
Perhaps I am missing something here, but I don’t see why utilitarianism is necessarily superior to rule-based ethics. An obvious advantage of a rule-based moral system is the speed of computation. Situations like the trolley problem require extremely fast decision-making. Considering how many problems local optima cause in machine learning and optimization, I imagine that it would be difficult for an AI to assess every possible alternative and pick the one which maximized overall utility in time to make such a decision. Certainly, we as humans frequently miss obvious alternatives when making decisions, especially when we are upset, as most humans would be if they saw a trolley about to crash into five people. Thus, having a rule-based moral system would allow us to easily make split-second decisions when such decisions were required.

Of course, we would not want to rely on a simple rule-based moral system all the time, and there are obvious benefits to utilitarianism when time is available for careful deliberation. It seems that it would be advantageous to switch back and forth between these two systems based on the time available for computation.

If the rules in a rule-based ethical system were derived from utilitarian concerns, and were chosen to maximize the expected utility over all situations to which the rule might be applied, would it not make sense to use such a rule-based system for very important, split-second decisions?
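
To make the switching idea concrete, here is a minimal sketch, not a proposal for how a real agent should work. Everything in it is hypothetical: the `decide` function, the `budget` counted in utility evaluations (a stand-in for available time), the `min_budget` threshold, and the toy trolley scores. The fallback rule is assumed to have been chosen in advance for its expected utility.

```python
from typing import Callable, Iterable, Optional, TypeVar

A = TypeVar("A")


def decide(
    actions: Iterable[A],
    utility: Callable[[A], float],
    rule: Callable[[], A],
    budget: int,
    min_budget: int = 3,
) -> A:
    """Pick an action under a budget counted in utility evaluations.

    `budget` is an assumed stand-in for the time available to deliberate.
    """
    if budget < min_budget:
        # Too little time to deliberate: fall back on the precomputed rule.
        return rule()
    best: Optional[A] = None
    best_score = float("-inf")
    for evaluated, action in enumerate(actions):
        if evaluated >= budget:
            break  # out of time; keep the best action found so far
        score = utility(action)
        if score > best_score:
            best, best_score = action, score
    return best if best is not None else rule()


# Toy trolley scenario with made-up scores (negative expected deaths).
options = ["do_nothing", "pull_lever", "shout_warning"]
scores = {"do_nothing": -5.0, "pull_lever": -1.0, "shout_warning": 0.0}

split_second = decide(options, scores.__getitem__, lambda: "pull_lever", budget=0)
deliberate = decide(options, scores.__getitem__, lambda: "pull_lever", budget=3)
```

With no budget the agent simply follows the rule (`"pull_lever"`); with enough budget to score every option it finds the alternative the rule misses (`"shout_warning"`).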
- PhilGoetz 9 Apr 2011 3:24 UTC
  4 points
  Yes, rule-based systems might respond faster, and that is sometimes preferable.
  
  Let me back up. I categorize ethical systems into different levels of meta. “Personal ethics” are the ethical system an individual agent follows. Efficiency, and the agent’s limited knowledge, intelligence, and perspective, are big factors.
  
  “Social ethics” are the ethics a society agrees on. AFAIK, all existing ethical theorizing supposes that these are the same, and that an agent’s ethics, and its society’s ethics, must be the same thing. This makes no sense; casual observation shows this is not the case. People have ethical codes, and they are seldom the same as the ethical codes that society tells them they should have. There are obvious evolutionary reasons for this. Social ethics and personal ethics are often at cross-purposes. Social ethics are inherently dishonest, because the most effective way of maximizing social utility is often to deceive people. We expect, for instance, that telling people there is a distinction between personal ethics and social ethics should be against every social ethics in existence.
  
  (I don’t mean that social ethics are necessarily exploiting people. Even if you sincerely want the best outcome for people, and they have personal ethics such that you don’t need to deceive them into cooperating, many will be too stupid or in too much of a hurry to get good results if given full knowledge of the values that the designers of the social ethics were trying to optimize. Evolution may be the designer.)
  
  “Meta-ethics” is honest social ethics—trying to figure out what we should maximize, in a way that is not meant for public consumption—you’re not going to write your conclusions on stone tablets and give them to the masses, who wouldn’t understand them anyway. When Eliezer talks about designing Friendly AI, that’s meta-ethics (I hope). And that’s what I’m referring to here when I talk about encoding human values into an AI.
  
  Roughly, meta-ethics is “correct and thorough” ethics, where we want to know the truth and get the “right” answer (if there is one) about what to optimize.
  
  Social ethics and agent ethics are likely to be rule-based, and that may be appropriate. Meta-ethics is an abstract thing, carried out, e.g., by philosophers in journal articles, and speed of computation is typically not an issue.
  
  Any rule-based system can be transformed into a utilitarian system, but not vice-versa. Any system that can produce a choice between any two outcomes or actions imposes a complete ordering on all possible outcomes or actions, and is therefore utilitarian.
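
The rule-to-utility direction claimed in the reply can be sketched as follows. This is only an illustration under a loud assumption: the system's pairwise choices are strict and transitive (if choices cycle, no consistent ordering exists and the construction fails). The function `utility_from_choices`, the `prefers` callback, and the toy rule "never break the rule, then save as many lives as possible" are all hypothetical.

```python
from functools import cmp_to_key
from typing import Callable, Dict, Hashable, Iterable


def utility_from_choices(
    outcomes: Iterable[Hashable],
    prefers: Callable[[Hashable, Hashable], bool],
) -> Dict[Hashable, int]:
    """Turn pairwise choices into numeric utilities by ranking outcomes.

    Assumes `prefers(a, b)` (True when the system picks a over b) is strict
    and transitive; otherwise the resulting numbers are not meaningful.
    """
    def compare(a: Hashable, b: Hashable) -> int:
        if prefers(a, b):
            return 1
        if prefers(b, a):
            return -1
        return 0

    ranked = sorted(outcomes, key=cmp_to_key(compare))  # worst first
    return {outcome: rank for rank, outcome in enumerate(ranked)}


# Toy rule system: an outcome is (broke_rule, lives_saved).
def prefers(a, b) -> bool:
    if a[0] != b[0]:
        return not a[0]  # the rule-abiding outcome always wins
    return a[1] > b[1]   # otherwise prefer more lives saved


utilities = utility_from_choices([(False, 0), (True, 4), (False, 1)], prefers)
```

Picking the outcome with the highest value in `utilities` reproduces every choice the rule system would make among these three outcomes.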