Planned summary for the Alignment Newsletter:
This post argues that AI alignment has specific philosophical tendencies: 1) connectionism, where knowledge is encoded in neural net weights rather than through symbols, 2) behaviorism, where we learn from data rather than using reasoning or planning, 3) Humean motivations for humans (i.e. modeling humans as reward maximizers), 4) viewing rationality as decision-theoretic, that is, about maximizing expected utility, rather than also considering e.g. logic, argumentation, and dialectic, and 5) consequentialism. This could be a “philosophical bubble” caused by founder effects from the EA and rationality communities, as well as by the recent success and popularity of deep learning.
Instead, we should be aiming for philosophical plurality, where we explore other philosophical traditions as well. This would be useful because 1) we would likely find insights not available in Western philosophy, 2) we would be more robust to moral uncertainty, 3) we would get buy-in from more actors, and 4) it is the “right” thing to do, allowing others to choose the values and ethical frameworks that matter to them.
For example, certain interpretations of Confucian philosophy hold that norms have intrinsic value, as opposed to the dominant approach in Western philosophy, in which individual preferences have intrinsic value while norms have only instrumental value. This may be very relevant for learning what an AI system should optimize. Similarly, Buddhist thought often talks about problems of ontological shifts.
Planned opinion:
Certainly to the extent that AI alignment requires us to “lock in” philosophical approaches, I think it is important that we consider a plurality of views for this purpose (see also <@The Argument from Philosophical Difficulty@>). I especially think this is true if our approach to alignment is to figure out “human values” and then tell an AI to maximize them. However, I’m more optimistic about other approaches to alignment, which I think require fewer philosophical commitments, so it becomes less of an issue that the alignment community occupies a specific philosophical bubble. See [this comment](https://www.alignmentforum.org/posts/jS2iiDPqMvZ2tnik2/ai-alignment-philosophical-pluralism-and-the-relevance-of?commentId=zaAYniACRc29CM6sJ) for more details.
Thanks for this summary. Just a few things I would change:
“Deep learning” instead of “deep reinforcement learning” at the end of the 1st paragraph—this is what I meant to say, and I’ll update the original post accordingly.
I’d replace “nice” with “right” in the 2nd paragraph.
“certain interpretations of Confucian philosophy” instead of “Confucian philosophy”, “the dominant approach in Western philosophy” instead of “Western philosophy”—I think it’s important not to give the impression that either of these is a monolith.
Done :)