I guess it depends on the specific alignment approach being taken, such as whether you’re trying to build a sovereign or an assistant. Assuming the latter, I’ll list some philosophical problems that seem generally relevant:
metaphilosophy
How to solve new philosophical problems relevant to alignment as they come up?
How to help users when they ask the AI to attempt philosophical progress?
How to help defend the user against bad philosophical ideas (whether virulent memes or ideas intentionally optimized by other AIs/agents to manipulate the user)?
How to enhance or at least not disrupt our collective ability to make philosophical progress?
metaethics
Should the AI always defer to the user or to OpenAI on ethical questions?
If not, or if the user asks it to, how can or should the AI try to make ethical determinations?
rationality
How should the AI try to improve its own thinking?
How to help the user be more rational (if they so request)?
normativity
How should the AI reason about “should” problems in general?
normative and applied ethics
What kinds of user requests should the AI refuse to fulfill?
What does it mean to help the user when their goals/values are confused or unclear?
When is it ok to let OpenAI’s interests override the user’s?
philosophy of mind
Which computations are conscious or constitute moral patients?
What exactly constitutes pain or suffering (which the AI should perhaps avoid helping the user create)?
How to avoid “mind crimes” within the AI’s own cognition/computation?
decision theory / game theory / bargaining
How to help the user bargain with other agents?
How to avoid (and help the user avoid) being exploited by others (including distant superintelligences)?
See also this list, which I wrote a while ago. I wrote the above without first reviewing that post (to try to generate a fresh perspective).