I guess it depends on the specific alignment approach being taken, such as whether you’re trying to build a sovereign or an assistant. Assuming the latter, I’ll list some philosophical problems that seem generally relevant:
metaphilosophy
How to solve new philosophical problems relevant to alignment as they come up?
How to help users when they ask the AI to attempt philosophical progress?
How to help defend the user against bad philosophical ideas (whether virulent memes or ideas intentionally optimized by other AIs/agents to manipulate the user)?
How to enhance or at least not disrupt our collective ability to make philosophical progress?
metaethics
Should the AI always defer to the user or to OpenAI on ethical questions?
If not, or if the user asks it to, how can or should the AI try to make ethical determinations?
rationality
How should the AI try to improve its own thinking?
How to help the user be more rational (if they so request)?
normativity
How should the AI reason about “should” problems in general?
normative and applied ethics
What kinds of user requests should the AI refuse to fulfill?
What does it mean to help the user when their goals/values are confused or unclear?
When is it ok to let OpenAI’s interests override the user’s?
philosophy of mind
Which computations are conscious or constitute moral patients?
What exactly constitutes pain or suffering (which the AI should perhaps avoid helping the user create)?
How to avoid “mind crimes” within the AI’s own cognition/computation?
decision theory / game theory / bargaining
How to help the user bargain with other agents?
How to avoid (and help the user avoid) being exploited by others (including distant superintelligences)?
See also this list, which I wrote a while ago. I wrote the above without first reviewing that post (to try to generate a fresh perspective).