Planned summary for the Alignment Newsletter:

When making the case for work on AI x-risk to other ML researchers, what should we focus on? This post suggests arguing for three core claims:
1. Due to Goodhart’s law, instrumental goals, and safety-performance trade-offs, the development of advanced AI non-trivially increases the risk of human extinction.
2. To mitigate this x-risk, we need to know how to build safe systems, know that we know how to build safe systems, and prevent people from building unsafe systems.
3. So, we should mitigate AI x-risk, as it is impactful, neglected, and challenging but tractable.
Planned opinion:
This is a nice, concise case to make, but I think the bulk of the work is in splitting the first claim into subclaims; that is the part that is usually the sticking point.