(opinions are my own)
I think this is a good review. Some points that resonated with me:
1. “The concepts of systemic safety, monitoring, robustness, and alignment seem rather fuzzy.” I don’t think the difference between objective robustness and capabilities robustness is discussed, but this distinction seems important. Also, I agree that Truthful AI could easily go into monitoring.
2. “Lack of concrete threat models.” At the beginning of the course, there are a few broad arguments for why AI might be dangerous but not a lot of concrete failure modes. Adding more failure modes here would better motivate the material.
On the review’s suggestions 3 (“Give more clarity on how the various ML safety techniques address the alignment problem, and how they can potentially scale to solve bigger problems of a similar nature as AIs scale in capabilities”) and 4 (“Give an assessment on the most pressing issues that should be addressed by the ML community and the potential work that can be done to contribute to the ML safety field”):

You can read more about how these technical problems relate to AGI failure modes and how they rank on importance, tractability, and crowdedness in Pragmatic AI Safety 5. I think the creators included this content in a separate forum post for a reason.
The course is intended for two audiences: people who are already worried about AI X-risk and people who are only interested in the technical content. The second group doesn’t necessarily care about why each research direction relates to reducing X-risk.
Putting a lot of emphasis on the X-risk framing might just turn them off. It could give them the impression that you have to buy the X-risk arguments in order to work on these problems (which I don’t think is true), or it could make them less likely to recommend the course to others, causing fewer people to engage with the X-risk material overall.
Thanks for the comment!
I felt some of the content in the PAIS series would have been great for the course, though the creators probably had a reason to exclude it; I’m just not sure what that reason was.
In this case I feel it could be better for the chapter on X-risk to be removed entirely. It might be better not to include it at all than to include it and mostly show quotes from famous people without properly engaging with the arguments.
From what I understand, Dan plans to add more object-level arguments soon.