Failure Modes of Teaching AI Safety

Why I’m writing this

I’m about to teach my AI safety course for the fourth time. While updating the syllabus for the upcoming semester, I’ve been summarizing my observations on what can go wrong when teaching AI safety. Most of these failure modes haven’t occurred in my own teaching, but they seem likely to arise somewhere as more AIS courses are developed and taught around the world, and I’ve thought about them explicitly while preparing the course precisely so that they don’t happen.

1. Alignment feels like a lost cause

Depending on how x-risk is presented, misaligned AI can come across as an inevitable future: the problem seems too complex and hard, and there isn’t enough time for AI alignment research to produce robust techniques.

How to avoid: emphasize the ongoing work on alignment and governance, and point out that your students could be doing some of that work too (if they wanted to). I’m also updating my syllabus to include a discussion of the AI Pause debate.

2. There is no (historical/​philosophical) context

As with all complex ideas, situating the problem within its context can make a big difference in how it is perceived. Talking about AI systems that suddenly become a threat to humanity is confusing, to say the least.

How to avoid: talk about the foundations of AI and the debate between symbolic AI and connectionism in cognitive science, and set the stage for how we got to contemporary LLMs.

3. Your audience gets (disproportionately more) excited about capabilities

This is more common among people with technical backgrounds who like to build tools and applications and are blindly or naively excited about technological progress.

How to avoid: clearly explain the conceptual parts of the problem, describe it in more industry-friendly terminology, and provide plenty of examples.

4. But … what about sexist/​racist etc. algorithms?

The hint here is that AI ethics is more important than this alignment story. The objection also comes up in the context of the longtermism debate.

How to avoid: show how this is a form of misalignment that is happening right now. AI ethics should not be opposed to AI safety.

5. There are no actionable next steps

This is true of many courses: once you’ve finished one, it’s just another letter on your transcript. I think there are good reasons for an AIS course in particular not to fall into that category.

How to avoid: connect the material to related areas of interest that the students might be more familiar with (e.g., I have many psych or econ majors). Recommend further resources, other courses they could take, potential career paths, or activism.

You can find my syllabus for Fall 2024 here.
