After looking into the prototype course, I updated upwards on this project, as I think it is a decent introduction to Dylan’s Off-Switch Game paper. Could I ask what else RAISE wants to cover in the course? What other work on corrigibility are you planning to cover? (For example, Dylan’s other work, MIRI’s work on this subject, and Smitha Milli’s paper?)
Could you also write more about who your course is targeting? Why does RAISE believe that the best way to fix the talent gap in AI safety is to help EAs change careers via introductory AI Safety material, instead of, say, making it easier for CS PhD students to do research on AI Safety-relevant topics? Why do we need to build a campus, instead of co-opting the existing education mechanisms of academia?
Finally, could you link some of the mind maps and summaries RAISE has created?
Thank you!
Expecting to know better after getting our hands dirty, we decided to take it one subfield at a time. We haven’t decided which subfield to cover beyond Corrigibility, though a natural choice seems to be Value Learning.
We have identified 9 papers within/adjacent to Corrigibility:
1) Russell & LaVictoire—“Corrigibility in AI systems” (2015)
https://intelligence.org/files/CorrigibilityAISystems.pdf
2) Omohundro—“The basic AI drives” (2008)
https://selfawaresystems.files.wordpress.com/2008/01/ai_drives_final.pdf
3) Soares et al.—“Corrigibility” (2015)
https://intelligence.org/files/Corrigibility.pdf
4) Hadfield-Menell et al.—“The Off-Switch Game” (2016)
https://arxiv.org/pdf/1611.08219
5) Orseau & Armstrong—“Safely Interruptible Agents” (2016)
https://intelligence.org/files/Interruptibility.pdf
6) Milli et al.—“Should robots be obedient?” (2017)
https://arxiv.org/pdf/1705.09990
7) Carey—“Incorrigibility in the CIRL framework” (2017)
https://arxiv.org/pdf/1709.06275
8) El Mhamdi et al.—“Dynamic Safe Interruptibility for Decentralized Multi-Agent Reinforcement Learning” (2017)
https://arxiv.org/pdf/1704.02882
9) Armstrong—“‘Indifference’ methods for managing agent rewards” (2018)
https://arxiv.org/pdf/1712.06365
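For readers who want a quick feel for why the Off-Switch Game (paper 4) makes deference rational, here is a minimal Monte Carlo sketch. It is not taken from the course or the paper’s code; the function name and parameters are illustrative. Under a Gaussian belief about an action’s utility u, it compares the robot’s three options: act immediately, switch itself off, or defer to a rational human who only permits the action when u > 0.

```python
import random

def offswitch_values(mu, sigma, n=100_000, seed=0):
    """Monte Carlo estimate of the robot's expected utility, under its
    Gaussian belief u ~ N(mu, sigma^2), for three policies in the
    Off-Switch Game: act immediately, switch itself off, or defer to
    a rational human who only lets the action through when u > 0."""
    rng = random.Random(seed)
    us = [rng.gauss(mu, sigma) for _ in range(n)]
    act = sum(us) / n                        # E[u]: just take the action
    off = 0.0                                # switching off is worth 0
    defer = sum(u for u in us if u > 0) / n  # E[u * 1{u > 0}]: human filters
    return act, off, defer

act, off, defer = offswitch_values(mu=-0.5, sigma=1.0)
# The rational human blocks exactly the negative-u outcomes, so in-sample
# deferring weakly dominates both alternatives.
assert defer >= max(act, off)
```

Because the human only removes outcomes with negative utility, deferring is at least as good as acting or shutting down for any belief, which is the core incentive the paper identifies for letting the human keep the off-switch.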
To do our views justice requires a writeup of its own, but I can give a stub. This doesn’t necessarily represent the official view of RAISE, because that view doesn’t exist yet, but let me try to capture my intuition here:
First of all, I think both approaches are valid. There are people entrenched in academia who should be given the means to do good work. But there are also people outside of academia who could be given the means to do even better work.
Here’s just a first stab at ways in which academia is inadequate:
Academia lacks (various kinds of) customization. For example, every bachelor’s degree takes three years, though it’s self-evident that not every field has the same amount of knowledge to teach. Students don’t get much personal attention, and they’re exposed to all kinds of petty punishment schemes; I have found that this demeaning treatment can be strongly off-putting for some people.
Student culture permits strongly negligent work habits. I only have data from the Netherlands, but I would guess that students put in about 15 hours per week on average. That’s great if you want people to develop and self-actualize, but we would rather they study AI Safety instead. In my opinion, a standard of 60+ hours per week would be more fitting, given the stakes we’re dealing with.
There is barely any room for good distillation. I’m sure academics would like to take the time to craft better explanations, but they have to sacrifice it to keep up with the rat race of flashy publications. Most of them just stick with the standard textbooks and copy whatever slides they have lying around.
Partly because of all this, it takes at least 5 years from high school graduation before someone finally starts doing useful work. My intuitive guess is that it shouldn’t take much more than 2 years if people Actually Tried, especially for those who are highly talented.
But hey, any institution could have problems like these, right? The real problem isn’t any of these specific bugs. The real problem is that academia is an old bureaucratic institution with all kinds of entrenched interests, and patching it is hard. Even if you jump through all the hoops, do the politics, and convince some people, you will hardly gain any traction. The baseline isn’t so bad, but we could do so much better.
The real problem I have with academia isn’t necessarily its current form. It’s the amount of optimization power you need to upgrade it.
Sure! Here’s the work we’ve done for Corrigibility. I haven’t read all of it, so I do not necessarily endorse the quality of every piece. If you’d like to look at the script we used for the first lesson, go to “script drafts” and have a look at script F.