This is great, thanks! A whole new take on what our goal is!
It’s especially exciting to me because occasionally in conversation I’ve said things like “OK, so if we do more decision theory research, we can find and understand cases in which having the wrong decision theory can get us killed, and that in turn can guide AI research: people can keep those cases in mind when designing test suites and such,” and people have responded with something like “Nah, that isn’t realistic; no one is going to listen to you talk about hypothetical failure modes” (my memory is fuzzy). Now that you’ve written this post, I have a clearer sense of the background path-to-impact I must have had in mind when saying things like that, and also a clearer sense of what the objections are.
On that note, would you agree that the example I sketched above is the sort of thing that fits in your project? Or is finding decision-theoretic problem cases not part of agent foundations, or not relevantly similar enough in your mind?
would you agree that the example I sketched above is the sort of thing that fits in your project?
It seems pretty closely linked to the agent foundations side of this perspective, but I’d say “my project” for the duration of my PhD is on the transparency side.
Also, it’s gratifying to hear this post was useful to someone other than me :)