I know you link/mention Rohin’s map. I think Paul or Chris Olah had put together another map at one time. How do you see your work differing from or building on what they’ve done?
Is Paul’s map the one in Current Work in AI Alignment? I think Rohin also used it in his online-EAG 2020 presentation. For Rohin’s map, are you referring to Ben Cottier’s Clarifying some key hypotheses in AI alignment, to which Rohin made major contributions? I’ll be referring to those two in the rest of my answer.
I want to make more explicit the relationships between the premises and outcomes included in the diagrams. The goal of my work is to make questions like the following easier to answer:
Are scenarios X and Y mutually exclusive? If they are, is the split sharp (is there a premise P which prevents X if true, and prevents Y if false)?
What are the premises behind the work on a specific problem? Which events or results would make this work irrelevant?
Does it make sense to “partially solve” problem P? Are there efforts which won’t make any difference until something specific happens?
I find it hard to answer those questions with the diagrams, since (from my understanding) they have other goals entirely. Paul’s map shows how current research questions relate to each other, with closer elements in the tree sharing more concepts and techniques. Ben & Rohin’s map shows which questions are controversial, which debates feed into others, and which very broad scenarios/agendas are relevant to them.
You can answer the questions listed above by combining the diagrams with the details in the posts and following references… but it isn’t convenient. I want to make it easier to discover and engage with that knowledge.
The main difference between my (future) work and the diagrams would be to enable the user to explore one specific scenario/research question at a time. For example, in Paul’s talk, that would mean starting from “iterated amplification” and repeatedly asking “why?” as you go up the tree. I want the user to find out what happens if one of the premises doesn’t hold: is the work still useful? If we want to maintain the premise, what are the load-bearing sub-premises?
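To make the intended interaction concrete, here is a minimal sketch of what such a premise graph could look like. Everything here is hypothetical: the class, method names, and the premise labels are illustrative placeholders, not an existing tool or the actual structure of Paul’s tree.

```python
# Hypothetical sketch of a premise graph: each research direction depends on
# premises, and we can ask which premises sit behind it ("why?") and whether
# it survives if one of them fails.

from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    # Premises this node relies on (the edges you follow "up the tree").
    depends_on: list["Node"] = field(default_factory=list)

    def premises(self) -> set[str]:
        """All premises behind this node, including transitive ones."""
        found: set[str] = set()
        for parent in self.depends_on:
            found.add(parent.name)
            found |= parent.premises()
        return found

    def still_relevant_if_false(self, premise: str) -> bool:
        """Rough check: does this node survive if `premise` doesn't hold?"""
        return premise not in self.premises()


# Toy example with made-up labels, loosely inspired by the shape of Paul's tree.
tractable = Node("alignment is tractable")
decomposition = Node("tasks can be decomposed and delegated", depends_on=[tractable])
iterated_amplification = Node("iterated amplification", depends_on=[decomposition])

print(iterated_amplification.premises())
print(iterated_amplification.still_relevant_if_false("tasks can be decomposed and delegated"))
```

The point of the sketch is only the kind of query a user could run (list the premises behind a node, check what a failed premise invalidates), not the actual content of the graph.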
I expect a lot of the structure in the diagrams will be mirrored in the end result anyway, as it should, since it’s the same knowledge. I hope to distill it in a different way.
Thanks, that’s really helpful for understanding your work better!