Nice chart!
A few questions and comments:
Why the arrow from “agentive AI” to “humans are economically outcompeted”? The explanation makes it sound like it should point to “target loading fails”?
Suggestion: make the blue boxes without parents more apparent? e.g. a different shade of blue? Or all sitting above the other ones? (e.g. “broad basin of corrigibility” could be moved up and left).
Thanks! Comments are much appreciated.
It’s been a few months and I didn’t write down in detail why that arrow is there, so I can’t be certain of the original reason. My understanding now: humans getting economically outcompeted means AI systems are competing with humans, and therefore optimising against them on some level. Goal-directedness enables and worsens this.
Looking back at the linked explanation of the target loading problem, I understand it as being more “at the source”: coming up with a procedure that makes an AI actually behave as intended. As Richard said there, one can think of it as a more general version of the inner-optimiser (mesa-optimiser) problem. This is why, for example, there’s an arrow from “incidental agentive AGI” to “target loading fails”. Pointing this arrow at it might make sense too, but to me the connection isn’t strong enough to fall within the “clutter budget” of the diagram.
Changing the design of those boxes sounds good. I don’t want to move them, though, because the arrows would get more cluttered.