Fwiw having read this exchange, I think I approximately agree with Paul. Going back to the original response to my comment:
Isn’t HCH also such a multiagent system?
Yes, I shouldn’t have made a categorical statement about multiagent systems. What I should have said was that the particular multiagent system you proposed does not have a single thing it is “trying to do”, i.e. I wouldn’t say it has a single “motivation”. This allows you to say “the system is not intent-aligned”, even though you can’t say “the system is trying to do X”.
Another way of saying this is that it is an incoherent system and so the motivation abstraction / motivation-competence decomposition doesn’t make sense, but HCH is one of the few multiagent systems that is coherent. (Idk if I believe that claim, but it seems plausible.) This seems to map onto the statement:
For an incoherent system this abstraction may not make sense, and a system may be trying to do lots of things.
Also, I want to note strong agreement with this:
Of course, it also seems quite likely that AIs of the kind that will probably be built (“by default”) also fall outside of the definition-optimization framework. So adopting this framework as a way to analyze potential aligned AIs seems to amount to narrowing the space considerably.
Another way of saying this is that it is an incoherent system and so the motivation abstraction / motivation-competence decomposition doesn’t make sense, but HCH is one of the few multiagent systems that is coherent.
HCH can be incoherent. I think one example that came up in an earlier discussion was the top node in HCH trying to help the user by asking (due to incompetence / insufficient understanding of corrigibility) “What is a good approximation of the user’s utility function?” followed by “What action would maximize EU according to this utility function?”
ETA: If this isn’t clearly incoherent, imagine that due to further incompetence, lower nodes work on subgoals in ways that conflict with each other.
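To make the shape of that example concrete, here is a toy sketch of the tree structure being described. It is purely illustrative and not anything from the HCH literature: the decomposition rule, the `combine` step, and all the names are made up for this example, assuming only that each node answers its question by consulting subordinate copies.

```python
# Toy sketch (illustrative only) of the incoherence worry: the top node
# decomposes "help the user" into utility inference + EU maximization,
# and lower nodes pursue their subgoals without anything forcing the
# subtrees to stay consistent with one another.

def hch_node(question, depth=0, max_depth=2):
    """Answer `question` by consulting subordinate copies (toy decomposition)."""
    if depth >= max_depth:
        return f"<base answer to: {question}>"
    if question == "How do I help the user?":
        # The top node, misunderstanding corrigibility, picks these subquestions.
        subquestions = [
            "What is a good approximation of the user's utility function?",
            "What action would maximize EU according to this utility function?",
        ]
    else:
        # Lower nodes split their subgoal further; these subtrees can end up
        # working at cross purposes.
        subquestions = [f"Subgoal A of: {question}", f"Subgoal B of: {question}"]
    sub_answers = [hch_node(q, depth + 1, max_depth) for q in subquestions]
    return combine(question, sub_answers)

def combine(question, sub_answers):
    # Placeholder aggregation: nothing here guarantees the sub-answers point
    # at a single thing the whole tree is "trying to do".
    return f"Answer to '{question}' from {sub_answers}"

print(hch_node("How do I help the user?"))
```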