Wei Dai comments on Clarifying “AI Alignment”

Wei Dai 23 Aug 2019 22:48 UTC
LW: 2 AF: 1

Another way of saying this is that it is an incoherent system and so the motivation abstraction / motivation-competence decomposition doesn’t make sense, but HCH is one of the few multiagent systems that is coherent.

HCH can be incoherent. One example, which I think came up in an earlier discussion, is the top node in HCH trying to help the user by asking (due to incompetence or an insufficient understanding of corrigibility) “What is a good approximation of the user’s utility function?” followed by “What action would maximize EU according to this utility function?”

ETA: If this isn’t clearly incoherent, imagine that, due to further incompetence, lower nodes work on subgoals in ways that conflict with each other.