Maybe I’m just jaded, but this critique doesn’t impress me much. Holden’s substantive suggestion is that, instead of trying to design friendly agent AI, we should just make passive “tool AI” that only reacts to commands but never acts on its own. So when do we start thinking about the problems peculiar to agent AI? Do we just hope that agent AI will never come into existence? Do we ask the tool AI to solve the friendly AI problem for us? (That seems to be what people want to do anyway, an approach I reject as ridiculously indirect.)
(Perhaps I should note that I find your approach to be too indirect as well: if you really understand how justification works, then you should be able to use that knowledge to make (invoke?) a theoretically perfectly justified agent, who will treat others’ epistemic and moral beliefs in a thoroughly justified manner without your having to tell it “morality is in mind-brains, figure out what the mind-brains say, then do what they tell you to do”. That is, I think the correct solution should just be a clearly mathematically and meta-ethically justified, question-dissolving, reflective, non-arbitrary, perfect decision theory. Such an approach is closest in spirit to CFAI. All other approaches, e.g. CEV, WBE, or oracle AI, are relatively arbitrary and unmotivated, especially meta-ethically.)
Not only does this seem wrong, but if I believed it I would want SI to look for the correct decision theory (roughly what Eliezer says he’s doing anyway). It fails to stress the possibility that Eliezer’s whole approach is wrong. In fact it seems willfully (heh) ignorant of the planning fallacy and similar concerns: even formalizing the ‘correct’ prior seems tricky to me, so why would it be feasible to formalize “correct” meta-ethics even if it exists in the sense you mean? And what reason do we have to believe that a version with no pointers to brains exists at all?
At least with reflective decision theory I see no good reason to think that a transparently-written AGI is impossible in principle (our neurons don’t just fire randomly, nor does evolution seem like a particularly good searcher of mindspace), so a theory of decisions that can describe said AGI’s actions should be mathematically possible, barring some alternative to math. (Whether, e.g., the description would fit in our observable universe seems like another question.)