Um. Shouldn’t we be thinking “how will we get the FAI to conclude that replacing people with software is not harmless” rather than “If the FAI concludes that this is harmless...”?
If it’s actually a FAI, you should approve of what it decides, not of what people (including yourself) currently believe. If it can’t be relied upon in this manner, it’s not (known to be) a FAI. You know whether it’s a FAI based on its design, not based on its behavior, which it won’t generally be possible to analyze (or do something about).
You shouldn’t be sure about the correct answers to object-level (very vaguely specified) questions like “Is replacing people with software harmless?”. A FAI should use a procedure that’s more reliable at answering such questions than you or any other human is. If it’s using such a procedure, then what it decides is a more reliable indicator of the correct decision than what you (or I) believe. It’s currently unclear how to produce such a procedure.
Demanding that a FAI conform to moral beliefs currently held by people is also somewhat pointless, in the sense that a FAI has to be able to make decisions about much more precisely specified decision problems, ones that humans won’t be able to analyze in any useful way, so there are decisions to which moral beliefs currently or potentially held by humans don’t apply. If it’s built so as to be able to make such decisions, it will also be able to answer the questions about which there are currently held beliefs, as a special case. And if its procedure for answering moral questions is more reliable in general, it’ll also be more reliable for those questions.
See “Complex Value Systems are Required to Realize Valuable Futures” for some relevant arguments.