Mestroyer comments on Deception detection machines

Mestroyer 6 Sep 2014 7:28 UTC
1 point
I would need a bunch of guarantees about the actual mechanics of how the AI was forced to answer before I stopped seeing vague classes of ways this could go wrong. Even then, I’d assume there were some I’d missed. And if the AI had a way to show me anything other than “yes” or “no”, or if I couldn’t prevent myself from thinking about long sequences of bits rather than each single bit separately, I’d be afraid it could manipulate me.

An example of a vague class of ways this could go wrong: the AI figures out what my CEV would want using CDT, and itself uses a more advanced decision theory to exploit the CEV computation, steering it into wanting to write something more favorable to the AI’s utility function in the file.

Also, IIRC, Eliezer Yudkowsky said there are problems with CEV itself. (Maybe he just meant problems with the many-people version, but probably not.) It was only supposed to be a vague outline, and a “see, you don’t have to spend all this time worrying about whether we share your ethical/political philosophy, because it’s not going to be hardcoded into the AI anyway.”