I’m much more sympathetic to “Never try to deceive yourself, or offer a reason to believe other than probable truth.” Honestly, it seems to me that I take this injunction as seriously as anyone does, including Eliezer, but unlike Eliezer I’m still willing to mention a few caveats. The most important is that for humans, though not for minds in general, beliefs, brain states, world states, and values are not cleanly separable. There is, for instance, no completely clean distinction between causing myself to hold only a vague belief about what it would feel like to cut my hand off (a belief that doesn’t tightly concentrate probability mass and that, not coincidentally, isn’t directly disvalued by my utility function) and simply not cutting my hand off. Another more controversial claim, though not very controversial I hope, is that I should not read a computer monitor controlled by an unfriendly superintelligent AI that has injunctions against deceiving me. I might also want to temporarily deceive myself, using others as the agents of my self-deception, as part of a social psychology experiment to test my probable behavior in certain situations. Really, with some effort it’s not hard to come up with exceptions to even so reliable a rule as this one. For this reason, I would be very wary of using such rules in an AGI, though of course the actual mathematical formulation of the rule in question within the AGI might be less problematic; a few seconds of thought doesn’t give me much reason to think so.
In a very general sense, though, I see a logical problem with this whole line of thought. How can any of these injunctions survive except as self-protecting beliefs? Isn’t this whole approach just the sort of “fighting bias with bias” that you and Robin usually argue against?
How can utility functions (or terms in utility functions, depending on how you want to slice it up) survive except as self-protecting beliefs? The strange loop through the meta-level is not like induction, where you have no other choice; there are many possible utility functions.
(I’m making this comment as a note to self to flag Michael’s comment for future reference.)