But one could argue that I ought not run such a seed AI in the first place until my confidence in its reliability was so high that even updating on that evidence would not be enough to make me distrust the target AI. (Certainly, I think EY would argue that.)
It seems analogous to the question of when I should doubt my own senses. There is some theoretical sense in which I should never do that: since the vast majority of my beliefs about the world are derived from my senses, it follows that when my beliefs contradict my senses I should trust my senses and doubt my beliefs. And in practice, that seems like the right thing to do most of the time.
But there are situations where the proper response to a perception is to doubt that its referent exists… to think “Yes, I’m seeing X, but no, X probably is not actually there to be seen.” They are rare, but recognizing them when they occur is important. (I’ve encountered this seriously only once in my life, shortly after my stroke, and successfully doubting it was… challenging.)
Similarly, there are situations where the proper response to a moral judgment is to doubt the moral intuitions on which it is based… to think “Yes, I’m horrified by X, but no, X probably is not actually horrible.”
Agreed, but if you do have very high confidence that you’ve made the AI reliable, and also a fairly well-reasoned view of your own utility function, I think you should be able to predict in advance with reasonable confidence that you won’t find yourself horrified by whatever it does. And I predict that if an AI subsumed intelligent aliens and subjected them to something they considered a terrible fate, I would be horrified.
(I’ve encountered this seriously only once in my life, shortly after my stroke, and successfully doubting it was… challenging.)
Please elaborate! It sounds interesting and it would be useful to hear how you were able to identify such a situation and successfully doubt your senses.
I’m not prepared to tell that story in its entirety here, though I appreciate your interest.
The short form is that I suffered significant brain damage and was intermittently delirious for the better part of a week, in the course of which I experienced both sensory hallucinations and a variety of cognitive failures.
The most striking of these had a fairly standard “call to prophecy” narrative, with the usual overtones of Great Significance, Presence, and so forth.
Doubting it mostly just boiled down to asking the question “Is it more likely that my experiences are isomorphic to external events, or that they aren’t?” The answer to that question wasn’t particularly ambiguous, under the circumstances.
The hard part was honestly asking that question, and being willing to focus on it carefully enough to arrive at an answer when my brain was running on square wheels, and being willing to accept the answer when it required rejecting some emotionally potent experiences.
Sure.