You think it is self-evidently true that MIRI think that the dangers they warn of are the result of AIs believing themselves to be infallible?
The referents in that sentence are a little difficult to navigate, but no, I’m pretty sure I am not making that claim. :-) In other words, MIRI do not think that.
What is self-evidently true is that MIRI claim a certain kind of behavior by the AI, under certain circumstances … and all I did was come along and put a label on that claim about the AI behavior. When you put a label on something for convenience, the label is kinda self-evidently “correct”.
I think that what you said here:
I now see that what you have written subsequently to the OP is that DLI is almost, but not quite, a description of rigid behaviour as a symptom (with the added ingredient that an AI can see the mistakenness of its behaviour):
… is basically correct.
I had a friend once who suffered from schizophrenia. She was lucid, intelligent (studying for a Ph.D. in psychology) and charming. But if she did not take her medication she became a different person. One day she went up onto the suspension bridge that was the main traffic route out of town and threatened to throw herself to her death 300 feet below; she brought the whole town to a halt for several hours, until someone talked her down. Now, talking to her in a good moment, she could tell you that she knew about her behavior in the insane times (she was completely aware of that side of herself), and she knew that in that other state she would find certain thoughts completely compelling and convincing, even though at this calm moment she could tell you that those thoughts were false. If I say that during the insane period her mind was obeying a “Doctrine That Paranoid Beliefs Are Justified”, then all I am doing is labeling the state that governed her during those times.
That label would just be a label, so if someone said “No, you’re wrong: she does not subscribe to the DTPBAJ at all”, I would be left nonplussed. All I wanted to do was label something that she told me she categorically DID believe, so how can my label be in some sense ‘wrong’?
So, that is why some people’s attacks on the DLI are a little baffling.
Their criticisms are possibly accurate about the first version, which gives a cause for the rigid behaviour: “it regards its own conclusions as sacrosanct.”