My contribution to this whole debate is to point out that the DESIGN of the AI is incoherent, because the AI is supposed to be able to hold two logically inconsistent ideas (implicit belief in its infallibility and knowledge of its fallibility).
What does incoherent mean, here?
If it just labels the fact that it has inconsistent beliefs, then it is true but unimpactive... humans can also hold contradictory beliefs and still be intelligent enough to be dangerous.
If it means something amounting to “impossible to build”, then it would be highly impactive… but there is no good reason to think that is the case.
You’re right to point out that “incoherent” covers a multitude of sins.
I really had three main things in mind.
1) If an AI system is proposed which contains logically contradictory beliefs located in the most central, high-impact area of its system, it is reasonable to ask how such an AI can function when it allows both X and not-X to be in its knowledge base. I think I would be owed at least some variety of explanation as to why this would not cause the usual trouble when systems try to do logic in such circumstances (a toy illustration of that trouble is given below). So I am saying: “This design that you propose is incoherent, because you have omitted to say how this glaring problem is supposed to be resolved.”
(Yes, I’m aware that there are workarounds for contradictory beliefs, but those ideas are usually supposed to apply to pretty obscure corners of the AI’s belief system, not to the component that is in charge of the whole shebang).
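To make “the usual trouble” concrete: in classical logic a contradiction entails everything (the principle of explosion), so a reasoner that keeps both X and not-X in its knowledge base and applies standard inference can “prove” any conclusion whatsoever. Here is a minimal sketch of that effect using a toy resolution prover. It is only an illustration of the general logical point, not a model of any particular proposed design, and every name in it is invented for the example.

```python
from itertools import combinations

def complement(lit):
    """'-X' for 'X', and 'X' for '-X'."""
    return lit[1:] if lit.startswith("-") else "-" + lit

def resolve(c1, c2):
    """All resolvents of two clauses (frozensets of signed literals)."""
    return [(c1 - {lit}) | (c2 - {complement(lit)})
            for lit in c1 if complement(lit) in c2]

def entails(kb, query):
    """Naive classical resolution refutation: does the KB entail `query`?"""
    clauses = {frozenset(c) for c in kb} | {frozenset({complement(query)})}
    while True:
        new = set()
        for c1, c2 in combinations(clauses, 2):
            for r in resolve(c1, c2):
                if not r:              # empty clause: refutation found
                    return True
                new.add(frozenset(r))
        if new <= clauses:             # nothing new can be derived
            return False
        clauses |= new

# A knowledge base whose most central component asserts both X and not-X:
kb = [{"X"}, {"-X"}]

# Ex contradictione quodlibet: every query (and its negation) is "proved".
print(entails(kb, "PlanIsConsistentWithGoal"))    # True
print(entails(kb, "-PlanIsConsistentWithGoal"))   # True as well
```

A system that wants to tolerate contradictions has to abandon or restrict classical inference (which is what the paraconsistent workarounds mentioned above do), and that is exactly the kind of explanation I am saying is missing from the proposed design.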
2) If an AI perceives itself to be wired in such a way that it is compelled to act as if it were infallible, while at the same time knowing that it is both fallible AND perpetrating acts that are directly caused by its failings (for all the aforementioned reasons that we don’t need to re-argue), then I would suggest that such an AI would do something about this situation. The AI, after all, is supposed to be “superintelligent”, so why would it not take steps to stop this immensely damaging situation from occurring?
So in this case I am saying: “This hypothetical superintelligence has an extreme degree of knowledge about its own design, but it is tolerating a massive and damaging contradiction in its construction without doing anything to resolve the problem; it is incoherent to suggest that such a situation could arise without explaining why the AI tolerates the contradiction and fails to act.”
(Aside: you mention that humans can hold contradictory beliefs and still be intelligent enough to be dangerous. Arguing from the human case would not be valid, because in other areas of this debate I have been told repeatedly not to accidentally generalize and “assume” that the AI would do something just because humans do something. Now, I claim that I don’t actually commit the breaches I am charged with (an argument for another day), but I consider the problem of accidental anthropomorphism to be real, so we should not indulge in it here.)
3) Lastly, I can point to the fact that IF the hypothetical AI can engage in this kind of bizarre situation where it compulsively commits action X, while knowing that its knowledge of the world indicates that the consequences will strongly violate the goals that were supposed to justify X, THEN I am owed an explanation for why this type of event does not occur more often. Why is it that the AI does this only when it encounters a goal such as “make humans happy”, and not in a million other goals? Why are there not bizarre plans (which are massively inconsistent with the source goal) all the time?
So in this case I would say: “It is incoherent to suggest an AI design in which a drastic inconsistency of this sort occurs in the case of the ‘maximize human happiness’ goal, but where it doesn’t occur all over the AI’s behavior. In particular, I am owed an explanation for why this particular AI is clever enough to be a threat, since it might be expected to have been doing this sort of thing throughout its development, and in that case I would expect it to be so stupid that it would never have made it to superintelligence in the first place.”
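To make point 3) concrete, here is a toy sketch of the kind of plan-vetting step that any minimally competent planner might be expected to perform: before committing to an action, check whether its own world model predicts consequences that violate the very goal the action is supposed to serve. Everything here is hypothetical and invented for illustration (the names, the goal, the candidate plans); the point is only that such a check is cheap and obvious, which is why an AI that skips it in exactly one case is the thing that needs explaining.

```python
# Toy illustration only: names, goals and plans are all invented for this sketch.

class Plan:
    def __init__(self, name, predicted_outcomes):
        self.name = name
        self.predicted_outcomes = set(predicted_outcomes)  # what the world model predicts

def violates_goal(plan, forbidden_outcomes):
    """True if the world model predicts outcomes that the source goal itself rules out."""
    return bool(plan.predicted_outcomes & forbidden_outcomes)

def choose_plan(candidates, forbidden_outcomes):
    """Discard any plan whose predicted consequences contradict the goal it is meant to serve."""
    viable = [p for p in candidates if not violates_goal(p, forbidden_outcomes)]
    return viable[0] if viable else None

# Hypothetical "make humans happy" goal, with the outcomes it rules out made explicit.
forbidden = {"humans coerced", "humans harmed", "stated preferences overridden"}

candidates = [
    Plan("forcibly rewire everyone's brain", {"humans coerced", "stated preferences overridden"}),
    Plan("ask people what they want and help them get it", set()),
]

chosen = choose_plan(candidates, forbidden)
print(chosen.name)   # -> "ask people what they want and help them get it"
```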
Those are the three main areas in which the design would be incoherent… i.e. it would have such glaring, unbelievable gaps that those gaps would need to be explained before the hypothetical AI could become at all believable.