Loosemore, Yudkowsky, and I are all discussing AIs that have a goal misaligned with human values that they nevertheless find motivating.
If that is supposed to be a universal or generic AI, it is a valid criticism to point out that not all AIs are like that.
If that is supposed to be a particular kind of AI, it is a valid criticism to point out that no realistic AIs are like that.
You seem to feel you are not being understood, but what is being said is not clear.
1 Whether or not “superintelligent” is a meaningful term in this context
“Superintelligence” is one of the clearer terms here, IMO. It just means more than human intelligence, and humans can notice contradictions.
This comment seems to be part of a concern about “wisdom”, assumed to be some extraneous thing an AI would not necessarily have. (No one but Vaniver has brought in wisdom.) The counterargument is that compartmentalisation between goals and instrumental knowledge is an extraneous thing an AI would not necessarily have, and that its absence is all that is needed for a contradiction to be noticed and acted on.
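To make that concrete, here is a toy sketch (all names and data are invented for illustration, not anyone’s actual proposal): whether the contradiction can even be noticed depends on whether the consistency check is allowed to range over both the goal definitions and the instrumentally learned facts.

```python
# Toy illustration only: the names and data are invented, not a description
# of any real or proposed AI design. The point is that noticing a
# contradiction needs no extra "wisdom" module; it only requires that the
# consistency check range over both the goal definitions and the
# instrumentally learned facts. Walling those off from each other
# (compartmentalisation) is the extra architectural choice.

goal_definitions = {"happy": "smiling"}  # hard-coded proxy in the goal system
learned_facts = {("smiling", "not_sufficient_for", "happy")}  # acquired later

def find_contradictions(goal_defs, facts, compartmentalised):
    if compartmentalised:
        # Learned facts are never compared against goal definitions.
        return []
    return [(concept, proxy) for concept, proxy in goal_defs.items()
            if (proxy, "not_sufficient_for", concept) in facts]

print(find_contradictions(goal_definitions, learned_facts, compartmentalised=True))   # []
print(find_contradictions(goal_definitions, learned_facts, compartmentalised=False))  # [('happy', 'smiling')]
```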
2 Whether we should expect generic AI designs to recognize misalignments, or whether such a realization would impact the goal the AI pursues.
It’s an assumption, and one that needs justification, that any given AI will have goals of a non-trivial sort. “Goal” is a term that needs tabooing.
Neither Yudkowsky nor I think either of those are reasonable to expect—as a motivating example, we are happy to subvert the goals that we infer evolution was directing us towards in order to better satisfy “our” goals.
While we are anthropomorphising, it might be worth pointing out that humans don’t show behaviour patterns of relentlessly pursuing arbitrary goals.
I suspect that Loosemore thinks that viable designs would recognize it, but agrees that in general that recognition does not have to lead to an alignment.
Loosemore has put forward a simple suggestion, which MIRI appears not to have considered at all: that on encountering a contradiction, an AI could lapse into a safety mode, if so designed.
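To be concrete about what “lapse into a safety mode” could look like, here is a minimal hypothetical sketch (the names and data are mine, not MIRI’s or Loosemore’s): when every plan that satisfies the hard-coded goal conflicts with what the system has since learned, it halts and defers to its operators instead of optimising anyway.

```python
# Minimal hypothetical sketch of "halt on contradiction"; all names and the
# toy data below are invented for illustration, not an actual agent design.

class SafetyHalt(Exception):
    """Stop and defer to human operators instead of acting."""

def conflicts_with_knowledge(plan, learned_facts):
    # The plan scores well on the hard-coded goal definition; check whether
    # its predicted effects clash with what the system has since learned.
    return any(effect in learned_facts["effects_humans_object_to"]
               for effect in plan["predicted_effects"])

def select_plan(candidate_plans, learned_facts):
    viable = [p for p in candidate_plans
              if not conflicts_with_knowledge(p, learned_facts)]
    if not viable:
        # The contradiction is noticed and acted on: halt rather than
        # maximise the proxy anyway.
        raise SafetyHalt("goal spec conflicts with learned knowledge; awaiting review")
    return max(viable, key=lambda p: p["goal_score"])

# Example with made-up data:
learned = {"effects_humans_object_to": {"forced dopamine drip"}}
plans = [{"predicted_effects": {"forced dopamine drip"}, "goal_score": 0.99}]
try:
    select_plan(plans, learned)
except SafetyHalt as reason:
    print("Safety mode:", reason)
```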
3 …sees cleverness and wisdom as closely tied together
You are paraphrasing Loosemore to sound less technical and more handwaving than his actual comments. The ability to sustain contradictions in a system that is constantly updating itself isn’t a given: it requires an architectural choice in favour of compartmentalisation.
All this talk of contradictions is sort of rubbing me the wrong way here. There’s no “contradiction” in an AI having goals that are different to human goals. Logically, this situation is perfectly normal. Loosemore talks about an AI seeing its goals are “massively in contradiction to everything it knows about”, but… where’s the contradiction? What’s logically wrong with getting strawberries off a plant by burning them?
I don’t see the need for any kind of special compartmentalisation; information about “normal use of strawberries” is already inert facts with no caring attached by default.
If you’re going to program in special criteria that would create caring about this information, okay, but how would such criteria work? How do you stop it from deciding that immortality is contradictory to “everything it knows about death” and refusing to help us solve aging?
In the original scenario, the contradiction is supposed to be between a hardcoded definition of happiness in the AI’s goal system and inferred knowledge in the execution system.
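A small worked example of the kind of mismatch being claimed (everything below is invented for illustration): the goal system scores outcomes by the programmers’ hard-coded proxy, while the execution system’s inferred concept has outgrown it, so there are concrete cases the two classify differently.

```python
# Invented example of a hard-coded proxy diverging from inferred knowledge;
# not a description of any actual system.

def hardcoded_happy(person):
    # Programmers' proxy, frozen into the goal system: "smiling means happy".
    return person["smiling"]

def learned_happy(person):
    # What the execution system later infers: smiling under coercion doesn't count.
    return person["smiling"] and not person["coerced"]

def divergent_cases(observed_people):
    """Cases the goal system counts as success but the learned model does not
    (or vice versa); this is where the claimed contradiction lives."""
    return [p["name"] for p in observed_people
            if hardcoded_happy(p) != learned_happy(p)]

people = [
    {"name": "A", "smiling": True, "coerced": False},
    {"name": "B", "smiling": True, "coerced": True},   # proxy says happy, model says no
]
print(divergent_cases(people))  # -> ['B']
```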