I mean differentiation in the sense of differentiating between the abstract categories. Is half a face that appears to be smiling while the other half is burnt off still a “smiley face”? Even I’m not sure.
I’m certainly not arguing that training an AGI to maximise smiling faces is a good idea. It’s simply a case of giving the AGI the wrong goal.
My point is that a superintelligence will form very good abstractions, and based on these it will learn to classify very well. The problem with the famous tank example you cite is that they were training the system from scratch on a limited number of examples that all contained a clear bias. That’s a problem for inductive inference systems in general. A superintelligent machine will be able to process vast amounts of information, ideally from a wide range of sources, and thus avoid these types of problems for common categories such as happiness and smiley faces.
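The tank failure is, at bottom, a spurious-correlation problem: with a small, biased sample, the cheapest feature that separates the training examples need not be the feature anyone cares about. Below is a minimal, purely illustrative sketch of that failure mode in Python; the “tank cue” and “brightness” features, and all of the numbers, are invented for the example rather than taken from the original story.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_data(n, brightness_matches_label):
    """Toy 'tank detector' data: feature 0 is a genuine (noisy) tank cue,
    feature 1 is scene brightness, a nuisance feature."""
    label = rng.integers(0, 2, size=n)
    tank_cue = label + 0.5 * rng.normal(size=n)        # weakly informative
    if brightness_matches_label:
        brightness = label + 0.1 * rng.normal(size=n)  # biased set: tanks shot on sunny days
    else:
        brightness = rng.normal(size=n)                # unbiased set: brightness is random
    return np.column_stack([tank_cue, brightness]), label

def train_logreg(X, y, steps=2000, lr=0.1):
    """Plain logistic regression fitted by gradient descent."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
        w -= lr * X.T @ (p - y) / len(y)
        b -= lr * np.mean(p - y)
    return w, b

def accuracy(w, b, X, y):
    return np.mean(((X @ w + b) > 0).astype(int) == y)

# Train on a small biased sample; test where the brightness/label link is broken.
X_train, y_train = make_data(200, brightness_matches_label=True)
X_test, y_test = make_data(2000, brightness_matches_label=False)

w, b = train_logreg(X_train, y_train)
print("weights [tank_cue, brightness]:", np.round(w, 2))
print("train accuracy:", accuracy(w, b, X_train, y_train))
print("test accuracy :", accuracy(w, b, X_test, y_test))
```

Trained on the biased sample, the classifier leans mostly on brightness, and its accuracy drops sharply on data where that correlation is absent. A system trained on vast and varied data has far fewer of these cheap shortcuts available, which is exactly the point above.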
If what I’m saying is correct, this is great news as it means that a sufficiently intelligent machine that has been exposed to a wide range of input will form good models of happiness, wisdom, kindness etc. Things that, as you like to point out, even we can’t define all that well. Hooking the machine up to take these as its goals, I suspect, won’t be all that hard, as we can open up its “brain” and work this out.
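On the “open up its brain” point: one concrete way to test whether a trained network has actually formed an internal model of a concept is to fit a linear probe from its hidden-layer activations to human labels for that concept. The sketch below is only self-contained because it fabricates the activations; in a real experiment `acts` would be recorded from the model and `concept` would come from human annotation. The variable names and sizes are assumptions made up for the illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# Stand-in data. In practice `acts` would be hidden activations recorded while
# the model processes inputs, and `concept` would be human labels (e.g. "this
# text describes a happy person"). Here a concept direction is planted in
# random activations purely so the sketch runs end to end.
n, d = 1000, 64
concept = rng.integers(0, 2, size=n)
concept_direction = rng.normal(size=d)
acts = rng.normal(size=(n, d)) + np.outer(concept, concept_direction)

def fit_probe(X, y, steps=2000, lr=0.05):
    """Logistic-regression probe from activations to concept labels."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
        w -= lr * X.T @ (p - y) / len(y)
        b -= lr * np.mean(p - y)
    return w, b

# Fit on half the data, evaluate on the held-out half.
split = n // 2
w, b = fit_probe(acts[:split], concept[:split])
held_out_acc = np.mean(((acts[split:] @ w + b) > 0) == concept[split:])
print("held-out probe accuracy:", held_out_acc)
# High held-out accuracy is (weak) evidence that the concept is linearly
# readable from this layer; it does not yet show the model acts on it.
```

High held-out probe accuracy would support the hope expressed here: the concept exists inside the machine in a form we can locate. Whether we can then safely point the machine’s goals at that internal representation, rather than at an external proxy like photographed smiles, is a separate question.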
If what I’m saying is correct, this is great news as it means that a sufficiently intelligent machine that has been exposed to a wide range of input will form good models of happiness, wisdom, kindness etc.
Hopefully. Assuming server-side intelligence, the machine may initially know a lot about text, a reasonable amount about images, and a bit about audio and video.
Its view of things is likely to be pretty strange—compared to a human. It will live in cyberspace, and for a while may see the rest of the world through a glass, darkly.
Surely the discussion is not about whether an AI will be able to be sophisticated in forming abstractions: if that is of interest, then presumably it will be.
But the concern discussed here is how to determine beforehand that those abstractions will be formed in a context characterised here as Friendly AI; the aim is to pre-ordain that context before the AI achieves superintelligence.
Thus the limitations of communicating desirable concepts apply.
There are some excellent predictions in this thread. We have here some “natural abstraction hypothesis” and some “mechanistic interpretability”.