The context should make it clear I was not talking about an explicit prediction. See this comment for more explication.
I said:
EY’s complexity/fragility of human values argument is directed against early proposals for learning human values for an AI utility function.
This is obviously true and beyond debate; see the quotes in my linked comment from EY’s “Complex Value Systems are Required to Realize Valuable Futures”, where he critiques Hibbard’s proposal to install an AI with a reward function which “learns to recognize happiness and unhappiness in human facial expressions, human voices and human body language”.
Then I said:
Katja’s point is valid—DL did not fail in the way EY predicted, and the success of DL gives hope that we can learn superhuman models of human values to steer developing AI.
Where Katja’s point is that DL had no trouble learning concepts of faces (and many other things) to superhuman levels, without inevitably failing by instead only producing superficial simulacra of faces when we cranked up the optimization power. I was not referring to any explicit prediction, but the implicit prediction in Katja’s analogy (where learning a complex 3D generative model of human faces from images is the analogy for learning a complex multi-modal model of human happiness from face images, voices, body language, etc).
It only takes one positive example of AI not failing by producing superficial simulacra of faces to prove my point, which Katja already provided. It doesn’t matter how many crappy AI models people make, as they lose out to stronger models.
Maybe I don’t understand the point of this example in which AI creates non-conscious images of smiling faces. Are you really arguing that, based on evidence like this, a generalization of modern AI wouldn’t automatically produce horrific or deadly results when asked to copy human values?
Peripherally: that video contains simulacra of a lot more than faces, and I may have other minor objections in that vein.
ETA, I may want to say more about the actual human analysis which I think informed the AI’s “success,” but first let me go back to what I said about linking EY’s actual words. Here is 2008-Eliezer:
Now you, finally presented with a tiny molecular smiley—or perhaps a very realistic tiny sculpture of a human face—know at once that this is not what you want to count as a smile. But that judgment reflects an unnatural category, one whose classification boundary depends sensitively on your complicated values. It is your own plans and desires that are at work when you say “No!”
Hibbard knows instinctively that a tiny molecular smileyface isn’t a “smile”, because he knows that’s not what he wants his putative AI to do. If someone else were presented with a different task, like classifying artworks, they might feel that the Mona Lisa was obviously smiling—as opposed to frowning, say—even though it’s only paint.
Hibbard proposes we can learn a model of ‘happiness’ from images of smiling humans, body language, voices, etc., and then instill that as the reward/utility function for the AI.
EY replies that this will fail because our values (like happiness) are far too complex and fragile to be learned robustly by such a procedure, and that the result instead is an AI which optimizes for a different, unintended goal: ‘faciness’.
Katja argues—and others concur—that maybe values are not as fragile as EY predicted, because DL now regularly learns complex concepts to superhuman accuracy—including visual models of faces.
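For concreteness, here is a minimal sketch (my own illustration, not code from Hibbard, EY, or Katja) of the kind of pipeline being debated: a learned multimodal ‘happiness’ recognizer whose output is reused directly as an agent’s reward signal. All names, dimensions, and the assumption of pre-extracted features are hypothetical.

```python
# Illustrative sketch only -- not from the original discussion. A toy multimodal
# "happiness" recognizer in the spirit of Hibbard's proposal, whose learned
# score is then reused directly as an agent's reward signal.
import torch
import torch.nn as nn

class HappinessModel(nn.Module):
    def __init__(self, face_dim=512, voice_dim=128, body_dim=64):
        super().__init__()
        # Map fused face / voice / body-language features to one scalar score.
        self.head = nn.Sequential(
            nn.Linear(face_dim + voice_dim + body_dim, 256),
            nn.ReLU(),
            nn.Linear(256, 1),
        )

    def forward(self, face, voice, body):
        return self.head(torch.cat([face, voice, body], dim=-1))

happiness_model = HappinessModel()  # in Hibbard's proposal, trained on labeled human data

def reward(face, voice, body):
    # The learned score is used directly as the agent's reward. EY's objection
    # is that a strong optimizer can max out this proxy ("faciness") without
    # producing any actual human happiness.
    with torch.no_grad():
        return happiness_model(face, voice, body).item()
```

Nothing hinges on this specific architecture; the disagreement is over whether such a learned proxy stays pointed at the intended target once a strong optimizer is maximizing against it.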
Are you really arguing that, based on evidence like this, a generalization of modern AI wouldn’t automatically produce horrific or deadly results when asked to copy human values?
Obviously that totally depends on the system and how the human values are learned—but no, that certainly isn’t the automatic result if we continue down the path of reverse engineering the brain, including its altruism mechanisms.
I may reply to this more fully, but first I’d like you to acknowledge that you cannot in fact point to a false prediction by EY here, and in the exact post you seemed to be referring to, he says that his view is compatible with this sort of AI producing realistic sculptures of human faces!
Now you, finally presented with a tiny molecular smiley—or perhaps a very realistic tiny sculpture of a human face—know at once that this is not what you want to count as a smile.
The thing producing the very realistic tiny sculpture of a human face is a superintelligence, not some initial human-designed ML system that is used to create the AI’s utility function.
What post? All I quoted recently was “Complex Value Systems are Required to Realize Valuable Futures”, which does not appear to contain the word ‘sculpture’.
Where Katja’s point is that DL had no trouble learning concepts of faces (and many other things) to superhuman levels, without inevitably failing by instead only producing superficial simulacra of faces when we cranked up the optimization power.
That’s clearly exactly what it does today? It seems I disagree with your point on a more basic level than expected.
as someone who often agrees with jake, cmon jake, own up to it, EY has said reasonable things before and you were wrong :P
edit: oops meant to reply to @jacob_cannell
Wrong about what? Of course EY has said many reasonable and insightful things.
Oh do you mean this text you quoted?