Our best conditional generative models sample from a conditional distribution; they don’t optimize for feature-ness. The GAN analogy is also mostly irrelevant, because diffusion models have taken over for conditional generation, and Nate’s comment seems confused as applied to diffusion models.
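For concreteness, here is roughly what “sampling from a conditional distribution” looks like in current practice, as a minimal sketch using the Hugging Face diffusers library (the checkpoint name and prompt are just illustrative). Nothing in it searches for the image a face detector would score highest; it just draws once from the model’s learned distribution over images matching the prompt.

```python
# Minimal sketch: conditional *sampling*, not feature maximization.
# Assumes the `diffusers` library, a GPU, and a Stable Diffusion checkpoint are available.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",  # illustrative checkpoint name
    torch_dtype=torch.float16,
).to("cuda")

# One draw from p(image | prompt); there is no "maximize face-ness" loop anywhere.
image = pipe("a photo of a human face", num_inference_steps=50).images[0]
image.save("sampled_face.png")
```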
Nate’s comment isn’t confused; he’s not talking about diffusion models. He’s talking about the kinds of AI that could take over the world and reshape it to optimize for some values/goals/utility-function/etc.
Katja says:

> You could very analogously say ‘human faces are fragile’ because if you just leave out the nose it suddenly doesn’t look like a typical human face at all. Sure, but is that the kind of error you get when you try to train ML systems to mimic human faces?
Nate’s comment:
> B) wake me when the allegedly maximally-facelike image looks human;
Katja is talking about current ML systems and how the fragility issue EY predicted didn’t materialize (actually it arguably did in earlier systems). Nate’s comment is clearly referencing Katja’s analogy—faciness—and he’s clearly implying we haven’t seen the problem with face generators yet because they haven’t pushed the optimization hard enough to find the maximally-facelike image. But he’s just wrong there—they don’t have that problem, no matter how hard you scale their optimization power—and that is part of why Katja’s analogy works so well at a deeper level: future ML systems do not work the way AI risk folks thought they would.
Diffusion models are relevant because they improve on conditional GANs by leveraging powerful pretrained discriminative foundation models and by allowing for greater optimization power at inference time; both improvements could also be applied to planning agents.
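Concretely, the extra inference-time knob is the guidance scale. A minimal sketch of the standard classifier-free guidance step (the `eps_*` tensors are assumed to come from a trained noise-prediction model; the function name is just illustrative):

```python
import torch

def guided_noise_prediction(eps_uncond: torch.Tensor,
                            eps_cond: torch.Tensor,
                            guidance_scale: float) -> torch.Tensor:
    """Classifier-free guidance combination applied at each denoising step.

    eps_uncond: the model's noise prediction with the condition dropped
    eps_cond:   the model's noise prediction given the condition (e.g. a face prompt)
    guidance_scale: 1.0 recovers plain conditional sampling; larger values push
        samples harder toward the condition, trading off diversity and realism.
    """
    return eps_uncond + guidance_scale * (eps_cond - eps_uncond)
```

Turning up `guidance_scale` is how you apply more optimization power at inference time without retraining anything, and it is the same trick the diffusion planners discussed below reuse for trajectories.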
ML still uses plenty of reinforcement learning, and plenty of systems that apply straightforward optimization pressure. We’ve also built a few systems more recently that do something closer to trying to recreate samples from a distribution, but that doesn’t actually help you improve on (or even achieve) human-level performance. In order to improve on human-level performance, you either have to hand-code ontologies (by plugging multiple simulator systems together in a CAIS fashion), or just do something like reinforcement learning, which then very quickly does display the error modes everyone is talking about.
It’s not that current systems lack edge-instantiation behavior. Some of them seem more robust, but the ones that do also seem fundamentally limited (and they will likely still show edge-instantiation with respect to their inner objective, though that’s harder to talk about).
And just to make a very concrete point: Katja linked to a bunch of faces generated by a GAN, which straightforwardly has the problems people are talking about, so there really is no mismatch between the kinds of things Katja is talking about and the kinds of things Nate is talking about. We could perform a more optimized search for things that are definitely faces according to the discriminator, and we would likely get something horrifying.
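To spell out what that “more optimized search” would look like: gradient-ascend the discriminator’s score over the generator’s latent space, instead of sampling once and keeping what you get. A minimal sketch with toy stand-in networks (the names and sizes are hypothetical; the point is the search pattern, not these particular modules):

```python
import torch
import torch.nn as nn

# Toy stand-ins for a trained GAN generator and discriminator.
generator = nn.Sequential(nn.Linear(128, 3 * 64 * 64), nn.Tanh())
discriminator = nn.Linear(3 * 64 * 64, 1)
for p in list(generator.parameters()) + list(discriminator.parameters()):
    p.requires_grad_(False)  # networks stay fixed; only the latent code is searched

# Instead of sampling z once, search latent space for the input the
# discriminator scores as most face-like / most "real".
z = torch.randn(1, 128, requires_grad=True)
opt = torch.optim.Adam([z], lr=0.05)
for _ in range(1000):
    opt.zero_grad()
    score = discriminator(generator(z)).mean()
    (-score).backward()        # gradient ascent on the discriminator's score
    opt.step()

most_facelike = generator(z)   # with a real GAN, typically an adversarial,
                               # off-distribution image rather than a nicer face
```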
> We could perform a more optimized search for things that are definitely faces according to the discriminator, and we would likely get something horrifying.
Sure you could do that, but people usually don’t—unless they intentionally want something horrifying. So if your argument is now “sure, new ML systems totally can solve the faciness problem, but only if you choose to use them correctly”—then great, finally we agree.
Interestingly enough, in diffusion planning models, if you crank up the discriminator you get trajectories that are higher utility but increasingly unrealistic; if you crank it down, you get lower-utility trajectories.
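A minimal sketch of where that knob sits in a Diffuser-style guided planner, with toy stand-ins for the trajectory denoiser and the learned value model (the thing being cranked up or down); the names, sizes, and `guidance_scale` parameter are all illustrative:

```python
import torch
import torch.nn as nn

traj_dim = 32 * 6                         # e.g. 32 timesteps x 6 state-action dims
denoiser = nn.Linear(traj_dim, traj_dim)  # toy stand-in for the trajectory denoiser
value_model = nn.Linear(traj_dim, 1)      # toy stand-in for the learned return model

def guided_denoise_step(noisy_traj: torch.Tensor, guidance_scale: float) -> torch.Tensor:
    """One guided update: denoise toward realistic trajectories, then nudge the
    result along the value model's gradient, scaled by guidance_scale."""
    denoised = denoiser(noisy_traj)
    traj = denoised.detach().requires_grad_(True)
    value_grad = torch.autograd.grad(value_model(traj).sum(), traj)[0]
    return denoised + guidance_scale * value_grad

# Cranking the knob on the same noisy trajectory:
noisy = torch.randn(1, traj_dim)
modest_plan = guided_denoise_step(noisy, guidance_scale=0.1)   # realistic, lower predicted return
greedy_plan = guided_denoise_step(noisy, guidance_scale=10.0)  # higher predicted return, off-manifold
```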
Clarifying questions, either for you or for someone else, to resolve my own confusion:
What does “applying optimization pressure” mean? Is steering random noise into the narrow part of configuration space that contains plausible images-of-X (the thing DDPMs and GAN generators do) a straightforward example of it?