While it is hard for AI to generate very real-looking hands, it is a significantly easier task for AI to classify whether hands are real or AI-generated.
But perhaps it’s possible to somehow add extra distortions that make it harder for both AI and humans to determine which are real...
I don’t think this is true. If it were possible to distinguish them, you could also use that same signal to guide the diffusion model into generating them correctly. And if you built a better classification model, you would probably apply it to improving generation before applying it to solving captchas.
Please correct me if I misunderstand you.
We have to train the model that generates the captcha images before we can serve any captchas, which means an attacker can train their discriminator on images generated by our model.
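To make that attack concrete, here is a minimal sketch (purely illustrative, nothing from this thread) of fine-tuning an off-the-shelf classifier on a folder of real photos and a folder of images scraped from the captcha. The directory layout, the ResNet-18 choice, and all the hyperparameters are assumptions.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from torchvision import datasets, models, transforms

# Hypothetical attacker-side dataset: data/real/... holds real photos and
# data/generated/... holds images scraped from the captcha provider's generator.
tfm = transforms.Compose([transforms.Resize((224, 224)), transforms.ToTensor()])
dataset = datasets.ImageFolder("data", transform=tfm)
loader = DataLoader(dataset, batch_size=32, shuffle=True)

# Fine-tune a small off-the-shelf network as the "discriminator".
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
model.fc = nn.Linear(model.fc.in_features, 2)  # two classes: real vs. generated
opt = torch.optim.Adam(model.parameters(), lr=1e-4)
loss_fn = nn.CrossEntropyLoss()

model.train()
for epoch in range(3):
    for images, labels in loader:
        opt.zero_grad()
        loss = loss_fn(model(images), labels)
        loss.backward()
        opt.step()
```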
But even if that were not the case, generating is a harder task than evaluating. I’m pretty sure a small, two-year-old CLIP model can detect hands generated by Stable Diffusion (probably even without any fine-tuning), despite Stable Diffusion being the newer and larger model.
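As a rough illustration of that kind of zero-shot check, something like the following (using the Hugging Face transformers CLIP API) scores a single image against a “real hand” prompt and an “AI-generated hand” prompt. The checkpoint name, file path, and prompt wording are just assumptions, and in practice you would probably still want some fine-tuning and a calibrated threshold.

```python
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

# A small, older CLIP checkpoint (ViT-B/32), standing in for the
# "two-year-old model" mentioned above.
name = "openai/clip-vit-base-patch32"
model = CLIPModel.from_pretrained(name)
processor = CLIPProcessor.from_pretrained(name)

image = Image.open("hand.png")  # hypothetical captcha image

# Zero-shot classification: compare the image against two text prompts.
prompts = [
    "a photo of a real human hand",
    "a distorted, AI-generated image of a hand",
]
inputs = processor(text=prompts, images=image, return_tensors="pt", padding=True)
probs = model(**inputs).logits_per_image.softmax(dim=-1)
print(dict(zip(prompts, probs[0].tolist())))
```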
What happens when you train with GANs is that progress eventually stagnates, even if you keep the discriminator and generator “balanced” (training whichever one is currently doing worse until the other one is). The two models then just keep changing to trick, or avoid being tricked by, the other. So the limit on making better generators is not that we can’t build discriminators capable of detecting their output.
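A toy sketch of that “train whichever side is doing worse” balancing, on made-up 1-D data instead of images, could look like the snippet below; the tiny MLPs, the toy data, and the loss-comparison heuristic are all illustrative assumptions, not a recipe.

```python
import torch
import torch.nn as nn

# Generator maps noise -> 1-D samples; discriminator scores real vs. fake.
G = nn.Sequential(nn.Linear(8, 32), nn.ReLU(), nn.Linear(32, 1))
D = nn.Sequential(nn.Linear(1, 32), nn.ReLU(), nn.Linear(32, 1))
opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)
bce = nn.BCEWithLogitsLoss()
ones, zeros = torch.ones(64, 1), torch.zeros(64, 1)

def real_batch():
    # "Real" data: samples from N(3, 0.5), standing in for real images.
    return torch.randn(64, 1) * 0.5 + 3.0

def fake_batch():
    return G(torch.randn(64, 8))

for step in range(5000):
    # Measure how each side is currently doing before deciding which to update.
    with torch.no_grad():
        fake = fake_batch()
        d_loss = bce(D(real_batch()), ones) + bce(D(fake), zeros)
        g_loss = bce(D(fake), ones)

    if d_loss.item() > g_loss.item():
        # Discriminator is doing worse: update it on real-vs-fake classification.
        opt_d.zero_grad()
        loss = bce(D(real_batch()), ones) + bce(D(fake_batch().detach()), zeros)
        loss.backward()
        opt_d.step()
    else:
        # Generator is doing worse: update it to fool the current discriminator.
        opt_g.zero_grad()
        loss = bce(D(fake_batch()), ones)
        loss.backward()
        opt_g.step()
```

Even with this kind of balancing, the two losses tend to oscillate around each other rather than one side winning outright, which is the stagnation described above.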