Same thing with image generation. When I want something specific, I expect to be frustrated and disappointed. When I want anything at all within a general vibe, where variations are welcome, the results are often great.
I routinely start with a specific image in mind and use AI art to generate it. Mostly this means going beyond plain text-to-image and using advanced techniques like ControlNet, IP-Adapter, img2img, regional prompting, etc.
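For concreteness, here is a rough sketch of what a single ControlNet + img2img pass looks like with the Hugging Face diffusers library. The model IDs, the reference file name, the prompt, and the parameter values are illustrative assumptions, not a fixed recipe; in practice this gets layered with IP-Adapter, regional prompting, inpainting, and manual retouching.

```python
# Minimal ControlNet + img2img sketch (illustrative settings, not a recipe).
import torch
import cv2
import numpy as np
from PIL import Image
from diffusers import ControlNetModel, StableDiffusionControlNetImg2ImgPipeline
from diffusers.utils import load_image

# A ControlNet trained on Canny edges pins down the composition.
controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-canny", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16
).to("cuda")

# Start from a rough reference that already has the layout you want
# ("reference.png" is a hypothetical local file).
init_image = load_image("reference.png").resize((768, 512))

# Derive a Canny edge map from the reference: the edges constrain layout,
# the prompt controls style and content.
edges = cv2.Canny(np.array(init_image), 100, 200)
control_image = Image.fromarray(np.stack([edges] * 3, axis=-1))

result = pipe(
    prompt="a detailed oil painting of a lighthouse at dusk",
    image=init_image,               # img2img source
    control_image=control_image,    # ControlNet conditioning
    strength=0.7,                   # how far to drift from the reference
    controlnet_conditioning_scale=0.8,
    num_inference_steps=30,
).images[0]
result.save("out.png")
```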
Yes, this is a skill that has to be learned, but it still lets me get the same result roughly 100x faster than I could before AI art. When it comes to movies, the speedup will be even greater. The $500M-budget Marvel movies of today will be something that a team of 5-10 people, like Corridor Crew, can put together in six months on a budget of under $1M two years from now.
There are also important technical limitations of existing (CLIP-based) models that go away entirely when we switch to a transformer architecture. This image would be basically impossible to get with existing models using only text-to-image.