Digital Molecular Assemblers: What synthetic media/generative AI actually represents, and where I think it’s going
I had an epiphany about synthetic media back in December 2017. In that epiphany, I came to the conclusion that we were, at most, ten to fifteen years away from the day that artificial intelligence could reliably synthesize any sort of media, and that we were likely only five years away from the first big public demonstrations of this technology.
Five years after that epiphany, we lived in a world of DALL-E 2, Midjourney, Stable Diffusion, ChatGPT, HarmonAI, and much more. As I anticipated, these were still fleeting examples of what I imagined, with the machines incapable of maintaining long-form coherency. However, with talent, one could utilize these tools effectively.
However, my epiphany was fairly limited and conservative for the near future. I predicted that GANs— generative adversarial networks— would drive this revolution. I presumed that movies and 3D procedural generation would not truly begin taking off until 2027. And I even said that none of these tools would be available on your desktop within five years— and perhaps I was right in some regard since, besides Stable Diffusion, none of these are locally run. Perhaps I simply underestimated the Cloud to that end. It might seem in retrospect that I was being vastly too conservative. In truth, I actually thought myself far too liberal for a time, especially in 2018 and 2019 when so little visible public progress in synthetic media occurred.
When I created a follow-up to that epiphany in 2022, I felt the need to refine some of my thoughts and be a bit more deliberately liberal. Now, even those predictions I outlined not even a full year ago are starting to come off as increasingly conservative as more capabilities are derived from diffusion models. Capabilities I presumed to be reasonable for 2027 now seem likely by next year.
But my central point back then wasn’t to focus on the technical details, but rather the emerging capabilities, the “democratization of creation.”
Even back in 2017, my understanding of all this came about because I suffer from hyperphantasia, a fairly rare mental condition in which mental imagery is excessively realistic and malleable. Hume once observed that the most lively thought is still duller than the dullest sensation. My own brain does not want to agree. While an image may be dull in the moment, it amplifies in my very-short-term sensory memory, to the point that I can actually lose myself in my own thoughts, especially when assisted with audio. Visualization is extraordinarily easy for me. I can even re-experience sounds, tastes, and smells on command. Is this normal memory recall, or is it actually hyperphantasia? I cannot be completely sure, because my experiences are my own. However, most people I speak to seem flabbergasted that my recall and visualization abilities are so intense and acute, suggesting my doubts are unfounded.
And a thought occurred to me one night early that December: all of what I could pull off in my head, these mental movies, these concerts, these edits to what I was hearing, these video games I could play in my head, these whole franchises I had envisioned and experienced all to myself— was it possible to rip all that out of my brain and put it onto a computer screen? And not through my own efforts, but through making a computer essentially “hallucinate” all of that?
Even with a modest amount of research, the answer was an overwhelming “Yes” even at that point in time.
Except I wanted to go even further with this. It’s one thing to envision a Holodeck, where computers could recreate cartoons or make generic pop songs. I thought about this on a deeper level.
What exactly does media synthesis entail? If I wanted an AI to create a cartoon, what would I need it to do? Likely automate an agent to operate a cartoon-creating program, or have it draw the exaggerated figures from scratch. The same goes for a live-action media creator. Deepfakes, GANs, the works.
“No, that’s not quite right. That’s all AI using tools. I mean, can AI create literally anything?”
And I understood that there was absolutely nothing preventing this from being possible. Using pre-existing or newfound tools was one thing. What I was anticipating was something a bit more advanced: molecular assembly. Or at least the digital equivalent of it.
Online, I see a misconception around generative AI’s future a lot. Many people now accept that AI will soon be able to create movies and TV shows, for example. However, they suffer a fatal lack of imagination by suggesting that automated animation ought to be feasible in five to ten years, and that live-action material ought to be doable in perhaps twenty. If AI accelerates exceptionally quickly, we might have automated animation in a year or two, live-action within five. These early automated cartoons will be simple, closer to 1960s fare or generic anime. Similarly, there’s the perception that AI will be able to create generic hyperpop and dance music very soon, if it doesn’t already, and that it may be able to create rock and jazz many years in the future, if ever.
And to that, I feel the need to say, “If I can use my molecular assembler to create an apple, I can use it to create a brick. If I can turn this sand to water, I can also turn it into wine. If my molecular assembler can create a basic cup, it can also create a rococo overdesigned ornate one. It’s all the same to it. Just different levels of detail at certain points.”
Generative AI is essentially the digital equivalent of a molecular assembler. Pixels, vectors, audio samples, all these bits of data that can be represented through media can be synthesized.
You might think this is an obvious truth, but it’s a surprisingly novel epiphany for many people.
There is perhaps even a danger inherent in this misunderstanding. A person who doesn’t understand that every image, every gif, every webpage, every book, every visual they see on their screen is represented by pixels, utterly unconcerned with the soul of the shapes they take on, is more likely to be fooled by something machine-generated.
This veers dangerously close to the Dead Internet Theory. I personally do not find it to be completely true, if only due to the technical limitations towards its realization as of 2023. However, I do think that, in the future, it will be more true than the theory already posits.
Because again, all digital media you consume is composed of basic units of information: bits. Bits that form pixels, vectors, voxels, polygons, and samples. Of course, when it comes to displaying media, only pixels and samples really matter (unless you still have an analog screen). Pixels are the basic unit of digital visual information; samples are the basic unit of digital audio information. Perhaps you could get far more technical, but my point stakes its claim at this:
If you can control these two things, you are capable of generating any piece of media on your screen. A digital Library of Babel indeed.
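The claim above can be made concrete in a few lines of code. This is a minimal illustrative sketch (not tied to any actual generative model): an “image” is just a grid of pixel values and a “sound” is just a sequence of samples, so any process that fills those arrays — a camera, a microphone, or a model — produces the same kind of object.

```python
import math

# An "image" is just a 2D grid of RGB pixel values in 0-255.
# Whatever fills this grid -- a scanner, a painter's photo,
# or a diffusion model -- yields the same kind of data.
width, height = 4, 4
image = [[(x * 60 % 256, y * 60 % 256, 128) for x in range(width)]
         for y in range(height)]

# Digital audio is just a sequence of samples: here, one second
# of a 440 Hz sine tone at an 8000 Hz sample rate.
sample_rate = 8000
samples = [math.sin(2 * math.pi * 440 * t / sample_rate)
           for t in range(sample_rate)]

print(len(image), len(image[0]))  # grid dimensions: 4 4
print(len(samples))               # number of samples: 8000
```

Nothing in either structure records where the numbers came from; a synthesized grid and a photographed grid are indistinguishable at the level of the data itself.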
I can use Midjourney right now to create anything from a childlike stick figure all the way to a Michelangelo-level masterpiece. They both take roughly the same amount of time to render. If there were a Midjourney for audio, I could conceivably create anything from a 3-chord hardcore punk song all the way to an incredibly baroque orchestral piece in two very similar 30-second spans of rendering time.
When the ability to generate cartoons is more advanced and widespread, there isn’t going to be a curve of time where we are only able to generate basic doodles and stock-program animation and then, after a few years of improvements, we’ll be able to match Pixar. We’re going to be able to match Pixar right out of the gate. To a diffusion model, the only difference between a low-budget Hanna-Barbera cartoon and a 100-million-dollar Hollywood-level CGI flick is the prompt you give it.
A lot of people don’t understand this. I didn’t realize it during my original epiphany either, as I suffered from the misconception that AI would develop along the same lines of skill as a human: that it would start with basic shapes and figures and only eventually create more complicated scenery and faces. The reality, that AIs would be able to create at any level of complexity from the start, would have seemed too outrageous to me at the time.
The logical endpoint of this is a concept I called the “imagination engine” or, previously, the “magic media machine.”
The Imagination Engine is the final evolved state of humanity’s creative behavioral modernity, the point at which AI can bring to life any thought or impulse at any quality, to any extent. It’s the culmination of multimodal AI and the realization of the “digital molecular assembler”: a single program where your prompts (text, voice, thoughts, what have you) are transformed into something real, at least in virtual space. If this sounds vague, that’s because the sheer extent of an imagination engine’s abilities is that absurdly wide. It’s not just an image synthesis, audio synthesis, or text synthesis model. It’s not just capable of creating movies and video games. It’s not limited to chatbots and avatars. It’s all of the above and more. With an imagination engine, you could very well choose to create your own personal fantasy world in its entirety.
It is physicalized hyperphantasia, your computer lucid dreaming at your command. Even if artificial general intelligence were not imminent, I could not place the emergence of an imagination engine any further out than the late 2020s.
And again, there is a danger inherent to this. We humans love focusing on danger, so naturally I can’t help but consider it.
In the context of the Dead Internet Theory, this sort of technology could easily lead to the entire internet becoming a giant hallucination. I don’t expect this one bit, if only because it genuinely is not in anyone’s interest to do so: internet corporations rely heavily on ad revenue, and a madly hallucinating internet is the death of this if there’s too much of an overhang. Surveillance groups would like to not be bogged down with millions or billions of ultra-realistic bots obscuring their targets. An aligned AGI would likely rather the humans under its jurisdiction be given the most truthful information possible. The only people who benefit from a madly hallucinating internet are sadists and totalitarians who wish only to confuse. And admittedly, there are plenty of sadists and totalitarians in control; their desire for power, however, currently conflicts strongly with the profit motive driven by the need to actually sell products. You can’t sell products to a chatbot.
Of course, if this ever were to become such a terrific problem that the internet truly dies, I do foresee an alternative internet arising, useful only in that it would be built to be as factual as possible, but otherwise horrendously far from ideal: one where anonymity is simply not allowed, or is very heavily regulated, with real-world ID tags used to monitor who is online at all times. To the average person, this panopticon internet too could simply be an AI hallucination, but I can certainly imagine such a network having its users, though I cannot see a majority of people caring enough to join.
Plus, in a world where the imagination engine exists, you could simply choose to have it generate your own personal internet so you’d never have to interact with either of the above, and at some point, the lines between what is real and what is your own fantasy would be too blurred to make out.
Perhaps, then, we might see a resurgence of more analog experiences. Should we not die to AI or societal collapse, presumably we’ll live in an age of abundance; there’s no reason why we wouldn’t.
Some might argue “We could also upload our consciousness to the Cloud,” and indeed, some will. I maintain, however, the majority of humanity will elect to stay biological or mostly biological for as long as possible. The benefits to uploading are discussed only in transhumanist spaces where such an action is already agreed upon as being a good thing (for the most part). The vast majority of humanity almost certainly will want nothing to do with such a thing, no matter the benefits that are claimed to be offered, without an extreme and overwhelming cultural revolution (or simply by force). As mentioned in another post of mine, it’s too easy to think of the arguments about transhumanism and uploading between enlightened post-political cyberdelics and stock-photo Luddites, when the cold fact is that the overwhelming majority of humanity would roundly reject the very concept of the Technological Singularity and transhumanism if they knew of it at all; the popularity of the movement in online groups in the West/Far East is wholly unrepresentative of its popularity in general. Invasive, cybernetic transformation of the human form is not a popular concept outside of niche futurist circles, and likely will not be even when the capability exists, as the technology involved is of a wholly different sort than even the likes of smartphones and social media. You could convince an ancient Sumerian to use Facebook if you described it prettily enough, whereas even many modern futurists are deeply put off by the promise of transhumanism even when they are fully aware of its potential benefits.
Presuming that we have an aligned AGI that would rather respect human autonomy if at all possible, I see little chance that any more than 500 million people will willingly choose to live in the Cloud at any time in the next 100 years. The global population will likely peak around 10 billion.
95% of humanity choosing to remain in meatspace means that this 95% has to deal with the aftereffects of synthetic media in our physical and digital world.
I do not see the death of human artistry. It will become far more “artisan” in nature, a niche thing with niche markets not unlike smithing or horse-riding, except larger and wider in scope and nature due to the universality of creativity and self-improvement.
But I do foresee a future of hyper-accelerated media creation, one which is democratized and thus also wider in demographics.
That said, I do not foresee a day where “every man is an artist” per se.
In the beginning, when synthetic media is overhyped and widely discussed, I can see every adopter getting a feel for the technology and doing something impressive with it: generating a movie or a video game or a TV show from some prompts, perhaps even dedicating some days to this. But for the most part, as the novelty and initial hype wear off and the technology matures further, I expect at most 65% to 70% of people’s interactions with synthetic media to be pure consumption and light creation: memes, shitposts, personal ditties, filters, edits to existing media, avatars, and the like. Certainly more than what the average person can do now, but for the most part, they’ll be doing little more than an evolution of what happens now online and in the physical world: consuming what has been created, without too much care or discrimination as to its source. Despite how atomized our culture has become, we still care for the social aspect of consumption. Most people have no interest in being creators themselves; they’re perfectly fine with being consumers, sharing in a pop-cultural experience with others.
Well, I say this, but there is one purpose that will likely see synthetic media adoption rates approaching 100%.
Anyway, a fairly sizable number, perhaps 10% or more, will persist as a pro-human contingent to some degree, going out of their way to create media that is predominantly or completely crafted by human hands: anything more involved than prompting and light editing of the output. I would expect this number to be lower, but anthropocentric bias exists. Many current artists with influence have had horrendously negative interactions with overly entitled “AI artists” who managed to squander goodwill, and comparative advantages will lead people to value the hand-crafted and hand-made. I have a feeling all of this will one day sprout into a genuine artistic-Luddite movement and counterculture, one that ensures human-created media exists in larger amounts than pure technological ability would otherwise suggest.
The rest are the ones who create synthetic media content to some extent. As time goes on, quality becomes less of a distinguishing factor, especially as synthetic media begins to flood the internet at levels unfathomable to people today. When the difference in quality becomes a simple act of prompt engineering, the content offered will become more important. And even then, because of the sheer amount of what’s created, curation is likely done through popular voting and niche-searching.
One could certainly have their own niches generated with ease, but it’s easy to forget when talking of such advanced technologies that humanity is a rather social ape. We don’t consume just for our own pleasure and benefit. If we did, theaters, live music venues, art museums, and other public gatherings would not continue to exist so deeply into the age of television, the internet, and streaming.
I say that 99% of media created will be individualized and personalized, but despite how high this number is, it’s also an exaggeration. 99% of media created in the future may be AI-generated or AI-edited, but consider that 95% of that is simply throwaway nonsense, little things that exist for only a single moment before being discarded, not unlike the copious failed outputs of Stable Diffusion, Midjourney, and ChatGPT that are already lost to the digital ether due to being low-quality, worthless, completely wrong, or simply off in some slight way. The media you consume that you find meaningful, entertaining, contemplative, and worth remembering will likely be only a tiny fraction of that. Perhaps with the right brain-computer interfaces, media creation will be streamlined even further. In such a scenario, the share of purely individualized media only goes down, not up, unless you deliberately choose to cocoon yourself off from the rest of the world: a true hikikomori who wishes to use the Singularity to transcend into a personal utopia where no one else is allowed.
This is, by a staggering amount, not what most people will do unless forced to. It’s one reason why I disavowed Singularitarianism, in fact— the perception that the vast majority of my peers in this group are socially awkward anime-obsessed narcissists living profoundly mediocre lives, which would explain their desire to have everything upended and thrown away at the soonest possible opportunity. Their dreams will come true, enabled by artificial intelligence, but I feel there’s a massive disconnect between the expectation of what’s to come and the likely reality. The expectation is that everyone will create their own personalized movies, TV shows, music, and video games, living their dreams and destroying the existing monoculture. It will be wonderful, pleasant, utopic, and only just a little bit horny.
So what will people do with it?
The likely reality will be a singularity of debauchery, memes, and a supercharged pop culture.
Let me get this out of the way first and foremost: we all know what generative AI is going to be used for the most, by a sickeningly profound distance. It’s not a secret or something unexpected. OpenAI, Midjourney, Google, and their lot would love it if generative AI were only ever used to create cute animals wearing funny hats doing silly things, or epic panoramas and stock photographs.
Generative AI is going to be used for an ungodly amount of porn. The sheer diversity, quality, amount, and range of which is so chilling to even contemplate that it boggles my mind.
Beyond that, it’s going to be used for memes. Memes, funny things, shitposts, and the inane. Some of this will cross over with serious endeavors (mentioned below), but most are done for a laugh at most: the Beatles if every song was about a sad clown, Seinfeld if Kramer was a catgirl, Breaking Bad as an 80s anime series except horribly cursed, Metallica with Lars on every role, every band with Lars on every role, Metal Gear Solid but Snake really genuinely is too dummy thicc, triple-A quality meme games, TV shows about Ea-Nasir, Tumblr and 4chan metaverse hub wars, fictional metaverse planets and universes where you’re regularly inconvenienced by trollish laws of physics, turbo-surrealism with the equivalent of billion-dollar budgets, Rage Against the Machine as an opera group, the Wacky Animated Adventures of Hitler, Stalin, Trotsky, Tito, and Freud in Vienna, and only God knows the monstrous range of what else will soon exist (come up with your own cursed ideas in the comments!).
Beyond that, it’s going to be used for lesser inanity, the kinds of things you’d expect an aunt on Facebook to use it for: animating pictures, creating avatars, little movies and comics, voiceovers, and the like. I can already see a relative sending me an altered version of a Roy Orbison song personalized entirely for me.
Beyond that, you get the more mundane and “useful” applications of synthetic media: basic aesthetics, perfect ports and upscales, corrected programming and code, education, utilitarian design, corporate projects (providing corporations as we understand them still exist), the works.
Beyond that, we arrive at the more serious creative endeavors: the works, worlds, projects, and franchises that could not have existed before due to a lack of capital and influence. Cartoon-a-trons, all the seasons of Firefly and Futurama that were never made, all the Beatles albums that could have been, the Pink Floyd/Radiohead collaborations we never got, video games and mods, comic series, fanfiction turned into professional-quality works, revivals of long-dead franchises, and bedroom multimedia franchises in general. In due time, this will inevitably include personal worlds and universes in virtual reality. My own projects, such as the Yabanverse and Babylon Today, will be brought to life and shared this way (likely utterly ignored, drowned out in the sea of projects). While much of this will likely remain as part of purely individualized fantasy worlds never meant to be shared, it is foolish to claim that nothing will be shared and that everyone will retreat into their own fantasy worlds. That may be what you would do with an imagination engine, but even I— a notoriously introverted nerd— will likely focus most of my future efforts on entertaining others rather than just myself.
And hey, if no one bites and I’m left with whole franchises without an audience, I can always get the imagination engine to hallucinate that I’ve created billion-dollar culture phenomena on my own personal fake slice of the internet.
It will be a gloriously debaucherous age.
Of course, AI or societal collapse will likely kill us all. But if, by some linear sequence of miracles, we survive, some of us might wish we didn’t.
Addendum: God, I didn’t even begin to touch upon the topic of the coming Golden Age of Fanfiction.
Let me stress here and now that generative media does not mean the death of copyright. Anyone seriously arguing that the ability to create media of any quality will mean that IPs will be fully democratized has no understanding of how IP laws work, no understanding of how fandom works, and is likely more interested in the idealism of generative media than the actual reality.
Until copyright laws are abolished entirely, rights holders will have the final say on what’s canon and what isn’t, on what’s commercializable for profit and what is purely nonprofit entertainment. Surprisingly, this is a major misunderstanding among many of the techno-anarchists gushing about generative AI as well, as they view the future only as the total teardown of all existing media structures with the hammer of artificial intelligence. In truth, an ultra-high-quality work generated in an existing IP without a license or approval from the rights holders/creators is little more than a neat fan product, no matter how high its quality or in what volume it’s produced. There is probably no better example of this that I know than the wild and wacky world of the Sonic the Hedgehog franchise, where fangames of freakishly high quality exist in no small number, often vastly better than any officially licensed product, and often directly supported by Sega themselves. None are canonical to the main series and thus do not matter in discussions of said series.
Generative media used in existing IPs is essentially this, but supercharged.
There may be some rights holders who are sympathetic to fan creators and turn their works into open or shared universes. This is likely not going to be the case for most works, however.
Many people, I’d even say the majority of people, hold fast to a general rule of thumb that the creator of a work is its respective God. As such, Word of God reigns supreme, no matter the quality of fanworks, no matter how satisfying a fanwork may be. All generative AI will change is the sheer quantity of said fanworks, while creating new, lesser gods.
And to that end, I do wish to mention that we are on the cusp of a “Golden Age of Fanfiction.”
If you intend on synthesizing all those seasons of Firefly we were denied, the Star Wars prequels and sequels that could have been, the TimeSplitters 4 that never was, or any other “continuation” or “realization” of something in an established franchise, congratulations, welcome to the wild and wonderful world of fanfiction in the age of generative AI. You’ll be sharing this space with a lot of My Little Pony movies and video games that look as if they have hundred-million-dollar budgets, as well as the inevitable Hollywood-level My Immortal movie adaptation and accompanying simulation/metaverse hub.
Fanfiction has historically been low-quality work precisely because it is inherently uncommercializable. Fanworks are often the first serious literary works by teenagers and young adults, and they can only be passion projects done in one’s spare time with no intention of serious critical examination or profit. This inherently lowers the amount of capital put into such works.
With the rise of synthetic media, capital is no longer an issue in terms of the raw quality of any work. A few words in a prompt is all that separates something that looks like trash from something which resembles a masterpiece.
Consider this.
In a book, what separates a heartfelt father-to-daughter talk under an oak tree from a trillion-man battle in a wartorn galaxy? A few good lines of prose.
What separates the same thing in a feature film? Likely tens of millions of dollars in terms of set design, special effects, actors, and more.
In an imagination engine, all that separates them is a prompt and a setting.
Is it any wonder, then, why fanfiction is dominated by prose and, to a lesser extent, comics? When that barrier of capital falls, fanfiction will flourish to frightening levels.
GOD HELP US ALL.
You’re the guy behind the Yabanverse??? I’ve seen that project occasionally on /r/worldbuilding and loved it! What a coincidence!
Yuli Ban is talking about the prediction and emergence of generative AIs, and the extent to which they can disrupt humanity’s reliance on creativity and productivity.
He mentions the ‘dead internet theory’, which postulates that most content is auto-generated, obfuscating the actual people using the internet and reducing their exposure to one another.
I think we already see this in social media, internet forums, and other areas where fake content and profiles are detected. And this can spread to YouTube and short-form video platforms, telemarketing and scams, as well as use by political groups and states.
Yuli also mentions the long-term implications—peaking human population and the notion of transhumanism,
where humans merge with an infinitely more capable AI which assumes control. He mentions how biology is a quality many would like to preserve, to varying degrees.
I would add: I predict that biology will be a status symbol in the coming decades.
Pure-bio elites would cooperate with heavily modified transhumans in order to secure the order of affairs in the world,
with 90% of us somewhere in the middle between full bio and state-of-the-art enhancement.
Digital humans and robocontent might also play a significant role.
The metaverse appears to be an attempt by corporations to pre-empt this scenario.
Lastly: it’s probably up to us to decide which aspects of human talent we respect (as we do in sport)
and which aspects we do away with.
Will humans live life according to medieval competitive norms, with emulated kings, champions, and courtship, or will we live in a ‘happy box’ resembling ideal paleolithic conditions, without math, arts, or language: a Garden of Eden of sorts, or H. G. Wells’s Eloi?
*I’m a human writing this on my own.
If you have made a (digital) picture, and an AI has made a picture, they are different pictures, even if they have exactly the same bytes in the same places.
Because art is more than just bytes. It also carries the history of its creation. Ancestry. A connection with its creator.
AIs that have been trained by humans on human texts and art still carry the human touch. They connect us to other humans in a new way.
But eventually, AIs will not need human help, like AlphaGo Zero, which learned to play Go at a superhuman level just from knowing the rules and playing against itself. You will not interact with people when playing AlphaGo Zero, or when consuming the brilliant art and video that a SuperAI will make specifically for you. You will be utterly alone—needing no one and needed by no one.
But if you value human interaction, you will seek out communication, services, art, code, etc. made by humans, and provide them to humans in turn, giving yourself and others the most difficult thing for a biohuman to attain post-Singularity: the MEANING of your existence. That’s what I will probably choose, given the chance, and probably many others will as well.