Pornographic and semi-pornographic ads on mainstream websites as an instance of the AI alignment problem?

In recent days, I have been served ads on Twitter that appeared to be advertising some kind of dating site. While I wouldn’t say that they technically crossed the line into being actually pornographic, they skirted very close to it. The advertiser might argue that the images would have been acceptable had the ads actually been advertising lingerie, but my judgement is that the combination of the ad text, the clothing in the images, the poses of the models or AI-generated avatars, and the service being advertised collectively made them unacceptable. But that’s just my opinion.

Then yesterday I witnessed an escalation—a similar Twitter ad which, shockingly, appeared to be extremely lascivious and actually pornographic, including what appeared to be partially-visible female genitalia. As a reminder, Twitter ads are served randomly and without warning, mixed in among organic tweets and replies. In theory, anyone could have received these ads: 13-year-old users, grandmothers, even people using Twitter live while giving a TED talk. While explicit pornographic ads are routinely shown on pornographic websites, Twitter users do not expect to be served them on Twitter.[1]

Update (26 Dec): I have just been served a new version of this explicit ad, with the same picture or a very similar picture. I have uploaded a censored version of the ad here, and corrected a probable minor error in this post below.

Now, as an ordinary Twitter user, I obviously can’t prove that this particular phenomenon is due to AI—or that, even if AI was in some way involved in producing this ad, the AI was causally responsible for it crossing the line into actual porn. However, I would make two points:

  1. Even if this was in fact a case of a human accidentally(!) or deliberately producing and submitting an ad that crossed the line into actual pornography, AI technology has developed to such a level that, I believe, within a few months or years we could see imperfectly-aligned AIs—especially ones owned and operated by cybercriminals without many moral scruples—making similar errors in the real world. That is, even if this wasn’t caused by an AI, it might as well have been.

  2. In the case of this particular problem—pornographic and semi-pornographic ads being submitted through “self-service” ad platforms like Twitter’s—I don’t believe that the relevant actors need to waste time trying to determine whether the ads are, in fact, produced or submitted by AIs. I will argue below that, regardless of whether this is a real-world, near-term instance of the AI alignment problem, or merely resembles it, pragmatic near-term responses by private-sector and public-sector actors to combat it would need to be pretty much the same.

Considerations for personal protection

In the light of this new risk—which could lead to me being embarrassed, socially ostracised, losing my job or even being arrested for obscene conduct if I happen to be served another such ad while around other people, especially if those other people include minors—I am planning quite strict and radical changes to my online habits, until such time as I become convinced that this problem has been decisively squelched:

When in public or in the presence of others, I will no longer open the Twitter app or website, the Facebook app, or indeed any other app[2] which appears to allow self-serve advertising, such as an app I used earlier this year which tells me when the next bus is due at a bus stop. That app also comes with the risk of blasting ads with sound at me and passers-by—I dread to think of the embarrassment that a misaligned AI could cause by generating actually-pornographic video ads with sound. That would take this problem to a new level of awfulness.

Furthermore, if I need to depict Twitter or any of the aforementioned apps in any presentation or talk, or show someone a tweet in the course of a conversation, I will always make a recording or screenshot while in private, rather than use them live. Yes, that does mean that I will not be able to make an impromptu decision to show someone a tweet mid-conversation. That is a trade-off I am willing to make.

Even though this might sound extreme and excessive—especially to a reader of this post who has never witnessed these ads and only has my word to go on that they ever existed—I would nevertheless recommend that everyone follow my example, to avoid the potential consequences that I mentioned, such as embarrassment, ostracisation and worse.

Again, an ad-serving algorithm does not necessarily care about who you are, whether you are male or female, your age or your opinions about porn. It may be dimly aware of some of these factors, and they may make it less likely for you to be shown these types of ads—though the ad-serving algorithm or AI may be much less smart than the AI creating the ads, and more oblivious to their offensive nature. I do not think it is wise to presume that you could not be served these kinds of ads simply because of demographic attributes you happen to possess. Nor do I think it is wise to assume that being a woman, or having publicly and vocally denounced the pornography industry, would necessarily let you escape the bad consequences I mentioned. After all, some women do view porn, hypocrites exist—and in any case, people who see someone apparently viewing porn in public may be angered to such an extent that they do not think clearly.

How AI and automation could be implicated—now or in the future—in this problem

From what I have seen, all of the necessary technology already exists to make it possible in principle to run an AI bot network which autonomously creates new—or hacks into existing—Twitter accounts, creates sexualised or pornographic ads using AI image generators, posts those ads to Twitter and pays for them, either using money allocated to it, or perhaps using stolen credit card numbers purchased in bulk on the dark web from cybercriminals. Such an AI bot network could even create new—or hack into existing—websites each time, either to scam users or to redirect them to a true target website, and make slight variations in the text, images and names of the websites, so that it is harder for Twitter to identify the Twitter accounts as part of the same network.

Such an AI bot network could also automatically translate and localise the ads and the websites for different geographical regions, do A/B testing or other testing to optimise click-through rates and/or conversion rates, and automatically react to individual Twitter accounts being banned by creating or hacking into other Twitter accounts. Indeed, some form of A/B testing feeding viewer engagement back into the generation of new AI images may already be what led to these ads becoming explicitly pornographic (this is speculation on my part).

If this is not already happening and is not already responsible for what I witnessed, I expect these kinds of operations—and other, similar abuses of self-service ad networks, like crypto scam ads—to be almost completely automated within a few years.

In reality, of course, it’s possible that anything from 0% to 99% of the work involved in this particular operation is currently being performed by AI, rather than the 100% I predicted above for the future.

Now that Midjourney v6 is available, it is fast becoming impossible to tell whether images of people are AI-generated, so I am not even going to attempt to judge that aspect. And while popular AI image generation tools are often censored in an attempt to prevent them being used to generate porn, open-source alternatives exist, and some have been adapted specifically to create porn (I won’t link to them, but they can easily be found).

It is already possible to break captchas using AI, so that is no barrier for AI systems—and even if it were, cybercriminal operators could simply solve the captchas themselves, or have others do it for them (perhaps as part of some other scam), and then sit back and let the AI do the rest of the work.

While I have not investigated the sexualised ads I saw—I simply reported them to Twitter and moved on—I have investigated a crypto scam ad I saw on Twitter recently, which was an AI-generated short video. Clicking on it, out of curiosity, took me through to a YouTube scam video featuring a real human scammer. (The scam appeared to involve persuading people to send crypto from a legitimate crypto exchange to a fake crypto exchange, supposedly to profit from arbitrage, all the while assuring the viewer that the fake exchange was legit, respected, long-standing, that their withdrawals would totally work, etc.) I noticed that the ad was posted by an account that appeared to have been hacked: it had previously been posting anime content in a language that looked like it might have been Chinese, and then suddenly switched to posting crypto scam videos in English. If Twitter imposes bans or human review on new accounts posting ads, hacking existing accounts might allow cybercriminals to evade such measures. To stop the true owners of those accounts reporting them, criminal gangs might immediately change the passwords, and they might only hack into accounts that appear to have been unused for a long while.

Please note that even if bot networks are hacking Twitter accounts en masse, that does not necessarily imply that there is anything wrong with Twitter’s security measures. We know that for years now, enormous bot networks have been out there infecting people’s personal computers with malware, which can easily log keystrokes, watch what a user does on their computer, and so on. If a user infected with such malware logged into Twitter without Two-Factor Authentication enabled, it would be trivial for the malware network to detect this and pass on—or sell—their credentials. None of this requires any kind of AI, though AI could potentially make the credential-stealing more effective.

However, I did notice a couple of things in passing about the sexualised ads. One was that the preview text for the link sometimes contained something completely unrelated, like “rent cheap server hardware”. This could be a consequence of cybercriminals buying an ad pointing to an innocuous or “unoccupied” hosted website and then hacking it afterwards, if Twitter caches the link preview at ad creation time. Or it could indicate something more sophisticated: maybe the website detects that it is being fetched by Twitter’s link preview code, and serves something innocuous only to Twitter, to avoid Twitter’s systems catching on to the fact that this is ostensibly a dating site ad. The latter explanation is consistent with the fact that the ad text is vague and doesn’t mention words like “dating”. Together with the fact that each ad appears to point to a different website, this would make it harder for Twitter to automatically detect these ads as being part of the same network—or even to detect that they were all ostensibly about online dating!
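
If cloaking of that kind is going on, one countermeasure is conceptually simple: re-fetch each ad’s landing page while presenting two different User-Agent strings, and flag large discrepancies. Here is a minimal sketch in Python; the User-Agent strings, the similarity threshold, the example URL and the escalation hook are all hypothetical assumptions, and a real system would need JavaScript rendering, retries and a far more robust comparison:

```python
import difflib

import requests  # third-party: pip install requests

# Hypothetical User-Agent strings: one imitating the platform's link-preview
# fetcher, one imitating an ordinary browser. The exact strings are assumptions.
PREVIEW_UA = "Twitterbot/1.0"
BROWSER_UA = "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"


def fetch_html(url: str, user_agent: str) -> str:
    """Fetch the landing page while presenting the given User-Agent."""
    resp = requests.get(url, headers={"User-Agent": user_agent}, timeout=10)
    return resp.text


def looks_cloaked(url: str, threshold: float = 0.5) -> bool:
    """Flag a URL whose page differs substantially depending on who fetches it."""
    preview_html = fetch_html(url, PREVIEW_UA)
    browser_html = fetch_html(url, BROWSER_UA)
    similarity = difflib.SequenceMatcher(None, preview_html, browser_html).ratio()
    return similarity < threshold


# Hypothetical usage:
# if looks_cloaked("https://suspicious-landing-page.example/"):
#     escalate_to_human_review()  # hypothetical hook into a moderation queue
```

Of course, a sophisticated adversary could key the cloaking off IP ranges rather than User-Agent strings, so the re-fetches would ideally come from addresses not publicly associated with the platform.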

Let’s recall Sam Altman’s widely-discussed tweet from a couple of months ago: “i expect ai to be capable of superhuman persuasion well before it is superhuman at general intelligence, which may lead to some very strange outcomes”.

I suspect I have started to see some of these “strange outcomes”.

If competent cybercriminal gangs did not pay heed to these words, or were not already pursuing this direction of inquiry, then frankly, they were derelict in their “duty” to their crime lords—odious as it may be to use the language of morality in relation to these immoral actors. And this generalises: every time a near-term AI risk that could present an opening for cybercriminals or terrorists is publicly discussed, we should expect their ears to prick up. This is not to say that we ought to stop talking publicly about AI risks—nor could we, given how democracy works. But it is a salutary reminder of the trade-offs we may be making, perhaps inadvertently, when we do so.

Let me be clear: I am not saying that an explicitly pornographic ad on a mainstream website is an example of superhuman persuasion, except in some extremely degenerate sense. My claim is rather that this development, and the semi-pornographic ads that preceded it, are a harbinger of one strategy of extreme persuasion that we might see cybercriminal networks try in the future. And again, this sort of thing is routinely used in adverts on porn websites; what’s new here is its seeping into a mainstream website, in a way to which that website almost certainly does not consent, and to which many of its users—even those who opt to view porn in the privacy of their own homes—would also not consent.

The relevance of misalignment

There are multiple ways that AI misalignment may have contributed to, or may in future contribute to, problems of this type:

  1. If the explicitly-pornographic image was generated by an AI, the prompt or the “reward function” may have been intended to generate images that were technically non-pornographic, but that nevertheless conveyed the sense of being pornographic and sexually inviting to the typical male viewer—or at least worked this way on enough of the male audience to be profitable. Assuming hypothetically for a moment that the previous ads I saw for similar dating websites were generated by the same AI process, that certainly seemed to be the goal with those previous ads as well. Clearly the latest ad failed to align with that goal, because it went completely over the line into being pornographic. This is not just a matter of my opinion—the standard rule used almost everywhere in the West is that if a woman is wearing a bikini or lingerie or other clothing that covers her genitalia and nipples, that is acceptable, but if any part of her nipples or genitalia (or pubic hair) is visible, that is indecent and pornographic. Not only was the top of her genitalia visible, but she was not wearing any underwear or skirt—while her other clothes were some kind of weird webbed attire that I’ve never seen in real life. All of this is very consistent with my hypothesis that it was an image-generation AI operating as part of an automated process; I find it hard to believe that any human artist, photographer or image editor would make this particular kind of mistake, unless they were under the influence of mind-altering substances. Why would any human bother to dress the model in clothing designed to make the ad appear legitimate, if they were only going to show her genitalia at the bottom of the ad anyway? It doesn’t make any logical sense—but it’s exactly the kind of incoherent mistake a misaligned image-generation AI might plausibly make.

  2. Finally, we can consider the elementary point that there are multiple meanings of the term “AI alignment” in use. Even if a slimeball used AI to deliberately create a Twitter ad that crossed the line into pornography and didn’t care, so that their AI was not misaligned with their own goals, this AI would still have been misaligned with the basic ethical principle that “ads created for mainstream websites should not be pornographic”, which is probably baked into Twitter’s terms of service. It may also have violated laws in various jurisdictions against exposing minors to pornography, which means such an AI would have been misaligned in a legal sense, too. This is not a political point—I am not trying to make any claim one way or another about whether open-source models “should” be regulated to make them conform to some law or ethical standard. I am simply pointing out that, as a descriptive matter, an AI that created such an ad would have been misaligned in these senses.

Countermeasures

Fortunately, even though this problem may now or in the future be exacerbated by AI misalignment, addressing it does not require solving AI alignment. Even if we could solve AI alignment, if cybercriminals are using AI bot networks, hacked accounts, IP-cloaking mechanisms like Tor, and/or stolen credit cards, it might be hard even to identify who and where they are, let alone to force them to use a better-aligned AI system.

The most obvious countermeasure Twitter could institute here is requiring all self-serve ads to be reviewed by a human before being shown to the general public. However, this is like using a sledgehammer to crack a nut, and would be extremely expensive. Going to this level is probably not necessary, at least as a permanent solution. Some ideas immediately present themselves:

  1. If a well-known brand’s account gets hacked and is used to serve pornographic ads, the public will probably blame the brand for getting its Twitter account hacked, rather than Twitter itself. So arguably well-known brands could be exempted from this requirement.

  2. Requiring the account to have existed for a certain period of time as a measure of legitimacy is not going to work, due to malware botnets and the resulting potential for hacking into long-lived accounts.

  3. Requiring a sufficiently large bond to be posted, refundable if no-one reports your ads, would deter malicious or misaligned advertisers, because at some point the behaviour would become unprofitable. However, this might also deter legitimate advertisers. Moreover, if a misaligned AI advertiser only posted an unarguably misaligned ad 0.1% of the time, an extremely large bond might be required to deter this behaviour (see the worked example just after this list). Furthermore, criminals, or in future AIs, that are sufficiently risk-tolerant might engage in irrationally-risky behaviour, lose their deposits and make no profits anyway. Even though that’s financially profitable for Twitter, it doesn’t ultimately stop the behaviour in that scenario. I feel the AI alignment community doesn’t discuss the problem of extremely-risk-tolerant AIs enough.

  4. Training an AI to detect pornographic images could work, but even if it let through one pornographic ad in 10,000 (0.01%), that could still result in extremely detrimental consequences for innocent victims who happened to view one around other people: loss of their job, or even, in theory, loss of access to their children if they were divorced or separated and viewed it around their children. And news stories arising from those incidents could be harmful to Twitter’s reputation, too.

  5. Training an AI to detect pornographic images and escalate them to a human moderator for review whenever it is in any way unsure could work—but again, it would probably be necessary to make the AI err heavily on the side of caution, to avoid false negatives. (A sketch of this two-threshold triage follows below.)

  6. Finally, ad networks are going to have to grapple, one way or another, with ads that don’t overtly cross the line into pornography, but that push against that line very, very hard—and similarly with any other “red lines” those ad networks might have. Maybe it’s no longer going to be feasible to use the “I know it when I see it” test to define obscenity or pornography. Probably an AI, being in some sense naive and in another sense extremely goal-focused, is going to try things which most humans wouldn’t even dare. Perhaps those initial ads, not explicitly pornographic but still at a level where I viewed them as offensive and inappropriate, are an example of that phenomenon.
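
To make the bond arithmetic in item 3 concrete, here is a back-of-the-envelope sketch in Python. Every figure in it is a hypothetical assumption chosen for illustration, not real data:

```python
# Back-of-the-envelope deterrence arithmetic for item 3 above.
# All figures are hypothetical assumptions, not real data.

profit_per_ad = 50.0    # assumed average criminal profit per ad posted, in dollars
violation_rate = 0.001  # the advertiser posts an unarguably bad ad 0.1% of the time

# If the bond is forfeited only when a bad ad is reported, the expected
# forfeiture cost per ad posted is violation_rate * bond. Deterrence requires
# that this expected cost exceed the expected profit per ad:
#     violation_rate * bond > profit_per_ad
minimum_deterrent_bond = profit_per_ad / violation_rate

print(f"Bond needed to deter: ${minimum_deterrent_bond:,.0f}")  # -> $50,000
```

Under these assumed numbers, a $50,000 bond per advertiser would plainly exclude many legitimate small advertisers, which is exactly the trade-off item 3 describes.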

More broadly, we should expect that AIs will “struggle with”—or “deviously subvert”, depending on your perspective and propensity to anthropomorphise them—any red lines or ethical standards that are fuzzy and hard to put into words. Indeed, how could they not? They have never lived lives as humans, never experienced being embodied in the real world for decades, never picked up the tacit knowledge that comes from doing so. The AI alignment problem is hard enough with principles that are easy to state, but we would do well to remember that not all principles that people care about have this character.
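
Items 4 and 5 in the list above amount to a two-threshold triage pipeline. Here is a minimal sketch; the classifier, its score range and both threshold values are hypothetical assumptions, with the thresholds deliberately asymmetric so that the system errs heavily towards human review:

```python
from enum import Enum


class Decision(Enum):
    APPROVE = "approve"
    HUMAN_REVIEW = "human_review"
    REJECT = "reject"


# Hypothetical thresholds, deliberately asymmetric: almost anything remotely
# questionable is escalated to a human, keeping false negatives (porn slipping
# through) far rarer than false positives (harmless ads seen by a moderator).
REJECT_ABOVE = 0.95
REVIEW_ABOVE = 0.10


def triage_ad(nsfw_score: float) -> Decision:
    """Route an ad image based on a hypothetical NSFW classifier's score in [0, 1]."""
    if nsfw_score >= REJECT_ABOVE:
        return Decision.REJECT        # confidently pornographic: block outright
    if nsfw_score >= REVIEW_ABOVE:
        return Decision.HUMAN_REVIEW  # in any way unsure: a human decides
    return Decision.APPROVE           # confidently innocuous


# Hypothetical usage:
# decision = triage_ad(nsfw_score=0.42)  # -> Decision.HUMAN_REVIEW
```

The lower the review threshold, the more human moderation capacity is needed; the asymmetry simply reflects that a false negative (porn served to an unsuspecting user) is far costlier than a false positive (a moderator glancing at a harmless ad).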

All of these considerations seem relevant for other self-serve ad networks, and for other sites using those networks, too—because even if this type of ad isn’t yet appearing on those networks, it’s surely only a matter of time, especially if Twitter gets good at stopping these ads, as I hope it does.

There is a risk that we enter a failure mode where the private sector somewhat ham-fistedly fixes the problem, but still accidentally lets through a tiny percentage of unacceptable ads, and an increasing percentage over time as AIs get more competent, because it really doesn’t want to invest in the scale of human moderation needed to completely eliminate unacceptable self-serve ads. If that comes to pass, in my view, public policy changes might be required to get ad networks and social media networks to fix the problem 100%.

And then maybe once we’ve fixed this urgent problem, Twitter can finally start to address the broader problem of scam ads.

  1. ^

    Note that while Twitter does allow pornographic tweets, I don’t think they are allowed to be promoted as ads—one of the provided report reasons, the one I selected when reporting the ads, was “Adult sexual services”. Also, pornographic tweets are supposed to be marked as sensitive, and their images are not supposed to be shown by default to users, like me, who have set a preference not to see sensitive tweets—and that did not happen for these ads.

  2. ^

    While, to be consistent, I should probably also avoid in public any website that appears to allow self-serve advertising, not just any app which does so, that would be a serious practical limitation which could prevent me from carrying out important daily activities. So far I have only witnessed this problem on Twitter, so I am making a judgement call not to extend this principle to arbitrary websites.