About possible backlashes from unsuccessful communication.

I hoped for some examples like “anti-war movies have unintentionally boosted military recruitment”, which is the only example I could remember myself.
I asked Claude the same question, and it gave me these examples:
Scared Straight programs: These programs, designed to deter juvenile delinquency by exposing at-risk youth to prison life, have been shown to actually increase criminal behavior in participants.
The “Just Say No” anti-drug campaign: While well-intentioned, some research suggests this oversimplified message may have increased drug use among certain groups by triggering a “forbidden fruit” effect.
The other examples were not very relevant; most amounted to “the harm of this oversimplified communication was in the oversimplification itself”.
The common thread in the two relevant examples and my own example about anti-war movies is, I think, “try to make sure you don’t make the bad thing look cool”. Got it.
But is that all? Are there any examples that don’t come down to this?
“try to make sure you don’t make the bad thing look cool”
A similar concern is that maybe the thing is so rare that previously most people didn’t even think about it. But now that you have reminded them of it, a certain fraction is going to try it for some weird reason.
Infohazard:
Telling large groups of people, especially kids and teenagers, “don’t put a light bulb in your mouth” or “don’t lick the iron fence during winter” predictably leads to some people trying it, because they are curious about what will actually happen, or about whether the horrible consequences you described were real.
Similarly, teaching people political correctness can backfire (arguably, from the perspective of the person who makes money by giving political correctness trainings, this is a feature rather than a bug, because it creates greater demand for their services in the future). For example, if you have a workplace with diverse people who are naturally nice to each other, lecturing them about racism/sexism/whatever may upset the existing balance: suddenly the minorities may get suspicious about possible microaggressions, and the majority will feel uncomfortable in their presence because they feel they have to be super careful about every word they say. This can ironically lead to undesired consequences, e.g. the white men stop hanging out with women or black people, because they feel they can talk freely (e.g. make jokes) only in their absence.
How does this apply to AI safety? If you say “if you do X, you might destroy humanity”, then in theory someone is guaranteed to do X or something similar to X, either because they think it is “edgy” or because they want to prove you wrong. But in practice, most people don’t actually have an opportunity to do X.