“try to ensure you don’t make bad things look cool”
A similar concern is that maybe the thing is so rare that previously most people didn’t even think about it. But now that you have reminded them of it, a certain fraction will try it for some weird reason.
Infohazard:
Telling large groups of people, especially kids and teenagers, “don’t put a light bulb in your mouth” or “don’t lick the iron fence during winter” predictably leads to some people trying it, because they are curious about what will actually happen, or about whether the horrible consequences you described are real.
Similarly, teaching people political correctness can backfire. (Arguably, from the perspective of the person who makes money by giving political-correctness trainings, this is a feature rather than a bug, because it creates greater demand for their services in the future.) Like, if you have a workplace with diverse people who are naturally nice to each other, lecturing them about racism/sexism/whatever may upset the existing balance: suddenly the minorities may get suspicious about possible microaggressions, and the majority will feel uncomfortable in their presence, because they feel like they have to be super careful about every word they say. Which can ironically lead to undesired consequences, e.g. the white men stop hanging out with women or black people, because they feel they can talk freely (e.g. make jokes) only in their absence.
How does this apply to AI safety? If you say “if you do X, you might destroy humanity”, then in theory, someone is guaranteed to do X or something similar to X, either because they think it is “edgy”, or because they want to prove you wrong. But in practice, most people don’t actually have an opportunity to do X.