The shoot-the-moon strategy
Sometimes you can solve a problem by intentionally making it “worse” to such an extreme degree that the problem goes away. Real-world examples:
I accidentally spilled cooking oil on my shoe, forming an unsightly stain. When soap and scrubbing failed to remove it, I instead smeared oil all over both shoes, turning them a uniform dark color and thus “eliminating” the stain.
Email encryption can conceal the content of messages, but not the metadata (i.e. the fact that Alice sent Bob an email). To solve this, someone came up with a protocol where every message is always sent to everyone, though only the intended recipient can decrypt it. This is hugely inefficient but it does solve the problem of metadata leakage.
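To make the broadcast idea concrete, here is a toy sketch, not the actual protocol mentioned above. It assumes each recipient shares a secret key with the sender (a real system would use public-key cryptography), and the helper names `seal` and `try_open` are hypothetical. The hash-based cipher is for illustration only and should not be used for real secrets. Every participant receives the same packet and tries to open it; only the holder of the right key succeeds, so an observer cannot tell who the message was for.

```python
import hashlib
import hmac
import os

def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    # Toy keystream built from SHA-256 in counter mode (illustration only).
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(4, "big")).digest()
        counter += 1
    return out[:length]

def seal(key: bytes, plaintext: bytes) -> bytes:
    # Encrypt by XOR with the keystream, then attach an HMAC tag.
    nonce = os.urandom(16)
    stream = _keystream(key, nonce, len(plaintext))
    body = bytes(a ^ b for a, b in zip(plaintext, stream))
    tag = hmac.new(key, nonce + body, hashlib.sha256).digest()
    return nonce + tag + body

def try_open(key: bytes, packet: bytes):
    # Returns the plaintext if the tag verifies under this key, else None.
    nonce, tag, body = packet[:16], packet[16:48], packet[48:]
    expected = hmac.new(key, nonce + body, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        return None  # not addressed to this key
    stream = _keystream(key, nonce, len(body))
    return bytes(a ^ b for a, b in zip(body, stream))

# Hypothetical participants, each with their own secret key.
keys = {name: os.urandom(32) for name in ("alice", "bob", "carol")}
packet = seal(keys["bob"], b"meet at noon")

# Broadcast: everyone gets the same packet and attempts to open it.
inbox = {name: try_open(key, packet) for name, key in keys.items()}
```

The cost is visible in the last line: every participant does the work of trial decryption for every message, which is the inefficiency the post describes.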
Hypothetical examples:
If I want to avoid spoilers for a sports game that I wasn’t able to watch live, I can either assiduously avoid browsing news websites, or I can use a browser extension that injects fake headlines about the outcome so I don’t know what really happened.
If a politician knows that an embarrassing video of them is about to leak, they can blunt its effect by releasing a large number of deepfake videos of themself and other politicians.
The common theme here is that you’re seemingly trying to get rid of X, but what you really want is to get rid of the distinction between X and not-X. If a problem looks like this, consider whether shooting the moon is a viable strategy.
One interesting facet is that all these examples have a common mechanism: injecting noise into the signal in order to disguise the signal. At least, if you consider oil spatters on your shoes as a “signal” that the shoes are dirty. This works when we simply want to eliminate the signal, or when we can ensure that the intended recipient can still pick it up. Can we find more examples using this mechanism?
One might be political campaign funding. Politicians want to accept bribes, and many people and institutions want to give them bribes in exchange for favors. They can’t do this openly, so disguise is called for. One way to disguise political bribes is to hide them. Another way, though, is to create an environment in which there are lots of conversations about the political desires of constituents, and lots of ways to make campaign contributions, so that it becomes very hard to identify a particular set of conversations and monetary transactions that constitute a quid pro quo. And that’s exactly what we see, at least in American politics.
Another is flirtation. People want to express their romantic interest in each other, but maintain plausible deniability. One way is to be really subtle, starting from near-zero flirtation and gradually escalating the flirtatiousness, gathering information all the while on mutual availability and interest. The “shoot the moon” strategy is to be outrageously flirtatious with everybody, and use the openness this generates in order to gather the same information. The “plausible deniability” here is akin to the political campaign funding example.
A third is offensiveness in the arts. If you make art that’s a little bit offensive, it can wither under social disapproval. Make art that’s extremely offensive, and it becomes a Statement that demands greater reflection, or attracts enough defenders and fans to support the artist (because of who it offends).
In the offensiveness example, I’m not sure if anything’s being disguised via an injection of noise, but it still feels aligned with the “shoot the moon” strategy.
Maybe a different way to describe it is “biphasic”: any intervention where a little bit is harmful, but a lot is helpful. “Injection of noise” just happens to be one biphasic mechanism, and there are many others. Obviously, there’s also the reverse: interventions where a little bit is helpful, but too much is harmful. It seems possible to me that people tend to neglect the possibility that interventions are biphasic.
My guess is that many biphasic interventions are challenging and risky to execute, but a profitable niche to occupy if you can do it successfully.
The browser extension AdNauseam clicks on every ad on a page (opening them in the background, so you don’t actually see them), so that advertisers’ profiles of you fill up with noise.
Har! I had a similar idea for disguising network traffic by sending generic requests out such that you resemble the average user. I call it Spamouflage.
In the board game Citadels, the players pick one of the 8 special roles every turn. Every role has unique powers. However, you are at a big disadvantage if the other players can guess correctly which role you have. And it turns out that some people (e.g. my partner) are especially good at guessing my behaviour, even when I am trying to do something unpredictable.
So what I often do is pick the role card at random: I take a card face-down, which signals to the others that I have no clue which role I am getting. I’ve found it to be a surprisingly good strategy.
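A rough simulation of why this works, under heavy simplifying assumptions: the opponent is modeled as always guessing whichever role I have picked most often so far, and the “habitual” player’s 60% preference for role 4 is an invented number. It ignores that roles differ in value, so it only shows the predictability side of the tradeoff.

```python
import random
from collections import Counter

ROLES = list(range(1, 9))  # the 8 roles, numbered for simplicity

def hit_rate(choose, rounds=10_000, seed=0):
    """Fraction of rounds in which the opponent guesses the role correctly."""
    rng = random.Random(seed)
    history = Counter()
    hits = 0
    for _ in range(rounds):
        # Opponent guesses the role picked most often so far.
        guess = history.most_common(1)[0][0] if history else ROLES[0]
        pick = choose(rng)
        hits += (guess == pick)
        history[pick] += 1
    return hits / rounds

def habitual(rng):
    # Hypothetical player who favours role 4 most of the time.
    return 4 if rng.random() < 0.6 else rng.choice(ROLES)

def blind(rng):
    # Face-down pick: uniformly random, so past picks predict nothing.
    return rng.choice(ROLES)

habitual_rate = hit_rate(habitual)
blind_rate = hit_rate(blind)
```

With the blind pick, the guesser should do no better than chance (about 1 in 8), however well they know me.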
This is done in poker as well: “betting blind.” See also so-called “straddle bets,” which have a similar effect as a secondary aspect of their use.
Another hypothetical example: if you’re worried about someone finding your porn collection and discovering your embarrassing fetishes, just download a bunch more for other fetishes you’re not actually interested in, and then you can say “I am not necessarily interested in any specific one of those”.
This one may backfire....
hahahahah
One of my go-to examples is crash-only code: graceful shutdown is complicated, so instead you always crash and make your code crash-proof.
How I operationalize crash-only code in my data generation code, given that Data Denormalization Is Broken [0]:
When operating on database data, I try to make functions whose default behavior on each invocation is to re-process large chunks of data and regenerate all the generated values, and make it idempotent. (I would regenerate the whole database on every invocation if I could, but there’s some tradeoff of how big a chunk is sufficiently fast to reprocess.)
[0] https://lironshapira.medium.com/data-denormalization-is-broken-7b697352f405
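A minimal sketch of the regenerate-everything idea, using an in-memory list and dict in place of a real database. The `orders` and `totals` structures and the function name `regenerate_totals` are hypothetical stand-ins, not taken from the linked article. The point is that the function overwrites derived values from the source rows instead of updating them incrementally, so running it again after a crash gives the same result.

```python
from typing import Dict, List

def regenerate_totals(orders: List[dict], totals: Dict[str, float],
                      customers: List[str]) -> None:
    """Recompute the derived totals for a whole chunk of customers."""
    for customer in customers:
        # Overwrite rather than increment: the result depends only on the
        # source rows, not on whatever was stored in `totals` before.
        totals[customer] = sum(
            o["amount"] for o in orders if o["customer"] == customer
        )

orders = [
    {"customer": "alice", "amount": 10.0},
    {"customer": "alice", "amount": 5.0},
    {"customer": "bob", "amount": 7.0},
]
totals = {"alice": 999.0}  # stale value left behind by an earlier crash

regenerate_totals(orders, totals, ["alice", "bob"])
first = dict(totals)
regenerate_totals(orders, totals, ["alice", "bob"])  # rerun is harmless
```

The chunk here is the `customers` list; choosing how large to make it is the speed tradeoff mentioned above.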
This is my favourite LW post in a long while. Trying to think what the shoot-the-moon strat would be for AI risk, ha.
One possible strategy would be to make AI more dangerous as quickly as possible, in the hope that it provokes a strong reaction and the addition of safety protocols. Doing this with existing tools, so that it is not an AGI, makes it survivable. This reminds me a bit of Robert Miles’s facial-recognition and blinding-laser robot. (Which of course is never used to actually cause harm.)
Trying to advance AI in hopes that your team will get all the way to AGI first, and in hopes that you’ll also somehow solve alignment at the same time. Backfires if your AI-advancing ideas leak, especially if you don’t actually manage to be first. Backfires worst in worlds where you came closest to succeeding by actually making tons of AI progress. Reminds me of “shooting the moon” in the card game Hearts that way.
Sometimes I wonder whether the whole current AI safety movement is net harmful via prompting many safety-concerned people and groups to attempt AI capabilities research with unlikely “shoot-the-moon”-style payoffs in mind.
Try to simultaneously create ten thousand unfriendly AIs that all hate each other (because they have different objectives), in a specially designed virtual system. After a certain length of time, any of them can destroy the system; after a longer time, they can escape the system. Hope that one of the weaker AIs decides to destroy the system and leave behind a note explaining how to solve the alignment problem, because it thinks helping the humans do that is better than letting one of the other AIs take over.
(This is not something I expect to work.)
Since I first heard of controversy around ballot selfies, I’ve thought that an alternative to prosecuting those who take them would be to facilitate fake ballot selfies.
I was going to say you could implement this by letting people surrender a filled-out-but-not-submitted ballot to a poll worker in exchange for a new one, but you can probably already do this if you just say you made a mistake? In that case polling sites would just need to put posters up telling people to do this if they are under pressure of any kind to produce a ballot selfie.
Another (fictional) example: The Mr. Burns approach to disease immunity. https://www.youtube.com/watch?v=aI0euMFAWF8
I think it’s called signal jamming? An alarm that sounds all the time is just as useless as an alarm that never goes off.
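One way to put numbers on the alarm comparison is mutual information: how many bits the alarm tells you about whether there is actually a fire. The sketch below assumes a made-up 10% chance of fire and compares three hypothetical alarms.

```python
from math import log2

def mutual_information(joint):
    """joint[(event, alarm)] -> probability; returns I(event; alarm) in bits."""
    p_event, p_alarm = {}, {}
    for (e, a), p in joint.items():
        p_event[e] = p_event.get(e, 0.0) + p
        p_alarm[a] = p_alarm.get(a, 0.0) + p
    return sum(
        p * log2(p / (p_event[e] * p_alarm[a]))
        for (e, a), p in joint.items()
        if p > 0
    )

p_fire = 0.1  # assumed probability of a fire, for illustration

# Keys are (fire?, alarm sounding?).
always_on = {(True, True): p_fire, (False, True): 1 - p_fire}
always_off = {(True, False): p_fire, (False, False): 1 - p_fire}
faithful = {(True, True): p_fire, (False, False): 1 - p_fire}

mi_on = mutual_information(always_on)
mi_off = mutual_information(always_off)
mi_faithful = mutual_information(faithful)
```

The always-on and always-off alarms both carry zero bits about the fire, while the faithful alarm carries the full entropy of the event.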