Surely all pivotal acts that safeguard humanity long into the far future admit an entirely rational explanation.
I agree that in hindsight such acts would appear entirely rational and justified, but to avoid being a PR problem, they must appear justified (or at least acceptable) to a member of the general public, a law enforcement official, or a politician.
Can you offer a reason for why a pivotal act would be a PR problem, or why someone would not want to tell people their best idea for such an act and would use the phrase “outside the Overton window” instead?
To give one example: the oft-cited pivotal act of “using nanotechnology to burn all GPUs” is not something you could put as the official goal on your company website. If the public seriously thought that a group of people pursued this goal and had any chance of even coming close to achieving it, they would strongly oppose such a plan. In order to even see why it might be a justified action to take, one needs to understand (and accept) many highly non-intuitive assumptions about intelligence explosions, orthogonality, etc.
More generally, I think many possible pivotal acts will be adversarial to some degree, since they are literally about stopping people from doing or getting something they want (building an AGI, reaping the economic benefits of using an AGI, etc.). There might be strategies for such an act which are inside the Overton window (creating a superhuman propaganda-bot that convinces everyone to stop), but any strategy involving anything resembling force (like burning the GPUs) will run counter to established laws and social norms.
So I can absolutely imagine that someone has an idea about a pivotal act which, if posted publicly, could be used in a PR campaign by opponents of AI alignment (“look what crazy and unethical ideas these people are discussing in their forums”). That’s why I was asking what the best forms of discourse could be that avoid this danger.