> I know this is a nitpick, but I don’t understand how “to reduce” is a layer of indirection. Are you saying it’s because it could let you weasel out by doing (for example) a 0.1% reduction? I weakly agree if so, but I still want to actually understand what you meant.
>
> To me, the phrase as-is reads as more of a pedantic thing, like how I can’t “eliminate” risk, only “reduce” it.
>
> I wholly agree that “avert”, by sidestepping the framing of total probability of risk and instead framing things as an effort against one specific thing, manages to fix the whole problem.
On why “reduce” seems like another layer of indirection to me:
You could engage in motivated reasoning and convince yourself that some plan you came up with in five minutes is good enough for working on the problem, since it “could reduce x-risk”. If you don’t keep the actual hard problem in mind, it’s hard to optimize for this nebulous goal and still achieve a result that’s adequate-as-judged-by-the-universe.
As suggested in the Trying to Try post, it’s very easy to accidentally slip an extra layer of indirection into your plans. And once you do so, “This plan has a chance to reduce x-risk”, for example, sounds to me more problematic than “This plan has a chance to avert extinction” or “This plan has a chance to save the world”.
People often fall short of their goals. If you fall short of an ambitious goal like “avert extinction”, the result might still be valuable; whereas if you fall short of a nebulous goal like “reduce x-risk”, it’s unclear whether the result would still be valuable. This is what I meant above by: “Someone with ‘intent to avert extinction’ might actually manage to reduce x-risk; whereas someone with ‘intent to reduce x-risk’ could not.”
As you said, there’s some very very low percentage point reduction in x-risk that would not feel worthwhile to achieve, so this goal to “reduce x-risk” is at the very least underspecified, relative to what you actually care about.
There’s also the problem of how you’d aggregate plans to “reduce x-risk”, versus plans to “avert extinction” or something. I’m reminded of the problem of determining counterfactual impact in donation matching: if I donate $100 on the condition that you donate $100, and this causes you to counterfactually donate that $100, we might then both be independently tempted to count ourselves as having caused $200 of donations, for a total of 2 × $200 = $400! Similarly, how do you numerically aggregate multiple researchers’ plans to reduce x-risk, in a way where the result still makes sense at the end? If one researcher could actually achieve a 0.1% x-risk reduction, could 1000 researchers together straightforwardly eliminate that x-risk? That would be a tremendous bargain, but I would not expect reality to work like that.
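To make the aggregation worry concrete, here’s a minimal toy calculation (my own illustration, not anything from the original discussion). It assumes, purely for the sake of argument, that each researcher’s 0.1% reduction applies independently to whatever risk remains, i.e. the reductions compound multiplicatively; even under that generous assumption, 1000 such efforts leave roughly a third of the risk, rather than eliminating it the way naive addition would suggest.

```python
# Toy model (an assumption for illustration, not a claim about how x-risk actually composes):
# each of 1000 researchers independently removes 0.1% of the *remaining* risk,
# so the reductions compound multiplicatively instead of adding up.
baseline_risk = 1.0            # normalize the initial risk to 100% for simplicity
per_effort_factor = 1 - 0.001  # each effort leaves 99.9% of whatever risk remains

remaining = baseline_risk * per_effort_factor ** 1000
print(f"risk remaining after 1000 such efforts: {remaining:.1%}")    # ~36.8%
print(f"naive additive estimate: {max(0.0, 1 - 1000 * 0.001):.1%}")  # 0.0%, i.e. 'eliminated'
```

And in reality the researchers’ plans presumably overlap, so even this multiplicative picture is probably too optimistic; the point is just that the numbers don’t aggregate the way the naive framing tempts you to assume.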
Plus whatever intuition I got from the Trying to Try essay.
… I may have just argued the same claim in a bunch of ways, but anyway, that’s why “intent to reduce x-risk” sounds problematically indirect to me.
Finally, the reason I singled out this specific phrase in the essay is that I think it distorts the essay’s meaning. “Intent to kill” is supposed to be an obviously awe-inspiring notion in stories, in a way that e.g. “intent to defeat your opponent” is obviously not. And I think a large part of the difference between these two notions is their levels of (in)directness.
E.g. here’s a Miyamoto Musashi quote to end on:

> The primary thing when you take a sword in your hands is your intention to cut the enemy, whatever the means. Whenever you parry, hit, spring, strike or touch the enemy’s cutting sword, you must cut the enemy in the same movement. It is essential to attain this. If you think only of hitting, springing, striking or touching the enemy, you will not be able actually to cut him.