I wrote this post imagining “strategy-stealing assumption” as something you would assume for the purpose of an argument; for example, I might want to justify an AI alignment scheme by arguing “Under a strategy-stealing assumption, this AI would result in an OK outcome.” The post was motivated by trying to write up another argument where I wanted to use this assumption, spending a bit of time trying to think through what the assumption was, and deciding it was likely to be of independent interest. (Although that other argument hasn’t yet appeared in print.)
I’d be happy to have a better name for the research goal of making it so that this kind of assumption is true. I agree this isn’t great. (And then I would probably be able to use that name in the description of this assumption as well.)
I wrote this post imagining “strategy-stealing assumption” as something you would assume for the purpose of an argument; for example, I might want to justify an AI alignment scheme by arguing “Under a strategy-stealing assumption, this AI would result in an OK outcome.”
When you say “strategy-stealing assumption” in this sentence, do you mean the relatively narrow assumption that you gave in this post, specifically about “flexible influence”:
This argument rests on what I’ll call the strategy-stealing assumption: for any strategy an unaligned AI could use to influence the long-run future, there is an analogous strategy that a similarly-sized group of humans can use in order to capture a similar amount of flexible influence over the future.
or a stronger assumption that additionally holds that the universe and our values are such that capturing “a similar amount of flexible influence over the future” would lead to an OK outcome? I’m guessing the latter? I feel like people (including me sometimes, and you in this instance) are equivocating back and forth between these two meanings when using “strategy-stealing assumption”. Maybe we should have two different terms for these two concepts too?