It worries me that many of the most promising theories of impact for alignment end up with the structure “acquire power, then use it for good”.
This seems to be a result of the counterfactual impact framing and a bias towards simple plans. You are a tiny agent in an unfathomably large world, trying to intervene on what may be the biggest event in human history. If you try to generate stories where you have a clear, simple counterfactual impact, most of them will involve power-seeking for the usual instrumental convergence reasons. Power-seeking might be necessary sometimes, but it seems extremely dangerous as a general attitude; ironically human power-seeking is one of the key drivers of AI x-risk to begin with. Benjamin Ross Hoffman writes beautifully about this problem in Against responsibility.
I don’t have any good solutions, other than a general bias away from power-seeking strategies and towards strategies involving cooperation, dealism, and reducing transaction costs. I think the pivotal act framing is particularly dangerous, and that aiming to delay existential catastrophe rather than prevent it completely is a better policy for most actors.
This is why AI risk is so high, in a nutshell.
Yet unlike this post (or Benjamin Ross Hoffman’s post), I think this was a sad but crucially necessary decision. I think the option you propose is at least partially a fabricated option. I think a lot of the reason is that people dearly want there to be a better option, even if it’s not there.
Link to fabricated options:
https://www.lesswrong.com/posts/gNodQGNoPDjztasbh/lies-damn-lies-and-fabricated-options
Fabricated options are products of incoherent thinking; what is the incoherence you’re pointing out with policies that aim to delay existential catastrophe or reduce transaction costs between existing power centers?
I think the fabricated option here is just supporting the companies making AI. My view is that by default, capitalist incentives kill us all by boosting AI capabilities while doing approximately zero AI safety; in particular, deceptive alignment would not be invested in, despite it being the majority of the risk.
One of the most important points for AGI safety is that the leader in AGI needs a lot of breathing space and a clear lead over its competitors, and I think this needs to be done semi-unilaterally by an organization without capitalist incentives, because all the incentives point towards ever-faster AGI capabilities, not towards slowing down. That’s why I think your options are fabricated: they assume unrealistically good incentives to do what you want.
I don’t mean to suggest “just supporting the companies” is a good strategy, but there are promising non-power-seeking strategies like “improve collaboration between the leading AI labs” that I think are worth biasing towards.
Maybe the crux is how strongly capitalist incentives bind AI lab behavior. I think none of the currently leading AI labs (OpenAI, DeepMind, Google Brain) are actually so tightly bound by capitalist incentives that their leaders couldn’t delay AI system deployment by at least a few months, and probably more like several years, before capitalist incentives, in the form of shareholder lawsuits or new entrants poaching their key technical staff, have a chance to materialize.
This is the crux, thank you for identifying it.
Yeah, I’m fairly pessimistic about a delay of several years, since I don’t think these companies are that special in their ability to resist capitalist nudges and incentives.
And yeah, I’m laughing, because unless the alignment/safety teams control what capabilities are added, I do not expect the capabilities teams to stop, because they won’t get paid for that.