I would be happy to defend roughly the position above (I don’t agree with all of it, but I do endorse something like: “the strategy of trying to play the inside game at labs was really bad, failed in predictable ways, and has deeply eroded trust in community leadership due to the adversarial dynamics inherent in such a strategy, and many of the people involved should be let go”).
I do think most people who disagree with me here are under substantial confidentiality obligations and de facto non-disparagement obligations (for example, really not wanting to imply anything bad about Anthropic, or wanting to maintain a cultivated image for policy purposes), which will make it hard to find a good public debate partner, but it isn’t impossible.
I largely disagree (even now I think having tried to play the inside game at labs looks pretty good, although I have sometimes disagreed with particular decisions in that direction because of opportunity costs). I’d be happy to debate if you’d find it productive (although I’m not sure whether I’m disagreeable enough to be a good choice).
For me, the key question in situations where a leader’s decision has had really bad consequences is: “How did they engage with criticism and opposing views?”
If they did well on this front, then I don’t think it’s at all mandatory to push for leadership changes (though certainly, the worse someone’s track record gets, the more that speaks against them).
By contrast, if leaders tried to make the opposition look stupid or if they otherwise used their influence to dampen the reach of opposing views, then being wrong later is unacceptable.
Basically, I want to allow for a situation where someone was like, “this is a tough call and I can see reasons why others wouldn’t agree with me, but I think we should do this,” and then ends up being wrong, but I don’t want to allow situations where someone is wrong after having expressed something more like, “listen to me, I know better than you, go away.”
In the first situation, it might still be warranted to push for leadership changes (esp. if there’s actually a better alternative), but I don’t see it as mandatory.
The author of the original shortform says we need to hold leaders accountable for bad decisions because otherwise the incentives are wrong. I agree, but I think it’s too crude to tie incentives to whether a decision looks right or wrong in hindsight. We can do better and evaluate how someone went about making the decision and how they handled opposing views. (Basically, if opposing views weren’t loud enough that you’d have had to actively squash them through illegitimate use of your influence, then the mistake isn’t just yours as the leader; the situation also wasn’t sufficiently obvious to the people around you.) I expect that everyone in a leadership position who has strong opinions and is ambitious and agenty is going to make some costly mistakes. The incentives shouldn’t be such that leaders shy away from consequential interventions.
If the strategy failed in predictable ways, shouldn’t we expect to find “pre-registered” predictions that it would fail?
I have indeed been publicly advocating against the inside game strategy at labs for many years (going all the way back to 2018), predicting it would fail due to incentive issues and have large negative externalities due to conflicts of interest. I could dig up my comments, but I am confident that almost anyone I’ve interfaced with at the labs, or talked to about any adjacent leadership topic, would be happy to confirm this.
Are you just referring to the profit incentive conflicting with the need for safety, or something else?
I’m struggling to see how we get aligned AI without “inside game at labs” in some way, shape, or form.
My sense is that evaporative cooling is the biggest thing which went wrong at OpenAI. So I feel OK about e.g. Anthropic if it’s not showing signs of evaporative cooling.