I think that “doomers” were far too pessimistic about governance before ChatGPT (in ways that I and others predicted beforehand, e.g. in discussions with Ben and Eliezer). I think they should update harder from this mistake than they’re currently doing (e.g. updating that they’re too biased towards inside-view models and/or fast takeoff and/or high P(doom)).
I think it remains to be seen what the right level of pessimism was. It still seems pretty likely that we’ll see not just useless, but actively catastrophically counter-productive interventions from governments in the next handful of years.
But you’re absolutely right that I was generally pessimistic about policy interventions from 2018ish through to 2021 or so.
My main objection was that I wasn’t aware of any policies that seemed likely to help, and I was unenthusiastic that EAs seemed optimistic about getting into positions of power without being clear with themselves that they didn’t yet have policy ideas to implement.
I felt better about people going into policy to the extent that they had clarity with themselves: “I don’t know what to recommend if I have power. I’m trying to execute one part of a two-part plan that involves getting power and then using it to advocate for x-risk-mitigating policies. I’m intentionally punting that question to my future self / hoping that other EAs thinking full time about this come up with good ideas.” I think I still basically stand by this take. [1]
My main update is that the basic premise of this post turned out to be false. There were developments that were more alarming than “business as usual” to a good number of people, and that really changed the landscape.
One procedural update that I’ve made from that and similar mistakes is: “I shouldn’t put as much trust in Eliezer’s rhetoric about how the world works when it isn’t backed up by clearly articulated models. I should treat those ideas as plausible hypotheses, and be much more attentive to evidence that I can see directly.”
Also, I think that this is one instance of the general EA failure mode of pursuing a plan which entails accruing more resources for EA (community building to bring in more people, marketing to bring in more money, politics to acquire power), without a clear personal inside view of what to do with those resources, effectively putting a ton of trust in the EA network to reach correct conclusions about which things help.
There are a bunch of people trusting the EA machine to 1) aim for good things and 2) have good epistemics. They trust it so much that they’ll campaign for a guy running for political office without knowing much about him, except that he’s an EA. Or they route their plan for positive impact on the world through positively impacting EA itself (“I want to do mental health coaching for EAs”, “I want to build tools for EAs”, or “I’m going to do ops for this AI org that 80k recommended”, despite not knowing much about what it does).
This is pretty scary, because it seems like some of those people were not worthy of trust (SBF, in particular, won a huge amount of veneration).
And even in the case of people who are, I believe, earnest geniuses, it is still pretty dangerous to mostly be deferring to them. Paul put a good deal of thought into the impacts of developing RLHF, and he thinks the overall impacts are positive. But the fact that Paul is smart and good does not make it a foregone conclusion that his work is good on net. That’s a really hard question to answer, about which I think most people should be pretty circumspect.
It seems to me that there is an army of earnest young people who want to do the most good that they can. They’ve been told (and believe) that AI risk is the most important problem, but it’s a confusing problem that depends on technical expertise, famously fraught forecasting of not-yet-existent technologies, and a bunch of weird philosophy. The vast majority of those young people don’t know how to make progress on the core problems of AI risk directly, or even necessarily how to identify which work is making progress. But they still want to help, so they commit themselves to e.g. community building, getting more people to join, with everyone taking social cues about which object-level things are good to do from the few people who seem to have personal traction on the problem.
This seems concerning to me. In this kind of structure, a bunch of smart young people are building a pile of resources to be controlled mostly by deference to a status hierarchy, where you figure out which thinkers are cool by picking up on the social cues of who is regarded as cool, rather than evaluating their work for yourself. It’s not so much that I expect it to be co-opted; I just don’t expect that overall agglomerated machine to be particularly steered towards the good, whatever values it professes.
It doesn’t have a structure that binds it particularly tightly to what’s true. Better than most non-profit communities, worse than many for-profit companies, probably.
It seems more concerning to the extent that many of the object-level actions to which EAs are funneling resources are not just useless, but actively bad. Being smart enough, as a community, to identify the most important problem in the world, but not smart enough to know how to systematically have a positive impact on it, turns out to be pretty dangerous.
E.g., the core impacts of people trying to affect x-risk so far include:
- (Maybe? Partially?) causing Deepmind to exist
- (Definitely) causing OpenAI to exist
- (Definitely) causing Anthropic to exist
- Inventing RLHF and accelerating the development of RLHF’d language models
It’s pretty unclear to me what the sign of these interventions is. They seem bad on the face of it, but as I’ve watched things develop I’m not so sure. It depends on pretty complicated questions about second- and third-order effects, and counterfactuals.
But it seems bad to have an army of earnest young people who, in the name of their do-gooding ideology, shovel resources at the decentralized machine doing these maybe good maybe bad activities, because they’re picking up on social cues of who to defer to and what those people think! That doesn’t seem very high EV for the world!
(To be clear, I was one of the army of earnest young people. I spent a number of years helping recruit for a secret research program—I didn’t even have the most basic information, much less the expertise to assess if it was any good—because I was taking my cues from Anna, who was taking her cues from Eliezer.
I did that out of a combination of 1) having read Eliezer’s philosophy, and having enough philosophical grounding to be really impressed by it, and 2) being ready and willing to buy into a heroic narrative to save the world, which these people were (earnestly) offering me.)
And, procedurally, all this is made somewhat more perverse by the fact that this community, this movement, was branded as the “carefully think through our do-gooding” movement. We raised the flag of “let’s do careful research and cost-benefit analysis to guide our charity”, but over time this collapsed into a deferral network, with ideas about what’s good to do driven mostly by the status hierarchy. Cruel irony.