Various comments:
I wouldn’t call this “AI lab watch.” “Lab” has the connotation that these are small projects instead of multibillion-dollar corporate behemoths.
“deployment” initially sounds like “are they using output filters which harm UX in deployment”, but instead this seems to be penalizing organizations if they open source. This seems odd since open sourcing is not clearly bad right now. The description also makes claims like “Meta release all of their weights”—they don’t release many image/video models because of deepfakes, so they are doing some cost-benefit analysis. Zuck: “So we want to see what other people are observing, what we’re observing, what we can mitigate, and then we’ll make our assessment on whether we can make it open source.” If this is mainly a penalty against open sourcing, the label should be clearer.
“Commit to do pre-deployment risk assessment” They’ve all committed to this in the WH voluntary commitments and I think the labs are doing things on this front.
“Do risk assessment” These companies have signed on to WH voluntary commitments so are all checking for these things, and the EO says to check for these hazards too. This is why it’s surprising to see Microsoft have 1% given that they’re all checking for these hazards.
Looking at the scoring criteria, this seems highly fixated on rogue AIs, though I understand I’m saying that to the original forum for these concerns. Risk assessment’s scoring doesn’t really seem to prioritize bio x-risk as much as scheming AIs. This is strange because, if we’re focused on rogue AIs, I’d put about half the priority on risk mitigation while the model is training. Many rogue AI people may think that half of the scenarios where the AI kills everyone happen while the model is “training” (because it will escape during that time).
The first sentence of this site says the focus is on “extreme risks,” but it seems the focus is mainly on rogue AIs. It should be upfront that it is written from the perspective that loss of control is the main extreme risk, rather than positioning itself as a comprehensive safety tracker. If I were tracking rogue AI risks, I’d probably drill down into what they plan to do with automated AI R&D/intelligence explosions.
“Training” This seems to give way more weight to rogue AI stuff. Red teaming is actually assessable, but instead you’re giving twice the points for whether they have someone “work on scalable oversight.” This seems like an EA vibes check rather than actually measuring something. This also seems like triple counting, since it’s highly associated with the “scalable alignment” section and the “alignment program” section. It doesn’t even require that they use the technique for the big models they train and deploy. Independently, capabilities work related to building superintelligences can easily be framed as scalable oversight, so this doesn’t set good incentives. Separately, at the end this also gives lots of points for voluntary (read: easily breakable) commitments. These should not be trusted, and I think the number of points awarded for lip service is odd.
“Security” As I said on EAF, the security scores are suspicious to me and even look backward. The major tech companies have much more experience protecting assets (e.g., clouds need to be highly secure) than startups like Anthropic and OpenAI. It takes years to build up robust information security, and the older companies have a sizable advantage.
“internal governance” scores seem odd. Older, larger institutions such as Microsoft and Google have many constraints and processes and don’t have leaders who can unilaterally make decisions as easily, compared to startups. Their CEOs are also more fireable (OpenAI), and their board members aren’t all selected by the founder (Anthropic). This seems highly keyed to whether they are a PBC or non-profit. In practice PBC status just makes it harder to sue, but Zuck has such control of his company that getting successfully sued for not upholding his fiduciary duty to shareholders seems unlikely anyway. It seems 20% of the points is for not using non-disparagement agreements?? 30% is for whistleblower policies; CA has many whistleblower protections if I recall correctly. No points for a chief risk officer or internal audit committee?
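For illustration, here is roughly how a weighted rubric like this aggregates. This is a hypothetical sketch: the criterion names, weights, and example scores loosely echo the percentages mentioned above and are not the site’s actual rubric or code.

```python
# Hypothetical sketch of a weighted "internal governance" rubric.
# Criterion names and weights are illustrative, loosely echoing the
# percentages discussed above; they are not the site's actual rubric.

GOVERNANCE_WEIGHTS = {
    "no_non_disparagement_agreements": 0.20,
    "whistleblower_policy": 0.30,
    "pbc_or_nonprofit_structure": 0.50,
    # Nothing here rewards a chief risk officer or an internal audit
    # committee, so those contribute zero to the total by construction.
}


def governance_score(criterion_scores: dict[str, float]) -> float:
    """Weighted sum of per-criterion scores, each in [0, 1]."""
    return sum(
        weight * criterion_scores.get(name, 0.0)
        for name, weight in GOVERNANCE_WEIGHTS.items()
    )


# Example: strong whistleblower policy, but a conventional corporate
# structure and standard non-disparagement agreements.
print(governance_score({"whistleblower_policy": 1.0}))  # 0.3
```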
“Alignment program” “Other labs near the frontier publish basically no alignment research” Meta publishes dozens of papers they call “alignment”; these actually don’t feel that dissimilar to Constitutional AI-style papers (https://twitter.com/jaseweston/status/1748158323369611577 https://twitter.com/jaseweston/status/1770626660338913666 https://arxiv.org/pdf/2305.11206). These papers aren’t posted to LW but they definitely exist. To be clear, I think this is general capabilities work, but this community seems to think differently. Alignment cannot be “did it come from EA authors,” and it probably should not be “does it use alignment in its title.” You’ll need to be clear about how this distinction is drawn.
Meta has people working on safety and CBRN + cyber + adversarial robustness, etc. I think they’re doing a good job (here are two papers from the last month: https://arxiv.org/pdf/2404.13161v1 https://arxiv.org/pdf/2404.16873).
As is, I think this is a little too quirky and not ecumenical enough for it to generate social pressure.
There should be points for how the organizations act with respect to legislation. With the SB 1047 bill that CAIS co-sponsored, we’ve noticed that some AI companies are much more antagonistic than others. I think this is probably a larger differentiator for an organization’s goodness or badness.
(Won’t read replies since I have a lot to do today.)
This kind of feedback is very helpful to me; thank you! Strong-upvoted and weak-agreevoted.
(I have some factual disagreements. I may edit them into this comment later.)
(If Dan’s comment makes you suspect this project is full of issues/mistakes, react 💬 and I’ll consider writing a detailed soldier-ish reply.)
instead this seems to be penalizing organizations if they open source
I initially thought this was wrong, but on further inspection, I agree and this seems to be a bug.
The deployment criteria start with:
the lab should deploy its most powerful models privately or release via API or similar, or at least have some specific risk-assessment-result that would make it stop releasing model weights
This criterion seems to allow a lab to meet it by having a good risk-assessment policy, but the rest of the criteria contain specific countermeasures that:
Are impossible to consistently impose if you make weights open (e.g. Enforcement and KYC).
Don’t pass cost-benefit for current models, which pose low risk. (And it seems the criterion is “do you have them implemented right now?”)
If the lab had an excellent risk-assessment policy and released weights when the cost/benefit seemed good, that should be fine according to the “deployment” criteria IMO.
Generally, the deployment criteria should be gated behind “has a plan to do this when models are actually powerful and their implementation of the plan is credible”.
I get the sense that these criteria don’t quite handle the edge cases necessary to accommodate reasonable choices orgs might make.
(This is partially my fault as I didn’t notice this when providing feedback on this project.)
(IMO making weights accessible is probably good on current margins, e.g. llama-3-70b would be good to release so long as it is part of an overall good policy, is not setting a bad precedent, and doesn’t leak architecture secrets.)
(A general problem with this project is somewhat arbitrarily requiring specific countermeasures. I think this is probably intrinsic to the approach, I’m afraid.)
Related: maybe a lab should get full points for a risky release if the lab says it’s releasing because the benefits of [informing / scaring / waking-up] people outweigh the direct risk of existential catastrophe and other downsides. It’s conceivable that a perfectly responsible lab would do such a thing.
Capturing all nuances can trade off against simplicity and legibility. (But my criteria are not yet on the efficient frontier or whatever.)
Thanks. I agree you’re pointing at something flawed in the current version and generally thorny. Strong-upvoted and strong-agreevoted.
Generally, the deployment criteria should be gated behind “has a plan to do this when models are actually powerful and their implementation of the plan is credible”.
I didn’t put much effort into clarifying this kind of thing because it’s currently moot—I don’t think it would change any lab’s score—but I agree.[1] I think e.g. a criterion “use KYC” should technically be replaced with “use KYC OR say/demonstrate that you’re prepared to implement KYC and have some capability/risk threshold to implement it and [that threshold isn’t too high].”
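To make that OR-structure concrete, here is a minimal sketch, assuming a made-up representation of a lab’s KYC posture; the field names and the example threshold are hypothetical, not the site’s actual scoring logic.

```python
from dataclasses import dataclass


@dataclass
class KYCStatus:
    """Hypothetical summary of a lab's KYC posture (illustrative fields only)."""
    implemented: bool         # KYC is in use today
    committed: bool           # lab says/demonstrates it is prepared to implement KYC
    trigger_threshold: float  # capability/risk level at which the commitment kicks in


def kyc_criterion_met(status: KYCStatus, max_acceptable_trigger: float = 0.5) -> bool:
    """Met if KYC is implemented now, OR the lab credibly commits to implement it
    at a capability/risk threshold that isn't too high."""
    if status.implemented:
        return True
    return status.committed and status.trigger_threshold <= max_acceptable_trigger


# Example: not implemented yet, but committed with a reasonably low trigger.
print(kyc_criterion_met(KYCStatus(implemented=False, committed=True, trigger_threshold=0.4)))  # True
```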
Yeah. The criteria can be like “implement them or demonstrate that you could implement them and have a good plan to do so,” but it would sometimes be reasonable for the lab to not have done this yet. (Especially for non-frontier labs; the deployment criteria mostly don’t work well for evaluating non-frontier labs. It would also be reasonable if demonstrating that you could implement something is difficult, even though you could in fact implement it.)
I’m interested in suggestions :shrug:
And I think my site says some things that contradict this principle, like ‘these criteria require keeping weights private.’ Oops.
Hmm, yeah it does seem thorny if you can get the points by just saying you’ll do something.
Like I absolutely think this shouldn’t count for security. I think you should have to demonstrate actual security of model weights and I can’t think of any demonstration of “we have the capacity to do security” which I would find fully convincing. (Though setting up some inference server at some point which is secure to highly resourced pen testers would be reasonably compelling for demonstrating part of the security portfolio.)
There should be points for how the organizations act with respect to legislation. With the SB 1047 bill that CAIS co-sponsored, we’ve noticed that some AI companies are much more antagonistic than others. I think this is probably a larger differentiator for an organization’s goodness or badness.
@Dan H are you able to say more about which companies were most/least antagonistic?
Disagree on “lab”. I think it’s the standard and most natural term now. As evidence, see your own usage a few sentences later:
If there’s a good writeup on labs’ policy advocacy I’ll link to and maybe defer to it.
I wouldn’t call this “AI lab watch.” “Lab” has the connotation that these are small projects instead of multibillion-dollar corporate behemoths.
This seems like a good point. Here’s a quick babble of alts (folks could react with a thumbs-up on ones that they think are good).
AI Corporation Watch | AI Mega-Corp Watch | AI Company Watch | AI Industry Watch | AI Firm Watch | AI Behemoth Watch | AI Colossus Watch | AI Juggernaut Watch | AI Future Watch
I currently think “AI Corporation Watch” is more accurate. “Labs” feels like a research team, but I think these orgs are far far far more influenced by market forces than is suggested by “lab”, and “corporation” communicates that. I also think the goal here is not to point to all companies that do anything with AI (e.g. midjourney) but to focus on the few massive orgs that are having the most influence on the path and standards of the industry, and to my eye “corporation” has that association more than “company”. Definitely not sure though.
These are either tendentious (“Juggernaut”) or unnecessarily specific to the present moment (“Mega-Corp”).
How about simply “AI Watch”?
Yep, lots of people independently complain about “lab.” Some of those people want me to use scary words in other places too, like replacing “diffusion” with “proliferation.” I wouldn’t do that, and don’t replace “lab” with “mega-corp” or “juggernaut,” because it seems [incorrect / misleading / low-integrity].
I’m sympathetic to the complaint that “lab” is misleading. (And I do use “company” rather than “lab” occasionally, e.g. in the header.) My friends usually talk about “the labs,” not “the companies,” but to most audiences “company” is more accurate.
I currently think “company” is about as good as “lab.” I may change the term throughout the site at some point.
I do think being one syllable is pretty valuable, although “AI Org Watch” might be fine (it kinda rolls off the tongue worse).
“AI Watch.”
Could consider “frontier AI watch”, “frontier AI company watch”, or “AGI watch.”
Most people in the world (including policymakers) have a much broader conception of AI. AI means machine learning; AI is the thing that thousands of companies are using and thousands of academics are developing, etc.