Opportunities that I’m pretty sure are good moves for Anthropic generally:
Open an office literally in Washington, DC, that does the same work that any other Anthropic office does (i.e., NOT purely focused on policy/lobbying, though I’m sure you’d have some folks there who do that). If you think you’re plausibly going to need to convince policymakers on critical safety issues, having a nonzero number of staff who are definitively not lobbyists, but who are on the drinking-buddy or climbing-gym-buddy roster that gets called with “My boss needs an opinion on this bill amendment by tomorrow, what do you think?”, is much more important than your org currently seems to think!
Expand on recent efforts to put more employees (and external research collaborators) in front of cameras as the “face” of that research—you folks frankly tend to talk in ways that are compatible with national security policymakers’ vibes (e.g., Evan and @Zac Hatfield-Dodds both have a flavor of the playful gallows humor that pervades that world). I know I’m a broken record on this, but I do think it would help.
Do more to show how the RSP affects Anthropic’s daily work (unlike many on this forum, I currently believe that they are actually Trying to Use The Policy and had many line edits as a result of wrestling with v1.0's minor infelicities). I understand that it is very hard to explain specific scenarios of how it has impacted day-to-day work without leaking sensitive IP or pointing people toward potentially dangerous things. Nonetheless, I think Anthropic needs to try harder here. It’s like trying to understand DoD if it only ever talked about the “warfighter” in the most abstract terms and never let journalists embed with a patrol on the street in Kabul or Baghdad.
Invest more in DC policymaker education outside of the natsec/defense worlds you’re engaging already—I can’t emphasize enough how many folks across DC still think that AI is a scam, a fad, or just “trying to destroy art.” At the same time, people really have trouble believing that an AI could be “as creative as” a human—the sort of Star Trek-ish “Kirk can always outsmart the machine” mindset pervades pretty widely. You want to incept policymaking elites more broadly so that they are ready as this scales up.
Opportunities that I feel less certain about, but in the spirit of brainstorming:
Develop more proactive, outward-facing detection capabilities to see whether there are bad AI models out there. I don’t mean red-teaming others’ models, or evals, or that sort of thing. I mean: think about how you would detect that Anthropic had bad models out there (misaligned, or aligned but being used for very impactful bad things) if you were at an intelligence agency without official access to Anthropic’s models, and then deploy those capabilities against Anthropic and the world broadly.[1] You might argue that this is sort of an inverted version of @Buck’s control agenda—instead of trying to make it difficult for a model to escape, think about what facts about the world are likely to be true if a model has escaped, and then go looking for those.
If it’s not already happening, have Dario and other senior Anthropic leaders meet with folks who have had to balance counterintelligence paranoia with operational excellence (e.g., leaders of intelligence agencies, for whom the standard advice to a successor is, “before you go home every day, ask ‘where’s the spy[2]?’”), so that they develop a mindset for how to scale up their paranoia over time as needed.
Something something use cases—use-case-based restrictions are popular in some policy spheres. Some sort of research demonstrating that a model designed for, and safe for, use case X can easily be turned into a misaligned tool for use case Y under a plausible usage scenario might be useful?
Reminder/disclosure: I work in AI policy, so there are worlds where some of these ideas help my self-interest and others where they harm it. I’m not going to try to do the math on which are which under all sorts of complicated double-bankshot scenarios, though.
[1] To the extent consistent with law, obviously. Don’t commit crimes.
[2] That is, the spy that’s paid for by another country and spying on you. Not your own spies.