I mean, the literal best way to incentivize @Ricki Heicklen and me to do this again for LessOnline and Manifest 2025 is to create a prediction market on it, so I encourage you to do that
[Cross-post] Every Bay Area “Walled Compound”
One point that maybe someone’s made, but I haven’t run across recently: if you want to turn AI development into a Manhattan Project, you will by default face some real delays from the reorganization of private efforts into one big national effort. In a close race, you might actually see pressures not to do so, because you don’t want to give up 6 months to a year on reorg drama—so in some possible worlds, the Project is actually a deceleration move in the short term, even if it accelerates things in the long term!
Ooh, interesting, thank you!
[Cross-post] Welcome to the Essay Meta
Incidentally, spurred by @Mo Putera’s posting of Vernor Vinge’s A Fire Upon The Deep annotations, I want to remind folks that Vinge’s Rainbows End is very good and doesn’t get enough attention, and will give you a less-incorrect understanding of how national security people think.
Oh, fair enough then, I trust your visibility into this. Nonetheless, one Should (and Can) Just Report Bugs
Note for posterity that there has been at least $15K of donations since this got turned back on—You Can Just Report Bugs
Ok, but you should leave the donation box up—the link now seems not to work? I bet there would be at least several $K USD of donations from folks who didn’t remember to donate in time.
I think you’re missing at least one strategy here. If we can get folks to agree that different societies can choose different combos, so long as they don’t infringe on some subset of rights to protect other societies, then you could have different societies expand out into various pieces of the future in different ways. (Yes, I understand that’s a big if, but it reduces the urgency/crux nature of value agreement).
Note that the production function of the 10x really matters. If it’s “yeah, we get to net-10x if we have all our staff working alongside it,” it’s much more detectable than, “well, if we only let like 5 carefully-vetted staff in a SCIF know about it, we only get to 8.5x speedup”.
(It’s hard to prove that the results are from the speedup instead of just, like, “One day, Dario woke up from a dream with The Next Architecture in his head”)
Basic clarifying question: does this imply, under the hood, some sort of diminishing-returns curve, such that the lab pays for that labor until it reaches a net 10x speedup but can’t squeeze out much more?
And do you expect that’s a roughly consistent multiplicative factor, independent of lab size? (I mean, I’m not sure lab size actually matters that much, to be fair; it seems that Anthropic keeps pace with OpenAI despite being smaller-ish.)
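To make the diminishing-returns shape I’m asking about concrete, here is a minimal toy sketch. Everything in it (the functional form, the parameter names, the numbers) is my own assumption for illustration, not anything any lab has published.

```python
import math

def effective_speedup(staff_in_the_loop: int, max_speedup: float = 10.0, k: float = 20.0) -> float:
    """Toy saturating curve: R&D speedup rises with the number of staff working
    alongside the system, but flattens out as it approaches max_speedup.
    All parameters are invented for illustration."""
    return 1.0 + (max_speedup - 1.0) * (1.0 - math.exp(-staff_in_the_loop / k))

for n in (5, 20, 100, 500):
    print(f"{n} staff -> ~{effective_speedup(n):.1f}x")
# Prints roughly 3.0x, 6.7x, 9.9x, 10.0x: most of the gain arrives early, and
# past a point extra labor can't squeeze out much more, which is the
# diminishing-returns picture the question gestures at.
```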
For the record: signed up for a monthly donation starting in Jan 2025. It’s smaller than I’d like, given some financial conservatism until I fill out my taxes; I may revisit it later.
Everyone who’s telling you there aren’t spoilers in here is well-meaning, but wrong. But justifying why I’m saying that is also spoilery, so to some degree you have to take this on faith.
(Rot13’d for those curious about my justification: Bar bs gur znwbe cbvagf bs gur jubyr svp vf gung crbcyr pna, vs fhssvpvragyl zbgvingrq, vasre sne zber sebz n srj vfbyngrq ovgf bs vasbezngvba guna lbh jbhyq anviryl cerqvpg. Vs lbh ner gryyvat Ryv gung gurfr ner abg fcbvyref V cbyvgryl fhttrfg gung V cerqvpg Nfzbqvn naq Xbein naq Pnevffn jbhyq fnl lbh ner jebat.)
Opportunities that I’m pretty sure are good moves for Anthropic generally:
Open an office literally in Washington, DC, that does the same work as any other Anthropic office (i.e., NOT purely focused on policy/lobbying, though I’m sure you’d have some folks there who do that). If you think you’re plausibly going to need to convince policymakers on critical safety issues, having a nonzero number of your staff who are definitively not lobbyists be the drinking or climbing-gym buddies who get called on the “My boss needs an opinion on this bill amendment by tomorrow, what do you think” roster is much more important than your org currently seems to think!
Expand on recent efforts to put more employees (and external research collaborators) in front of cameras as the “face” of that research—you folks frankly talk in ways that tend to be compatible with national security policymakers’ vibes. (E.g., Evan and @Zac Hatfield-Dodds both have a flavor of the playful gallows humor that pervades that world.) I know I’m a broken record on this, but I do think it would help.
Do more to show how the RSP affects your daily work (unlike many on this forum, I currently believe that Anthropic is actually Trying to Use The Policy and made many line edits as a result of wrestling with v1.0’s minor infelicities). I understand that it is very hard to explain specific scenarios of how it’s impacted day-to-day work without leaking sensitive IP or pointing people in the direction of potentially-dangerous things. Nonetheless, I think Anthropic needs to try harder here. It’s, like... it’s like trying to understand DoD if they only ever talked about the “warfighter” in the most abstract terms and never, like, let journalists embed with a patrol on the street in Kabul or Baghdad.
Invest more in DC policymaker education outside of the natsec/defense worlds you’re engaging already—I can’t emphasize enough how many folks in broader DC think that AI is still just a scam, or a fad, or just “trying to destroy art”. On the other hand, people really have trouble believing that an AI could be “as creative as” a human—the sort of Star Trek-ish “Kirk can always outsmart the machine” mindset pervades pretty broadly. You want to incept policymaking elites more broadly so that they are ready as this scales up.
Opportunities that I feel less certain about, but in the spirit of brainstorming:
Develop more proactive, outward-facing detection capabilities to see if there are bad AI models out there. I don’t mean red-teaming others’ models, or evals, or that sort of thing. I mean: think about how you would detect whether Anthropic had bad (misaligned, or aligned-but-being-used-for-very-impactful-bad-things) models out there if you were at an intelligence agency without official access to Anthropic’s models, and then deploy those capabilities against Anthropic, and the world broadly.[1] You might argue that this is sort of an inverted version of @Buck’s control agenda—instead of trying to make it difficult for a model to escape, think about what facts about the world are likely to be true if a model has escaped, and then go looking for those (a toy sketch of what I mean follows this list).
If it’s not already happening, have Dario and other senior Anthropic leaders meet with folks who had to balance counterintelligence paranoia with operational excellence (e.g., leaders of intelligence agencies, for whom the standard advice to their successor is, “before you go home every day, ask ‘where’s the spy[2]’”) so that they have a mindset on how to scale up their paranoia over time as needed.
Something something use cases—use-case-based restrictions are popular in some policy spheres. Some sort of research demonstrating that a model that’s designed for, and safe for, use case X can easily be turned into a misaligned tool for use case Y under a plausible usage scenario might be useful?
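As promised above, a toy sketch of the “go looking for facts that would be true if a model has escaped” idea. Every indicator, number, and threshold here is hypothetical and invented purely for illustration; the point is only the shape of the approach: enumerate observables, compare them to historical baselines, and investigate anomalies.

```python
from dataclasses import dataclass

@dataclass
class Indicator:
    """One hypothetical 'fact about the world that is more likely to be true
    if a misaligned/escaped model is operating', plus its historical baseline."""
    name: str
    baseline: float           # typical weekly rate from historical data (invented)
    current: float            # observed rate this week (invented)
    alert_ratio: float = 3.0  # flag if current exceeds this multiple of baseline

    def triggered(self) -> bool:
        return self.current > self.alert_ratio * self.baseline

# All indicators and numbers below are made up for illustration only.
indicators = [
    Indicator("stolen-API-key abuse reports", baseline=10, current=12),
    Indicator("anomalous bulk GPU rentals by shell entities", baseline=2, current=9),
    Indicator("spear-phishing campaigns with unusually fluent personalization", baseline=5, current=7),
]

for ind in indicators:
    if ind.triggered():
        print(f"investigate: {ind.name} ({ind.current} vs baseline {ind.baseline})")
```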
Reminder/disclosure: as someone who works in AI policy, there are worlds where some of these ideas help my self-interest; others harm it. I’m not going to try to do the math on which are which under all sorts of complicated double-bankshot scenarios, though.
FWIW re: the Dario 2025 comment, Anthropic very recently posted a few job openings for recruiters focused on policy and comms specifically, which I assume is a leading indicator for hiring. One plausible rationale there is that someone on the executive team smashed the “we need more people working on this, make it happen” button.
In an ideal world (perhaps not reasonable given your scale), you would have some sort of permissions and logging for sensitive types of queries on DM metadata. (E.g., perhaps you would let any Lighthaven team member see an aggregate dashboard number like “rate of DMs from accounts <1 month in age compared to historic baseline”, but “how many DMs has Bob (an account over 90 days old) sent to Alice” would require more guardrails.)
Edit: to be clear, I am comfortable with you doing this without such logging at your current scale and think it is reasonable to do so.
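For concreteness, a minimal sketch of the kind of tiering and logging I have in mind. All of the names, tiers, and the sign-off mechanism below are hypothetical assumptions on my part, not a claim about how your systems actually work.

```python
import logging
from datetime import datetime, timezone

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("dm-metadata-queries")

# Hypothetical query tiers: aggregate stats are low-friction, queries about
# specific named users require an extra sign-off, and everything is logged.
AGGREGATE = "aggregate"
TARGETED = "targeted"

def run_metadata_query(operator: str, tier: str, description: str, approved_by: str | None = None):
    log.info("%s | %s | tier=%s | %s | approved_by=%s",
             datetime.now(timezone.utc).isoformat(), operator, tier, description, approved_by)
    if tier == TARGETED and approved_by is None:
        raise PermissionError("Targeted DM-metadata queries need a second person's sign-off.")
    # ... the actual query against the metadata store would go here ...

# Fine without extra guardrails: an aggregate rate vs. historical baseline.
run_metadata_query("team-member", AGGREGATE, "rate of DMs from accounts <1 month old vs. baseline")

# Requires sign-off: metadata about specific named accounts.
run_metadata_query("team-member", TARGETED, "DM count from Bob to Alice", approved_by="second-team-member")
```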
I have a few weeks off coming up shortly, and I’m planning on spending some of it monkeying around with AI and code stuff. I can think of two obvious tacks: 1. go do some fundamentals learning on technical stuff I don’t have hands-on experience with, or 2. go build some new fun stuff.
Does anyone have particular lists of learning topics / syllabi / similar resources that would be a good fit for a “fairly familiar with the broad policy/technical space, but his largest shipped chunk of code is a few hundred lines of python” person like me?
Note also that this work isn’t just papers; e.g., as a matter of public record MIRI has submitted formal comments to regulators to inform draft regulation based on this work.
(For those less familiar, yes, such comments are indeed actually weirdly impactful in the American regulatory system).
I am (sincerely!) glad that this is obvious to other people too and that they are talking about it already!