I think the thing that you’re not considering is that when tunnels are more prevalent and more densely packed, the incentive to use the defensive strategy of “dig a tunnel, then set off a very big bomb in it that collapses many tunnels” gets far higher. It wouldn’t always be infantry combat; it would often be a subterranean equivalent of indirect fires.
davekasten
Ok, so Anthropic’s new policy post (explicitly NOT linkposting it properly since I assume @Zac Hatfield-Dodds or @Evan Hubinger or someone else from Anthropic will, and figure the main convo should happen there, and don’t want to incentivize fragmenting of conversation) seems to have a very obvious implication.
Unrelated, I just slammed a big AGI-by-2028 order on Manifold Markets.
Yup. The fact that the profession that writes the news sees “I should resign in protest” as their own responsibility in this circumstance really reveals something.
At LessOnline, there was a big discussion one night around the picnic tables with @Eliezer_Yudkowsky, @habryka, and some interlocutors from the frontier labs (you’ll momentarily see why I’m being vague on the latter names).
One question was: “does DC actually listen to whistleblowers?” and I contributed that, in fact, DC does indeed have a script for this, and resigning in protest is a key part of it, especially ever since the Nixon years.
Here is a usefully publicly-shareable anecdote on how strongly this norm is embedded in national security decision-making, from the New Yorker article “The U.S. Spies Who Sound the Alarm About Election Interference” by David Kirkpatrick, Oct 21, 2024:
(https://archive.ph/8Nkx5)
The experts’ chair insisted that in this cycle the intelligence agencies had not withheld information “that met all five of the criteria”—and did not risk exposing sources and methods. Nor had the leaders’ group ever overruled a recommendation by the career experts. And if they did? It would be the job of the chair of the experts’ group to stand up or speak out, she told me: “That is why we pick a career civil servant who is retirement-eligible.” In other words, she can resign in protest.
Does “highest status” here mean highest expertise in a domain generally agreed by people in that domain, and/or education level, and/or privileged schools, and/or from more economically powerful countries etc?
I mean, functionally all of those things. (Well, minus the country dynamic. Everyone at this event I talked to was US, UK, or Canadian, which is all sorta one team for purposes of status dynamics at that event)
I was being intentionally broad, here. I am probably less interested for purposes of this particular post only in the question of “who controls the future” swerves and more about “what else would interested, agentic actors do” questions.
It is not at all clear to me that OpenPhil is the only org who feels this way—I can think of several non-EA-ish charities that if they genuinely 100% believed “none of the people you care for will die of the evils you fight if you can just keep them alive for the next 90 days” would plausibly do some interestingly agentic stuff.
Oh, to be clear I’m not sure this is at all actually likely, but I was curious if anyone had explored the possibility conditional on it being likely
Basic Q: has anyone written much down about what sorts of endgame strategies you’d see just-before-ASI from the perspective of “it’s about to go well, and we want to maximize the benefits of it”?
For example: if we saw OpenPhil suddenly make a massive push to just mitigate mortality at the cost of literally every other development goal they have, I might suspect that they suspect that we’re about to all be immortal under ASI, and they’re trying to get as many people as possible to that future…
yup, as @sanxiyn says, this already exists. Their example is, AIUI, a high-end research one; an actually-on-your-laptop-right-now, but admittedly more narrow example is address space layout randomization.
Wild speculation: they also have a sort of we’re-watching-but-unsure provision about cyber operations capability in their most recent RSP update. In it, they say in part that “it is also possible that by the time these capabilities are reached, there will be evidence that such a standard is not necessary (for example, because of the potential use of similar capabilities for defensive purposes).” Perhaps they’re thinking that automated vulnerability discovery is at least plausibly on-net-defensive-balance-favorable*, and so they aren’t sure it should be regulated as closely, even if it is still in some informal sense “dual use”?
Again, WILD speculation here.
*A claim that is clearly seen as plausible by, e.g., the DARPA AI Grand Challenge effort.
It seems like the current meta is to write a big essay outlining your opinions about AI (see, e.g., Gladstone Report, Situational Awareness, various essays recently by Sam Altman and Dario Amodei, even the A Narrow Path report I co-authored).
Why do we think this is the case?
I can imagine at least 3 hypotheses:
1. Just path-dependence; someone did it, it went well, others imitated
2. Essays are High Status Serious Writing, and people want to obtain that trophy for their ideas
3. This is a return to the true original meaning of an essay, under Montaigne, that it’s an attempt to write thinking down when it’s still inchoate, in an effort to make it more comprehensible not only to others but also to oneself. And AGI/ASI is deeply uncertain, so the essay format is particularly suited for this.
What do you think?
Okay, I spent much more time with the Anthropic RSP revisions today. Overall, I think it has two big thematic shifts for me:
1. It’s way more “professionally paranoid,” but needs to be even more so on non-cyber risks. A good start, but it needs more on being able to stop human intelligence (i.e., good old-fashioned spies).
2. It really has an aggressively strong vibe of “we are actually using this policy, and We Have Many Line Edits As A Result.” You may not think that RSPs are sufficient—I’m not sure I do, necessarily—but I am heartened slightly that they genuinely seem to take the RSP seriously to the point of having mildly-frustrated-about-process-hiccup footnotes about it. (Free advice to Anthropic PR: interview a bunch of staff about this on camera, cut it together, and post it; it will be lovely and humanizing and great recruitment material, I bet.)
It’s a small but positive sign that Anthropic sees taking 3 days beyond their RSP’s specified timeframe to conduct a process without a formal exception as an issue. Signals that at least some members of the team there are extremely attuned to normalization of deviance concerns.
I once saw a video on Instagram of a psychiatrist recommending to other psychiatrists that they purchase ear scopes to check out their patients’ ears, because:
1. Apparently it is very common for folks with severe mental health issues to imagine that there is something in their ear (e.g., a bug, a listening device)
2. Doctors usually just say “you are wrong, there’s nothing in your ear” without looking
3. This destroys trust, so he started doing cursory checks with an ear scope
4. Far more often than he expected (I forget exactly, but something like 10-20%ish), there actually was something in the person’s ear—usually just earwax buildup, but occasionally something else like a dead insect—that was indeed causing the sensation, and he gained a clinical pathway to addressing his patients’ discomfort that he had previously lacked
Looking forward to it! (Should rules permit, we’re also happy to discuss privately at an earlier date)
A Narrow Path: a plan to deal with AI extinction risk
Has anyone thought about the things that governments are uniquely good at when it comes to evaluating models?
Here are at least 3 things I think they have as benefits:
1. Just an independent 3rd-party perspective generally
2. The ability to draw insights across multiple labs’ efforts, and identify patterns that others might not be able to
3. The ability to draw on classified threat intelligence to inform its research (e.g., Country X is using model Y for bad behavior Z) and to test the model for classified capabilities (bright line example: “can you design an accurate classified nuclear explosive lensing arrangement”).
Are there others that come to mind?
I think this can be true, but I don’t think it needs to be true:
“I expect that a lot of regulation about what you can and can’t do stops being enforceable once the development is happening in the context of the government performing it.”
I suspect that if the government is running the at-all-costs-top-national-priority Project, you will see some regulations stop being enforceable. However, we also live in a world where you can easily find many instances of government officials complaining in their memoirs that laws and regulations prevented them from being able to go as fast or as freely as they’d want on top-priority national security issues. (For example, DoD officials even after 9-11 famously complained that “the lawyers” restricted them too much on top-priority counterterrorism stuff.)
Gentlemen, it’s been a pleasure playing with you tonight
I think people opposing this have a belief that the counterfactual is “USG doesn’t have LLMs” instead of “USG spins up its own LLM development effort using the NSA’s no-doubt-substantial GPU clusters”.
Needless to say, I think the latter is far more likely.