Looking forward to it! (Should rules permit, we’re also happy to discuss privately at an earlier date)
A Narrow Path: a plan to deal with AI extinction risk
Has anyone thought about the things that governments are uniquely good at when it comes to evaluating models?
Here are at least 3 things I think they have as benefits:
1. Just an independent 3rd-party perspective generally
2. The ability to draw insights across multiple labs’ efforts, and identify patterns that others might not be able to
3. The ability to draw on classified threat intelligence to inform its research (e.g., Country X is using model Y for bad behavior Z) and to test the model for classified capabilities (bright line example: “can you design an accurate classified nuclear explosive lensing arrangement”).
Are there others that come to mind?
I think this can be true, but I don’t think it needs to be true:
“I expect that a lot of regulation about what you can and can’t do stops being enforceable once the development is happening in the context of the government performing it.”
I suspect that if the government is running the at-all-costs-top-national-priority Project, you will see some regulations stop being enforceable. However, we also live in a world where you can easily find many instances of government officials complaining in their memoirs that laws and regulations prevented them from being able to go as fast or as freely as they’d want on top-priority national security issues. (For example, DoD officials even after 9-11 famously complained that “the lawyers” restricted them too much on top-priority counterterrorism stuff.)
Gentlemen, it’s been a pleasure playing with you tonight
I suspect this won’t get published until November at the earliest, but I am already delightfully pleased with this bit:
Canada geese fly overhead, honking. Your inner northeast Ohioan notices that you are confused; it’s the wrong season for them to migrate this far south, and they’re flying westwards, anyways.

A quick Google discovers that some Canada geese have now established themselves non-migratorily in the Bay Area:
“The Migratory Bird Treaty Act of 1918 banned hunting or the taking of eggs without a permit. These protections, combined with an increase in desirable real estate—parks, golf courses and the like—spurred a dramatic turnaround for the species. Canada geese began breeding in the Bay Area—the southern end of their range—in the late 1950s.”
You nod, approvingly; this clearly is another part of the East Bay’s well-known, long-term philanthropic commitment to mitigating Acher-Risks.
Preregistering intent to write “Every Bay Area Walled Compound” (hat tip: Emma Liddell)
Unrelatedly, I am in Berkeley through Wednesday afternoon, let me know if you’re around and would like to chat
Generally, it is difficult to overstate how completely the PRC is seen as a bad-faith actor in DC these days. Many folks saw them engage in mass economic espionage for a decade while repeatedly promising to stop; those folks are now more senior in their careers than they were during those formative moments. Then COVID happened, and while not everyone believes in the lab leak hypothesis, basically everyone believes that the PRC sure as heck reflexively covered things up, whether or not they were actually culpable.
(Edit: to be clear, reporting, not endorsing, these claims)
Basic question because I haven’t thought about this deeply: in national security stuff, we often intentionally elide the difference between capabilities and intentions. The logic is: you can’t assume a capability won’t be used, so you should plan as-if it is intended to be used.
Should we adopt such a rule for AGI with regards to policy decision-making? (My guess is...probably not for threat assessment but probably yes for contingency planning?)
I think, having been raised in a series of very debate- and seminar-centric discussion cultures, that a quick-hit question like that is indeed contributing something of substance. I think it’s fair that folks disagree, and I think it’s also fair that people signal (e.g., with karma) that they think “hey man, let’s go a little less Socratic in our inquiry mode here.”
But, put in more rationalist-centric terms, sometimes the most useful Bayesian update you can offer someone else is, “I do not think everyone is having the same reaction to your argument that you expected.” (Also true for others doing that to me!)
(Edit to add two words to avoid ambiguity in meaning of my last sentence)
Yes, I would agree that if I expected a short take to have this degree of attention, I would probably have written a longer comment.
Well, no, I take that back. I probably wouldn’t have written anything at all. To some, that might be a feature; to me, that’s a bug.
It is genuinely a sign that we are all very bad at predicting others’ minds that it didn’t occur to me that saying, effectively, “OP asked for ‘takes’, so here’s a take on why I think this is pragmatically a bad idea” would also be read as saying “and therefore there is no other good question here.” That’s, as the meme goes, a whole different sentence.
I think it’s bad for discourse for us to pretend that discourse doesn’t have impacts on others in a democratic society. And I think the meta-censoring of discourse by claiming that certain questions might have implicit censorship impacts is one of the most anti-rationality trends in the rationalist sphere.
I recognize most users of this platform will likely disagree, and predict negative agreement-karma on this post.
Ok, then to ask it again in your preferred question format: is this where we think our getting-potential-employees-of-Anthropic-to-consider-the-value-of-working-on-safety-at-Anthropic points are best spent?
Is this where we think our pressuring-Anthropic points are best spent?
I personally endorse this as an example of us being a community that Has The Will To Try To Build Nice Things.
To say the obvious thing: I think if Anthropic isn’t able to make at least somewhat-roughly-meaningful predictions about AI welfare, then their core current public research agendas have failed?
Fair enough!
Possibly misguided question given the context—I see you incorporating imperfect information in “the attack fails silently”, so why not also a distinction between “the attack succeeds noisily, the AI wins and we know it won” and “the attack succeeds silently, the AI wins and we don’t know it won”?
I once saw a video on Instagram of a psychiatrist recommending to other psychiatrists that they purchase ear scopes to check out their patients’ ears, because:
1. Apparently it is very common for folks with severe mental health issues to imagine that there is something in their ear (e.g., a bug, a listening device)
2. Doctors usually just say “you are wrong, there’s nothing in your ear” without looking
3. This destroys trust, so he started doing cursory checks with an ear scope
4. Far more often than he expected (I forget exactly, but something like 10-20%ish), there actually was something in the person’s ear—usually just earwax buildup, but occasionally something else like a dead insect—that was indeed causing the sensation, and he gained a clinical pathway to addressing his patients’ discomfort that he had previously lacked