MIRI (soon), MATS (former), Palisade (sometimes)
background in philosophy
addiction to language, nicotine
maker of geometry-inspired musical tools
I agree with this in the world where people are being epistemically rigorous/honest with themselves about their timelines and where there’s a real consensus view on them. I’ve observed that people pretty rarely make decisions truly grounded in their timelines (or do so only nominally), and I think there’s a lot of social signaling going on when people, especially younger people, state their timelines.
I appreciate that more experienced people are willing to give advice within a particular frame (“if timelines were x”, “if China did y”, “if Anthropic did z”, “If I went back to school”, etc etc), even if they don’t agree with the frame itself. I rely on more experienced people in my life to offer advice of this form (“I’m not sure I agree with your destination, but admit there’s uncertainty, and love and respect you enough to advise you on your path”).
Of course they should voice their disagreement with the frame (and I agree this should happen more for timelines in particular), but to gate direct counsel on urgent, object-level decisions behind the resolution of background disagreements is broadly unhelpful.
When someone says “My timelines are x, what should I do?”, I actually hear like three claims:
Timelines are x
I believe timelines are x
I am interested in behaving as though timelines are x
Evaluation of the first claim is complicated, and other people do a better job of it than I do, so let’s focus on the others.
“I believe timelines are x” is a pretty easy roll to disbelieve. Under relatively rigorous questioning, nearly everyone (particularly everyone ‘career-advice-seeking age’) will either say they are deferring (meaning they could just as easily defer to someone else tomorrow), or admit that it’s a gut feel, especially for their ~90 percent year, and especially for more and more capable systems (this is more true of ASI than weak AGI, for instance, although those terms are underspecified). Still others will furnish zero reasoning transparency and thus reveal their motivations to be principally social (possibly a problem unique to the Bay, although online e/acc culture has a similar Thing).
“I am interested in behaving as though timelines are x” is an even easier roll to disbelieve. Very few people act on their convictions in sweeping, life-changing ways without concomitant benefits (money, status, power, community), including people within AIS (sorry friends).
With these uncertainties, piled on top of the usual uncertainties surrounding timelines, I’m not sure I’d want anyone to act so nobly as to refuse advice to someone with different timelines.
If Alice is a senior AIS professional who gives advice to undergrads at parties in Berkeley (bless her!), how would her behavior change under your recommendation? It sounds like maybe she would stop fostering a diverse garden of AIS saplings and instead become the awful meme of someone who just wants to fight about a highly speculative topic. Seems like a significant value loss.
Their timelines will change some other day; everyone’s will. In the meantime, being equipped to talk to people with a wide range of safety-concerned views (especially for more senior, or just Older people), seems useful.
harder to converge
Converge for what purpose? It feels like the marketplace of ideas is doing an ok job of fostering a broad portfolio of perspectives. If anything, we are too convergent and, as a consequence, somewhat myopic internally. Leopold mind-wormed a bunch of people until Tegmark spoke up (and that only somewhat helped). Few thought governance was a good idea until pretty recently (~3 years ago), and it would be going better now if those interested in the angle hadn’t been shouted down so emphatically to begin with.
If individual actors need to cross some confidence threshold in order to act, but the reasonable confidence interval is in fact very wide, I’d rather have a bunch of actors with different timelines, which roughly sum to the shape of the reasonable thing*, than have everyone working on the same overconfident assumption that later comes back to bite us (when we’ve made mistakes in the past, this is often why).
*Which is, by the way, closer to flat than most people’s individual timelines
I don’t think I really understood what it meant for establishment politics to be divisive until this past election.
As good as it feels to sit on the left and say “they want you to hate immigrants” or “they want you to hate queer people”, it seems similarly (although probably not equally?) true that the center left also has people they want you to hate (the religious, the rich, the slightly-more-successful-than-you, the ideologically-impure-who-once-said-a-bad-thing-on-the-internet).
But there’s also a deeper, structural sense in which it’s true.
Working on AIS, I’ve long hoped that we could form a coalition with all of the other people worried about AI, because a good deal of them just… share (some version of) our concerns, and our most ambitious policy solutions (e.g. stopping development, mandating more robust interpretability and evals) could also solve a bunch of problems highlighted by the FATE community, the automation-concerned, etc etc.
Their positions also have the benefit of conforming to widely-held anxieties (‘I am worried AI will just be another tool of empire’, ‘I am worried I will lose my job for banal normie reasons that have nothing to do with civilizational robustness’, ‘I am worried AIs will cheaply replace human labor and do a worse job, enshittifying everything in the developed world’). We could generally curry popular support and favor, without being dishonest, by looking at the Venn diagram of things we want and things they want (which would also help keep AI policy from sliding into partisanship, if such a thing is still possible, given the largely right-leaning associations of the AIS community*).
For the next four years, at the very least, I am forced to lay this hope aside. That the EO contained language in service of the FATE community was, in hindsight, very bad, and probably foreseeably so, given that even moderate Republicans like to score easy points on culture war bullshit. Probably it will be revoked, because language about bias made it an easy thing for Vance to call “far left”.
“This is ok because it will just be replaced.”
Given the current state of the game board, I don’t want to be losing any turns. We’ve already lost too many turns; setbacks are unacceptable.
“What if it gets replaced by something better?”
I envy your optimism. I’m also concerned about the same dynamic playing out in reverse; what if the new EO (or piece of legislation via whatever mechanism), like the old EO, contains some language that is (to us) beside the point, but nonetheless signals partisanship, and is retributively revoked or repealed by the next administration? This is why you don’t want AIS to be partisan; partisanship is dialectics without teleology.
Ok, so structurally divisive: establishment politics has made it ~impossible to form meaningful coalitions around issues other than absolute lightning rods (e.g. abortion, immigration; the ‘levers’ available to partisan hacks looking to gin up donations). It’s not just that they make you hate your neighbors, it’s that they make you behave as though you hate your neighbors, lest your policy proposals get painted with the broad red brush and summarily dismissed.
I think this is the kind of observation that leads many experienced people interested in AIS to work on things outside of AIS, but with an eye toward implications for AI (e.g. Critch, A Ray). You just have these lucid flashes of how stacked the deck really is, and set about digging the channel that is, compared to the existing channels, marginally more robust to reactionary dynamics (‘aligning the current of history with your aims’ is maybe a good image).
Hopefully undemocratic regulatory processes serve their function as a backdoor for the sensible, but it’s unclear how penetrating the partisanship will be over the next four years (and, of course, those at the top are promising that it will be Very Penetrating).
*I am somewhat ambivalent about how right-leaning AIS really is. Right-leaning compared to middle class Americans living in major metros? Probably. Tolerant of people with pretty far-right views? Sure, to a point. Right of the American center as defined in electoral politics (e.g. ‘Republican-voting’)? Usually not.
I think the key missing piece you’re pointing at (making sure that our interpretability tools etc actually tell us something alignment-relevant) is one of the big things going on in model organisms of misalignment (iirc there’s a step that’s like ‘ok, but if we do interpretability/control/etc at the model organism does that help?’). Ideally this type of work, or something close to it, could become more common // provide ‘evals for our evals’ // expand in scope and application beyond deep deception.
If that happened, it seems like it would fit the bill here.
Does that seem true to you?
I like this post, but I think Redwood has varied some on whether control is for getting alignment work out of AIs vs. getting generally good-for-humanity work out of them and pushing for a pause once they reach some usefulness/danger threshold (e.g. well before superintelligence).
[based on my recollection of Buck seminar in MATS 6]
Makes sense. Pretty sure you can remove it (and would appreciate that).
Many MATS scholars go to Anthropic (source: I work there).
Redwood I’m really not sure, but that could be right.
Sam now works at Anthropic.
Palisade: I’ve done some work for them, I love them, I don’t know that their projects so far inhibit Anthropic (BadLlama, which I’m decently confident was part of the cause for funding them, was pretty squarely targeted at Meta, and is their most impactful work to date by several OOM). In fact, the softer versions of Palisade’s proposal (highlighting misuse risk, their core mission), likely empower Anthropic as seemingly the most transparent lab re misuse risks.
I take the thrust of your comment to be “OP funds safety, do your research”. I work in safety; I know they fund safety.
I also know most safety projects differentially benefit Anthropic (this fact is independent of whether you think differentially benefiting Anthropic is good or bad).
If you can make a stronger case for any other of the dozens of orgs on your list than exists for the few above, I’d love to hear it. I’ve thought about most of them and don’t see it, which is why I asked the question.
Further: the goalpost is not ‘net positive with respect to TAI x-risk.’ It is ‘not plausibly a component of a meta-strategy targeting the development of TAI at Anthropic before other labs.’
Edit: use of the soldier mindset flag above is pretty uncharitable here; I am asking for counter-examples to a hypothesis I’m entertaining. This is the actual opposite of soldier mindset.
updated, thanks!
The CCRU is under-discussed in this sphere as a direct influence on the thoughts and actions of key players in AI and beyond.
Land started a creative collective, alongside Mark Fisher, in the 90s. I learned this by accident, and it seems like a corner of intellectual history that’s at least as influential as, e.g., the Extropians.
If anyone knows of explicit connections between the CCRU and contemporary phenomena (beyond Land/Fisher’s immediate influence via their later work), I’d love to hear about them.
Does anyone have examples of concrete actions taken by Open Phil that point toward their AIS plan being anything other than ‘help Anthropic win the race’?
I think a non-zero number of those disagree votes would not have appeared if the same comment were made by someone other than an Anthropic employee, based on seeing how Zac is sometimes treated IRL. My comment is aimed most directly at the people who cast those particular disagree votes.
I agree with your comment to Ryan above that those who identified “Anthropic already does most of these” as “the central part of the comment” were using the disagree button as intended.
The threshold for hitting the button will be different in different situations; I think the threshold many applied here was somewhat low, and a brief look at Zac’s comment history, to me, further suggests this.
I want to double down on this:
Zac is consistently generous with his time, even when dealing with people who are openly hostile toward him. Of all lab employees, Zac is among the most available for—and eager to engage in—dialogue. He has furnished me personally with >2 dozen hours of extremely informative conversation, even though our views differ significantly (and he has ~no instrumental reason for talking to me in particular, since I am but a humble moisture farmer). I’ve watched him do the same with countless others at various events.
I’ve also watched people yell at him more than once. He kinda shrugged, reframed the topic, and politely continued engaging with the person yelling at him. He has leagues more patience and decorum than is common among the general population. Moreover, in our quarrelsome pocket dimension, he’s part of a mere handful of people with these traits.
I understand distrust of labs (and feel it myself!), but let’s not kill the messenger, lest we run out of messengers.
This may be an example, but I don’t think it’s an especially central one, for a few reasons:
1. The linked essay discusses, quite narrowly, the act of making predictions about artificial intelligence/the Actual Future based on the contents of science fiction stories that make (more-or-less) concrete predictions on those topics, thus smuggling in a series of warrants that poison the reasoning process from that point onward. This post, by contrast, is about feelings.
2. The process for reasoning about one’s, say, existential disposition is independent of the process for reasoning about the technical details of AI doom. The respective solution-spaces for the questions “How do I deal with this present-tense emotional experience?” and “How do I deal with this future-tense socio-technical possibility?” are quite different. While they may feed into each other (in the case, for instance, of someone who’s decided they must self-soothe and level out before addressing the technical problem that’s staring them down or, conversely, someone who’s decided the most effective anxiety treatment is direct material action regarding the object of anxiety), they’re otherwise quite independent. It’s useful to use a somewhat different (part of your)self to read the Star Wars Extended Universe than you would use to read, e.g., recent AI safety papers.
3. One principal use of fiction is to open a window into aspects of experience that the reader might not otherwise access. Most directly, fiction can help you empathize with people who are very different from you, or help you come to grips with the fact that other people in fact exist at all. It can also show you things you might not otherwise see, and impart tools for seeing in new and exciting ways. I think reading The Logical Fallacy of Generalization from Fictional Evidence as totally invalidating insights from fiction is a mistake, particularly because the work itself closes with a quote from a work of fiction (which I take as pretty strong evidence the author would not endorse using the work in this way). If you don’t think your implied reading of Yudkowsky here would actually preclude deriving any insight whatsoever from fiction, I’d like to hear what insights from fiction it would permit, since it seems to me like Ray’s committing the most innocent class of this sin, were it a sin. It’s possible you just don’t think fiction is useful at all, and in that case I just wouldn’t try to convince you further.
4. I read Ray’s inclusion of the story as immaterial to his point (this essay is, not-so-secretly, about his own emotional development, with some speculation about the broader utility for others in the community undergoing similar processes). It’s common practice in personal essay writing to open with a bit of fiction, or a poem, or something else that illustrates a point before getting more into the meat of it. Ray happens to have a cute/nerdy memory from his childhood that he connects to a class of thinking that in fact has a rich tradition (or, multiple rich traditions, with parallel schools and approaches in ~every major religious lineage).
[there’s a joke here, too, and I hope you’ll read my tone generously, because I do mean it lightheartedly, about “The Logical Fallacy of Generalization from The Logical Fallacy of Generalization from Fictional Evidence”]
Sometimes people give a short description of their work. Sometimes they give a long one.
I have an imaginary friend whose work I’m excited about. I recently overheard him introduce and motivate his work to a crowd of young safety researchers, and I took notes. Here’s my best reconstruction of what he’s up to:
“I work on median-case out-with-a-whimper scenarios and automation forecasting, with special attention to the possibility of mass-disempowerment due to wealth disparity and/or centralization of labor power. I identify existing legal and technological bottlenecks to this hypothetical automation wave, including a list of relevant laws in industries likely to be affected and a suite of evals designed to detect exactly which kinds of tasks are likely to be automated and when.
“My guess is that there are economically valuable AI systems between us and AGI/TAI/ASI, and that executing on safety and alignment plans in the midst of a rapid automation wave is dizzyingly challenging. Thinking through those waves in advance feels like a natural extension of placing any weight at all on the structure of the organization that happens to develop the first Real Scary AI. If we think that the organizational structure and local incentives of a scaling lab matter, shouldn’t we also think that the societal conditions and broader class of incentives matter? Might they matter more? The state of the world just before The Thing comes on line, or as The Team that makes The Thing is considering making The Thing, has consequences for the nature of the socio-technical solutions that work in context.
“At minimum, my work aims to buy us some time and orienting-power as the stakes rise. I’d imagine my maximal impact is something like ‘develop automation timelines and rollout plans that you can peg AI development to, such that the state of the world and the state-of-the-art AI technology advance in step, minimizing the collateral damage and chaos of any great economic shift.’
“When I’ve brought these concerns up to folks at labs, they’ve said that these matters get discussed internally, and that there’s at least some agreement that my direction is important, but that they can’t possibly be expected to do everything to make the world ready for their tech. I, perhaps somewhat cynically, think they’re doing narrow work here on the most economically valuable parts, but that they’re disinterested in broader coordination with public and private entities, since it would be economically disadvantageous to them.
“When I’ve brought these concerns up to folks in policy, they’ve said that some work like this is happening, but that it’s typically done in secret, to avoid amorphous negative externalities. Indeed, the more important this work is, the less likely someone is to publish it. There’s some concern that a robust and publicly available framework of this type could become a roadmap for scaling labs that helps them focus their efforts for near-term investor returns, possibly creating more fluid investment feedback loops and lowering the odds that disappointed investors back out, indirectly accelerating progress.
“Publicly available work on the topic is ~abysmal, painting the best case scenario as the most economically explosive one (most work of this type is written for investors and other powerful people), rather than pricing in the heightened x-risk embedded in this kind of destabilization. There’s actually an IMF paper modeling automation from AI systems using math from the industrial revolution. Surely there’s a better way here, and I hope to find it.”
I (and maybe you) have historically underrated the density of people with religious backgrounds in secular hubs. Most of these people don’t ‘think differently’, in a structural sense, from their forebears; they just don’t believe in that God anymore.
The hallmark here is a kind of naive enlightenment approach that ignores ~200 years of intellectual history (and a great many thinkers from before that period, including canonical philosophers they might claim to love/respect/understand). This type of thing.
They’re no less tribal or dogmatic, and no more critical, than the place they came from. They just vote the other way and can maybe talk about one or two levels of abstraction beyond the stereotype they identify against (although they can’t really think about those levels).
You should still be nice to them, and honest with them, but you should understand what you’re getting into.
The mere biographical detail of having a religious background or being religious isn’t a strong mark against someone’s thinking on other topics, but it is a sign you may be talking to a member of a certain meta-intellectual culture, and need to modulate your style. I have definitely had valuable conversations with people who firmly belong in this category, and would not categorically discourage engagement. Just don’t be so surprised when the usual jutsu falls flat!