yams
Anthropic should take a humanist/cosmopolitan stance on risks from AGI in which risks related to different people having different values are very clearly deprioritized compared to risks related to complete human disempowerment or extinction, as worry about the former seems likely to cause much of the latter
Can you say more about the section I’ve bolded or link me to a canonical text on this tradeoff?
[I was a manager at MATS until recently and want to flesh out the thing Buck said a bit more]
It’s common for researchers to switch subfields, and extremely common for MATS scholars to get work doing something different from what they did at MATS. (Kosoy has had scholars go on to ARC, Neel scholars have ended up in scalable oversight, Evan’s scholars have a massive spread in their trajectories; there are many more examples but it’s 3 AM.)
Also, I wouldn’t advise applying only to whatever seems most interesting; I’d advise applying to literally everything (unless you know for sure you don’t want to work with Neel, since his application is very time-intensive). The acceptance rate is ~4 percent, so it’s better to maximize your odds (again, for most scholars, the bulk of the value is not in their specific research output over the 10-week period, but in having the experience at all).
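(As a toy illustration of the odds point, and under the admittedly unrealistic assumption that each mentor stream were an independent draw at roughly the overall ~4 percent rate (it isn’t; it’s the same applicant and a shared process):

$$P(\text{at least one offer}) = 1 - (1 - 0.04)^{N}, \qquad N = 10 \;\Rightarrow\; 1 - 0.96^{10} \approx 0.34.$$

The real numbers will differ, but the direction (more applications, better odds) is the point.)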
Also please see Ryan’s replies to Tsvi on the talent needs report for more notes on the streetlighting concern as it pertains to MATS. There’s a pretty big back-and-forth there (I don’t cleanly agree with one side or the other, but it might be useful to you).
Your version of events requires a change of heart (for ‘them to get a whole lot more serious’). I’m just looking at the default outcome. Whether alignment is hard or easy (although not if it’s totally trivial), it appears to be progressing substantially more slowly than capabilities (and the parts of it that are advancing are the most capabilities-synergizing, so it’s unclear what the oft-lauded ‘differential advancement of safety’ really looks like).
By bad I mean dishonest, and by ‘we’ I mean the speaker (in this case, MIRI).
I take myself to have two central claims across this thread:
Your initial comment was strawmanning the ‘if we build [ASI], we all die’ position.
MIRI is likely not a natural fit to consign itself to service as the neutral mouthpiece of scientific consensus.
I do not see where your most recent comment has any surface area with either of these claims.
I do want to offer some reassurance, though:
I do not take “One guy who’s thought about this for a long time and some other people he recruited think it’s definitely going to fail” to be descriptive of the MIRI comms strategy.
Oh, I feel fine about saying ‘draft artifacts currently under production by the comms team do cite people who are not Eliezer, including experts with a lower p(doom)’, which, based on this comment, is what I take to be the goalpost. This is just regular coalition signaling, though, and not positioning yourself as, terminally, a neutral observer of consensus.
“You haven’t really disagreed that [claiming to speak for scientific consensus] would be more effective.”
That’s right! I’m really not sure about this. My experience has been that ~every take someone offers to normies in policy is preceded by ‘the science says…’, so maybe the market is kind of saturated here. I’d also worry that precommitting to argue only in line with the consensus might bind you to act against your beliefs (and I think EY et al. have valuable inside-view takes that shouldn’t be stymied by the trends of an increasingly confused and poisonous discourse). That something is a local credibility win (I’m not sure it is, actually) doesn’t mean it has the best nth-order effects among all options long-term (including on the dimension of credibility).
I believe that Seth would find messaging that did this more credible. I think ‘we’re really not sure’ is a bad strategy if you really are sure, which MIRI leadership, famously, is.
I do mean ASI, not AGI. I know Pope + Belrose also mean to include ASI in their analysis, but it’s still helpful to me if we just use ASI here, so I’m not constantly wondering if you’ve switched to thinking about AGI.
Obligatory ‘no really, I am not speaking for MIRI here.’
My impression is that MIRI is not trying to speak for anyone else. Representing the complete scientific consensus is an undue burden to place on an org that has not made that claim about itself. MIRI represents MIRI, and is one component voice of the ‘broad view guiding public policy’, not its totality. No one person or org is in the chair with the lever; we’re all just shouting what we think in directions we expect the diffuse network of decision-makers to be sitting in, with more or less success. It’s true that ‘claiming to represent the consensus’ is a tack one can take to appear authoritative, and not (always) a dishonest move. To my knowledge, this is not MIRI’s strategy. This is the strategy of, e.g., the CAIS letter (although not of CAIS as a whole!), and occasionally AIS orgs cite expert consensus or specific, otherwise-disagreeing experts as having directional agreement with the org (for an extreme case, see Yann LeCun shortening his timelines). This is not the same as attempting to draw authority from the impression that one’s entire aim is simply ‘sharing consensus.’
And then my model of Seth says ‘Well we should have an org whose entire strategy is gathering and sharing expert consensus, and I’m disappointed that this isn’t MIRI, because this is a better strategy,’ or else cites a bunch of recent instances of MIRI claiming to represent scientific consensus (afaik these don’t exist, but it would be nice to know if they do). It is fair for you to think MIRI should be doing a different thing. Imo MIRI’s history points away from it being a good fit to take representing scientific consensus as its primary charge (and this is, afaict, part of why AI Impacts was a separate project).
I think MIRI comms are by and large well signposted to indicate ‘MIRI thinks x’ or ‘Mitch thinks y’ or ‘Bengio said z.’ If you think a single org should build influence and advocate for a consensus view, then help found one, or encourage someone else to do so. This just isn’t what MIRI is doing.
Good point—what I said isn’t true in the case of alignment by default.
Edited my initial comment to reflect this
(I work at MIRI but views are my own)
I don’t think ‘if we build it we all die’ requires that alignment be hard [edit: although it is incompatible with alignment by default]. It just requires that our default trajectory involves building ASI before solving alignment (and, looking at our present-day resource allocation, this seems very likely to be the world we are in, conditional on building ASI at all).
[I want to note that I’m being very intentional when I say “ASI” and “solving alignment” and not “AGI” and “improving the safety situation”]
What text analogizing LLMs to human brains have you found most compelling?
Does it seem likely to you that, conditional on ‘slow bumpy period soon’, a lot of the funding we see at frontier labs dries up (so there’s kind of a double slowdown effect of ‘the science got hard, and also now we don’t have nearly the money we had to push global infrastructure and attract top talent’), or do you expect that frontier labs will stay well funded (either by leveraging low hanging fruit in mundane utility, or because some subset of their funders are true believers, or a secret third thing)?
Only the first few sections of the comment were directed at you; the last bit was a broader point re other commenters in the thread, the fooming shoggoths, and various in-person conversations I’ve had with people in the bay.
That rationalists and EAs tend toward aesthetic bankruptcy is one of my chronic bones to pick, because I do think it indicates the presence of some bias that doesn’t exist in the general population, which results in various blind spots.
Sorry for not signposting and/or limiting myself to a direct reply; that was definitely confusing.
I think you should give 1 or 2 a try, and would volunteer my time (although if you’d find a betting structure more enticing, we could say my time is free iff I turn out to be wrong, and otherwise you’d pay me).
If this is representative of the kind of music you like, I think you’re wildly overestimating how difficult it is to make that music.
The hard parts are basically infrastructural (knowing how to record a sound, how to make different sounds play well together in a virtual space). Suno is actually pretty bad at that, though, so if you give yourself the affordance to be bad at it, too, then you can just ignore the most time-intensive part of music making.
Pasting things together (as you did here) is largely The Way Music Is Made in the digital age, anyway.
I think, in ~one hour, you could:
Learn to play the melody of this song on piano.
Learn to use some randomizer tools within a DAW (some of which may be ML-based), and learn the fundamentals of that DAW, as well as just enough music theory to get by (nothing happening in the above Suno piece would take more than 10 minutes of explanation to understand).
The actual arrangement of the Suno piece is somewhat ambitious (not in that it does anything hard, just in that it has many sections), but this was the part you had to hack together yourself anyway, and getting those features in a human-made song is more about spending the time to do it than it is about having the skill (there is a skill to doing an awesome job of it, but Suno doesn’t have that skill either).
Suno’s outputs are detectably bad to me and all of my music friends, even the e/acc or ai-indifferent ones, and it’s a significant negative update for me on the broader perceptual capacities of our community that so many folks here prefer Suno to music made by humans.
A great many tools like this already exist and are contracted by the major labels.
When you post a song to streaming services, it’s checked against the entire major-label catalog before it’s actually listed on the service (the technical process is almost certainly not literally this, but it’s something like this, and they’re very secretive about what’s actually happening under the hood).
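For intuition only, here’s a minimal sketch (in Python) of the general fingerprint-and-compare shape such a check might take; this is not how any particular service actually works (as noted, the real pipelines are proprietary and surely more sophisticated), and the function names, threshold, and toy catalog are all made up for the example:

```python
# Purely illustrative: hash spectral peaks, then check overlap against a stored
# catalog. Real matching systems are proprietary and far more sophisticated;
# nothing here is meant to describe how any actual service works.
import numpy as np


def fingerprint(audio: np.ndarray, win: int = 4096) -> set:
    """Hash pairs of dominant frequency bins from consecutive windows."""
    peaks = []
    for start in range(0, len(audio) - win, win):
        spectrum = np.abs(np.fft.rfft(audio[start:start + win]))
        peaks.append(int(np.argmax(spectrum)))
    # Pairing consecutive peaks makes the match tolerant to absolute offsets.
    return set(zip(peaks, peaks[1:]))


def match_score(candidate: set, reference: set) -> float:
    """Fraction of the candidate's hashes that also appear in a reference track."""
    return len(candidate & reference) / max(len(candidate), 1)


# Toy usage: flag an upload that overlaps heavily with anything in the catalog.
sr = 22050
catalog = {"label_track_1": fingerprint(np.random.randn(sr * 30))}
upload_fp = fingerprint(np.random.randn(sr * 30))
flagged = any(match_score(upload_fp, ref) > 0.5 for ref in catalog.values())
print("flagged:", flagged)
```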
Cool! I think we’re in agreement at a high level. Thanks for taking the extra time to make sure you were understood.
In more detail, though:
I think I disagree with 1 being all that likely; there are just other things I could see happening that would make a pause or stop politically popular (e.g. warning shots, An Inconvenient Truth: AI Edition, etc.), likely not worth getting into here. I also think ‘if we pause it will be for stupid reasons’ is a very sad take.
I think I disagree with 2 being likely as well; probably yes, a lot of the bottleneck on development is ~make-work that goes away once you have a drop-in replacement for remote workers, and also yes, AI coding is already an accelerant, and effectively doing gradient descent on gradient descent (RLing the RL’d researcher to RL the RL...) is intelligence-explosion fuel. But I think there’s a big gap between the capabilities you need for politically worrisome levels of unemployment and the capabilities you need for an intelligence explosion, principally because >30 percent of human labor in developed nations could be automated with current tech if the economics align a bit (hiring $200k+/year ML engineers to replace your $30k/year call-center employees is only just now starting to make sense economically). I think this has been true since ~GPT-4, and we haven’t seen a concomitant massive acceleration in capabilities on the frontier (things are continuing to move fast, and the proliferation is scary, but it’s not an explosion).
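(Spelling out the back-of-the-envelope economics in that parenthetical, using the comment’s own round numbers, which are illustrative rather than sourced: the swap only pencils out once one engineer can durably automate several such seats, since

$$\frac{\$200\text{k/yr}}{\$30\text{k/yr}} \approx 6.7,$$

i.e. roughly seven call-center roles per engineer-year, before counting compute, tooling, and integration overhead.)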
I take “depending on how concentrated AI R&D is” to foreshadow that you’d reply to the above with something like: “This is about lab priorities; the labs with the most impressive models are the labs focusing the most on frontier model development, and they’re unlikely to set their sights on comprehensive automation of shit jobs when they can instead double down on frontier models and put some RL in the RL to RL the RL that’s been RL’d by the...”
I think that’s right about lab priorities. However, I expect the automation wave to mostly come from middlemen, consultancies, what have you, who take all of the leftover ML researchers not eaten up by the labs and go around automating things away individually. (Yes, maybe the frontier moves too fast for this to be right, because the labs just end up with a drop-in remote worker ‘for free’ as long as they keep advancing down the tech tree; but I don’t quite think this is true, because human jobs are human-shaped, and buyers are going to want pretty rigorous role-specific guarantees from whoever’s selling this service, even if those guarantees are basically unnecessary, and the one-size-fits-all solution is going to have fewer buyers than the thing marketed as ‘bespoke’.)
In general, I don’t like collapsing the various checkpoints between here and superintelligence; there are all these intermediate states, and their exact features matter a lot, and we really don’t know what we’re going to get. ‘By the time we have x, we’ll certainly have y’ is not a form of prediction that anyone has a particularly good track record of making.
So for this argument to be worth bringing up in some general context where a pause is discussed, the person arguing it should probably believe:
We are far and away most likely to get a pause only as a response to unemployment.
An AI that precipitates pause-inducing levels of unemployment is inches from automating AI R+D.
The period between implementing the pause and massive algorithmic advancements is long enough that we’re able to increase compute stock...
...but short enough that we’re not able to make meaningful safety progress before algorithmic advancements make the pause ineffective (because, e.g., we regulated FLOPS and it now takes 100x fewer FLOPS to build the dangerous thing).
I think the conjunctive probability of all these things is low, and I think their likelihood is sensitive to the terms of the pause agreement itself. I agree that the design of a pause should consider a broad range of possibilities, and try to maximize its own odds of attaining its ends (Keep Everyone Alive).
I’m also not sure how this goes better in the no-pause world? Unless this person also has really high odds on multipolar going well and expects some Savior AI trained and aligned in the same length of time as the effective window of the theoretical pause to intervene? But that’s a rare position among people who care about safety ~at all; it’s kind of a George Hotz take or something...
(I don’t think we disagree; you did flag this as ”...somewhat relevant in worlds where...”, which is often code for “I really don’t expect this to happen, but Someone Somewhere should hold this possibility in mind.” Just want to make sure I’m actually following!)
I’ve just read this post and the comments. Thank you for writing that; some elements of the decomposition feel really good, and I don’t know that they’ve been done elsewhere.
I think discourse around this is somewhat confused, because you actually have to do some calculation on the margin, and need a concrete proposal to do that with any confidence.
The straw-Pause rhetoric is something like “Just stop until safety catches up!” The overhang argument is usually deployed (as it is in those comments) to the effect of ‘there is no stopping.’ And yeah, in this calculation, there are in fact marginal negative externalities to the implementation of some subset of actions one might call a pause. The straw-Pause advocate really doesn’t want to look at that, because it’s messy to entertain counter-evidence to your position, especially if you don’t have a concrete enough proposal on the table to assign weights in the right places.
Because it’s so successful against straw-Pausers, the anti-pause people bring in the overhang argument like an absolute knockdown, when it’s actually just a footnote to double-check the numbers and make sure your pause proposal avoids slipping into some arcane failure mode that ‘arms’ overhang scenarios. That it’s received as a knockdown is reinforced by the gearsiness of actually having numbers (and most of these conversations about pauses are happening in the abstract, in the absence of, e.g., draft policy).
But… just because your interlocutor doesn’t have the numbers at hand doesn’t mean you can’t have a real conversation about the situations in which compute overhang takes on sufficient weight to upend the viability of a given pause proposal.
You said all of this much more elegantly here:
Arguments that overhangs are so bad that they outweigh the effects of pausing or slowing down are basically arguing that a second-order effect is more salient than the first-order effect. This is sometimes true, but before you’ve screened this consideration off by examining the object-level, I think your prior should be against.
...which feels to me like the most important part. The burden is on folks introducing an argument from overhang risk to prove its relevance within a specific conversation, rather than just introducing the adversely-gearsy concept to justify safety-coded accelerationism and/or profiteering. Everyone’s prior should be against actions Waluigi-ing, by default (while remaining alert to the possibility!).
I think it would be very helpful to me if you broke that sentence up a bit more. I took a stab at it but didn’t get very far.
Sorry for my failure to parse!
I want to say yes, but I think this might be somewhat more narrow than I mean. It might be helpful if you could list a few other ways one might read my message, that seem similarly-plausible to this one.
Folks using compute overhang to 4D chess their way into supporting actions that differentially benefit capabilities.
I’m often tempted to comment this in various threads, but it feels like a rabbit hole; it’s not an easy thing to convince someone of (because it’s an argument they’ve accepted for years); and I’ve had relatively little success talking about this with people in person (there’s some change I should make in how I’m talking about it, I think).
More broadly, I’ve started using quick takes to catalog random thoughts, because sometimes when I’m meeting someone for the first time, they’ve heard of me and are mistaken about my beliefs, but would like to argue against their straw version anyway. Having a public record I can point to of things I’ve thought feels useful for combating this.
Thanks for the clarification — this is in fact very different from what I thought you were saying, which was something more like “FATE-esque concerns fundamentally increase x-risk in ways that aren’t just about (1) resource tradeoffs or (2) side-effects of poorly considered implementation details.”