PauseAI US is a separate entity from PauseAI, so I believe it should also be listed.
Not OP, but I think Functional Threshold Power is fine. I don’t know of any literature directly comparing it to VO2max, but much of the literature on VO2max didn’t actually measure VO2max; it used proxies like “maximum gradient at which a participant can walk for 3 minutes” (called the Balke treadmill test). When meta-analyses report that VO2max strongly predicts health outcomes, what they usually* mean is “VO2max, and also various proxies for VO2max, when thrown together into a meta-analysis, strongly predict health outcomes”. So as far as I can tell from the (little) research I’ve looked at, there are a lot of metrics that work, and it’s not clear which ones work better than others. FTP seems like as good a measure as any.
For example, have a look at Table 2 in Impact of Cardiorespiratory Fitness on All-Cause and Disease-Specific Mortality: Advances Since 2009, which gives a list of studies and what measure each study used. You can see that they used a variety of fitness metrics.
*I’ve only actually looked at two meta-analyses
I think there is some hope, but I don’t really know how to do it. I think if their behavior were considered sufficiently shameful according to their ingroup, then they would stop. But their ingroup specifically selects for people who think they are doing the right thing.
I have some small hope that they can be convinced by good arguments, although if that were true, surely they would’ve already been convinced by now? Perhaps they are simply not aware of the arguments for why what they’re doing is bad?
The question under discussion was: Is Anthropic “quite in sync with the AI x-risk community”? If it’s taking unilateral actions that are unpopular with the AI x-risk community, then it’s not in sync.
prosaic alignment is clearly not scalable to the types of systems they are actively planning to build
Why do you believe this?
(FWIW I think it’s foolish that all (?) frontier companies are all-in on prosaic alignment, but I am not convinced that it “clearly” won’t work.)
Just my personal opinion:
My sense is that Anthropic is somewhat more safety-focused than the other frontier AI companies, in that most of the companies only care maybe 10% as much about safety as they should, and Anthropic cares 15% as much as it should.
What numbers would you give to these labs?
My median guess is that if an average AI company’s net impact is −100 per dollar, then Anthropic’s is −75. I believe Anthropic is making things worse on net by intensifying competition, but an Anthropic-controlled ASI is a bit less likely to kill everyone than an ASI controlled by anyone else.
But I also put significant (though <50%) probability on Anthropic being the worst company in terms of actual consequences, because its larger-but-still-insufficient focus on safety may create a false sense of security that ends up preventing good regulations from being implemented.
You may also be interested in SaferAI’s risk management ratings.
I used to think Anthropic was [...] quite in sync with the AI x-risk community.
I think Anthropic leadership respects the x-risk community in their words but not in their actions. Anthropic says safety is important, and invests a decent amount into safety research; but also opposes coordination, supports arms races, and has no objection to taking unilateral actions that are unpopular in the x-risk community (and among the general public for that matter).
Huh. I knew that’s how ChatGPT worked but I had assumed they would’ve worked out a less hacky solution by now!
I wrote several attempts at a reply and deleted them all because none of them were cruxes for me. I went for a walk and thought more deeply about my cruxes.
I am now back from my walk. Here is what I have determined:
No reply I could write would be cruxy because my original post is not cruxy with respect to my personal behavior.
I believe the correct thing for me to do is to advocate for slowing down AI development, and to donate to orgs that cost-effectively advocate for slowing down AI development. And my post is basically irrelevant to why I believe that.
So why did I write the post? When I wrote it, I wasn’t thinking about cruxes. It was just an argument I had been thinking about that I’d never read before, and I thought someone ought to write it out.
And I’m not sure exactly who this post is a crux for. Perhaps if someone had a particular combination of beliefs about
the probability that slowing down AI development will work
the probability that bootstrapped alignment will work
where they’re teetering on the edge between “slowing down AI development is good” and “slowing down AI development is bad because it prevents bootstrapped alignment from happening”. My argument might shift that person from the second position to the first. I don’t know if any such person exists.
This is most relevant to slowing down AI development at particular companies—say, if DeepMind slows down and gets significantly surpassed by Meta, then Meta will probably do something that’s even less likely to work than bootstrapped alignment. But a global coordinated slowdown—which is my preferred outcome—does not replace bootstrapped alignment with a worse alignment strategy.
Even though it’s not cruxy, I feel like I should give an object-level response to your comment:
I agree with the denotation of your comment because it is well-hedged—I agree that 5% of resources might be enough to solve alignment. But it probably won’t be.
I think my biggest concern isn’t that AI alignment has no scalable solutions (I agree with you that it probably does have them); my concern is more that alignment is likely to be too hard / get outpaced by capabilities and we will have ASI before alignment is solved.
We can in principle solve large fraction of safety/alignment with fully theoretical safety research without any compute while it seems harder to do purely theoretical capabilities research.
Not to say I disagree (my intuition is that theoretical approaches are underrated), but this contradicts AI companies’ plans (or at least Anthropic’s). Anthropic has claimed that it needs to build frontier AI systems on which to do safety research. They seem to think they can’t solve alignment with theoretical approaches. More broadly, if they’re correct, then it seems to me (although it’s not a straightforward contradiction) that bootstrapped alignment won’t benefit much from scale, because they will need increasing amounts of compute for alignment-related experiments.
FWIW I think your point is more reasonable than Anthropic’s position (I wrote some relevant stuff here). But I thought it was worthwhile to point out the contradiction.
Why would AI companies use human-level AI to do alignment research?
I would not describe it as heroic. I think it’s approximately morally equivalent to choosing an 80% chance of making all Americans immortal (but not non-Americans) and a 20% chance of killing everyone in the world.
This is not a perfect analogy because the philosophical arguments for discounting future generations are stronger than the arguments for discounting non-Americans.
(Also, my P(doom) is higher than 20%; that’s just an example.)
The argument people make is that LLMs improve the productivity of people’s safety research, so it’s worth paying for them. That kinda makes sense. But I do think “don’t give money to the people doing bad things” is a strong heuristic.
I’m a pretty big believer in utilitarianism, but I also think people should be more wary of consequentialist justifications for doing bad things. Eliezer talks about this in Ends Don’t Justify Means (Among Humans); he’s also written some (IMO stronger) arguments elsewhere, but I don’t recall where.
Basically, if I had a nickel for every time someone made a consequentialist argument for why doing a bad thing was net positive, and then it turned out to be net negative, I’d be rich enough to diversify EA funding away from Good Ventures.
I have previously paid for LLM subscriptions (I don’t have any currently) but I think I was not giving enough consideration to the “ends don’t justify means among humans” principle, so I will not buy any subscriptions in the future.
I don’t think “not really caring” necessarily means someone is being deceptive. I hadn’t really thought through the terminology before I wrote my original post, but I would maybe define 3 categories:
claims to care about x-risk, but is being insincere
genuinely cares about x-risk, but also cares about other things (making money etc.), so they take actions that fit their non-x-risk motivations and then come up with rationalizations for why those actions are good for x-risk
genuinely cares about x-risk, and has pure motivations, but sometimes makes mistakes and ends up increasing x-risk
I would consider #1 and #2 to be “not really caring”. #3 really cares. But from the outside it can be hard to tell the difference between the three. (And in fact, from the inside, it’s hard to tell whether you’re a #2 or a #3.)
On a more personal note, I think in the past I was too credulous about ascribing pure motivations to people when I had disagreements with them, when in fact the reason for the disagreement was that I care about x-risk and they’re either insincere or rationalizing. My original post is something I think Michael!2018 would benefit from reading.
What probability do you put that, if Anthropic had really tried, they could have meaningfully coordinated with Openai and Google? Mine is pretty low
Not GP but I’d guess maybe 10%. Seems worth it to try. IMO what they should do is hire a team of top negotiators to work full-time on making deals with other AI companies to coordinate and slow down the race.
ETA: What I’m really trying to say is I’m concerned Anthropic (or some other company) would put in a half-assed effort to cooperate and then give up, when what they should do is Try Harder. “Hire a team to work on it full time” is one idea for what Trying Harder might look like.
What AI safety plans are there?
Your comment about 1e-6 p-doom is not right because we face many other X-risks that developing AGI would reduce.
Ah, you’re right; I wasn’t thinking about that. (Well, I don’t think it’s obvious that an aligned AGI would reduce other x-risks, but my guess is it probably would.)
I find it hard to trust that AI safety people really care about AI safety.
DeepMind, OpenAI, Anthropic, and SSI were all founded in the name of safety. Instead they have greatly increased danger. And at least OpenAI and Anthropic have been caught lying about their motivations:
OpenAI: claiming concern about hardware overhang and then trying to massively scale up hardware; promising compute to the superalignment team and then not giving it; telling the board that a model passed safety testing when it hadn’t; too many more to list.
Anthropic: promising (in a mealy-mouthed technically-not-lying sort of way) not to push the frontier, and then pushing the frontier; trying (and succeeding) to weaken SB-1047; lying about their connection to EA (that’s not related to x-risk but it’s related to trustworthiness).
For whatever reason, I had the general impression that Epoch is about reducing x-risk (and I was not the only one with that impression) but:
Epoch is not about reducing x-risk, and they were explicit about this but I didn’t learn it until this week
its FrontierMath benchmark was funded by OpenAI and OpenAI allegedly has access to the benchmark (see comment on why this is bad)
some of their researchers left to start another build-AGI startup (I’m not sure how badly this reflects on Epoch as an org but at minimum it means donors were funding people who would go on to work on capabilities)
Director Jaime Sevilla believes “violent AI takeover” is not a serious concern; says “I also selfishly care about AI development happening fast enough that my parents, friends and myself could benefit from it, and I am willing to accept a certain but not unbounded amount of risk from speeding up development”; and says “on net I support faster development of AI, so we can benefit earlier from it”, which is a very hard position to justify (unjustified even on P(doom) = 1e-6, unless you assign ~zero value to people who are not yet born)
I feel bad picking on Epoch/Jaime because they are being unusually forthcoming about their motivations, in a way that exposes them to criticism. This is noble of them; I expect most orgs not to be this noble.
When some other org does something that looks an awful lot like it’s accelerating capabilities, and they make some argument about how it’s good for safety, I can’t help but wonder if they secretly believe the same things as Epoch and are not being forthright about their motivations
My rough guess is for every transparent org like Epoch, there are 3+ orgs that are pretending to care about x-risk but actually don’t
Whenever some new report comes out about AI capabilities, like the METR task duration projection, people talk about how “exciting” it is[1]. There is a missing mood here. I don’t know what’s going on inside the heads of x-risk people such that they see new evidence on the potentially imminent demise of humanity and they find it “exciting”. But whatever mental process results in this choice of words, I don’t trust that it will also result in them taking actions that reduce x-risk.
Many AI safety people work or have worked at AI companies. They stand to make money from accelerating AI capabilities. The same is true of grantmakers.
I briefly looked through some grantmakers: I see financial COIs at Open Philanthropy, Survival and Flourishing Fund, and Manifund, but none at Long-Term Future Fund.
A different sort of conflict of interest: many AI safety researchers have an ML background and enjoy doing ML. Unsurprisingly, they often arrive at the belief that doing ML research is the best way to make AI safe. This ML research often involves making AIs better at stuff. Pausing AI development (or imposing significant restrictions) would mean they don’t get to do ML research anymore. If they oppose a pause/slowdown, is that for ethical reasons, or is it because it would interfere with their careers?
At a moderate P(doom), say under 25%, from a selfish perspective it makes sense to accelerate AI if it increases the chance that you get to live forever, even if it increases your risk of dying. I have heard from some people that this is their motivation. I am appalled at the level of selfishness required to seek immortality at the cost of risking all of humanity. And I’m sure most people who hold this position know it’s appalling, so they keep it secret and publicly give rationalizations for why accelerating AI is actually the right thing to do. In a way, I admire people who are open about their selfishness, and I expect they are the minority.
But you should know that you might not be able to trust me either.
I have some moral uncertainty but my best guess is that future people are just as valuable as present-day people. You might think this leads me to put too much priority on reducing x-risk relative to helping currently-alive people. You might think it’s unethical that I’m willing to delay AGI (potentially hurting currently-alive people) to reduce x-risk.
I care a lot about non-human animals, and I believe it’s possible in principle to trade off human welfare against animal welfare. (Although if anything, I think that should make me care less about x-risk, not more.)
ETA: I am pretty pessimistic about AI companies’ plans for aligning ASI. My weakly held belief is that if companies follow their current plans, there’s a 2 in 3 chance of a catastrophic outcome. (My unconditional P(doom) is lower than that.) You might believe this makes me too pessimistic about certain kinds of strategies.
(edit: removed an inaccurate statement)
[1] ETA: I saw several examples of this on Twitter. Went back and looked and I couldn’t find the examples I recall seeing. IIRC they were mainly quote-tweets, not direct replies, and I don’t know how to find quote-tweets (the search function was unhelpful).
I’ve been doing this for about 10 years. This January I needed to get some new socks but my brand was discontinued so I decided to buy a few different brands and compare them. I will take this opportunity to write a public sock review.
CS CELERSPORT Ankle Athletic Running Socks Low Cut Sports Tab Socks (the black version of the brand you linked): I did not like the wrinkle in the back, and the texture was a bit weird. 4⁄5.
Hanes Men’s Max Cushioned Ankle Socks: Cozy and nice texture, but they made my feet too hot. I might buy these if I lived somewhere colder. 4⁄5.
Hanes Men’s Socks, X-Temp Cushioned No Show Socks: Nice texture, and not too hot. A little tight on the toes which makes it harder for me to wiggle them. These are the ones I decided to go with. 4.5/5.
Finally, none of the frameworks make provisions for putting probabilities on anything. The framework itself needn’t include the hard numbers—indeed, such numbers might best be continually updated—but a risk management framework should point to a process that outputs well-calibrated risk estimates for known and unknown pathways before and after mitigation.
AI companies should publish numerical predictions, open Metaculus questions for each of their predictions, and provide large monetary prizes for predictors.
Let’s say that, out of those 200 activities, 199 would (for simplicity) take humans 1 year each and one would take 100 years. If a researcher AI is only half as good as humans at some of the 199 shorter tasks, but 100x better at the human-bottleneck task, then the AI can do in 2 years what takes humans 100.
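A minimal sketch of that arithmetic (my toy numbers from above, assuming the activities run in parallel so total time is set by the slowest one):

```python
# Toy model: 200 research activities run in parallel, so total time is
# determined by the slowest (bottleneck) activity. Numbers are illustrative.
human_years = [1.0] * 199 + [100.0]  # 199 ordinary tasks + 1 bottleneck task

# Hypothetical AI: half as fast as humans on the ordinary tasks,
# but 100x faster on the human-bottleneck task.
ai_years = [t * 2 for t in human_years[:199]] + [human_years[199] / 100]

print(max(human_years))  # 100.0 -- humans are limited by the 100-year task
print(max(ai_years))     # 2.0   -- the AI's limit is now the ordinary tasks
```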
As I understand it, PauseAI Global (aka PauseAI) supports protests in most regions, whereas US-based protests are run by PauseAI US, which is a separate group of people.