Last year I figured (maybe motivatedly) that I would probably be better off if I finished getting my startup acquired (or going bankrupt) before applying to work at Anthropic as a cybersecurity engineer, because I doubted I'd get hired without having done something impressive. But if the trend continues, by the time I manage to apply there will be people writing blog posts on how to pass the "Anthropic Interview" by pretending to be an EA, and I'll be in an even worse position than before.
I'm quite surprised to hear that you'd be trying to work at Anthropic, lc. Anthropic seems to me to be primarily doing cutting-edge AGI development work. Much of the work they would call alignment seems to me hard to separate from capabilities work: all the scaling-law stuff, for example, is upstream of most of the recent AI hype, and they're taking investment as a for-profit and scaling up funding, models, and hiring massively. A small alignment research non-profit like Redwood Research seems much less ethically suspect on these dimensions.
I wouldn’t want to work for Anthropic in any position where I thought I might someday be pressured to do capabilities work, or “alignment research” that had a significant chance of turning out to be capabilities work. If your impression is that there’s a good chance of that happening, or there’s some other legitimization type effect I’m not considering, then I’ll save myself the trouble of applying.
Tentatively, the type of cybersecurity engineering I want to do for Anthropic (pending some due diligence) seems unlikely to sprawl, and I might have some small chance of significantly extending the critical period while only slightly shortening timelines. What I'm hoping is that there's a way to help push them towards security adequacy on this scale, which I couldn't do effectively from outside the company.
Redwood Research is also on my list of places to send resumes to.
I wouldn’t want to work for Anthropic in any position where I thought I might someday be pressured to do capabilities work, or “alignment research” that had a significant chance of turning out to be capabilities work. If your impression is that there’s a good chance of that happening, or there’s some other legitimization type effect I’m not considering, then I’ll save myself the trouble of applying.
One piece of data: I haven't been working at Anthropic for very long so far, but I have easily been able to avoid capabilities-relevant work and haven't personally felt pressured to do any. As for the other big labs, my guess is that the same would be true at DeepMind, but not at OpenAI.
My current model is that Anthropic has illegitimately taken up the flag of "the big ML place that people who want to avoid race dynamics go to", while doing most of the work needed to stay competitive and continuing to do plenty of things that fuel race dynamics (being for-profit, raising aggressive funding rounds, building a massive staff of engineers who are simply good at their jobs, i.e. not the sort of x-risk-conscious people who would gladly stop their work if leadership thought it was getting too close to AGI, developing cutting-edge language models, etc.). The leadership also seems to me to have pretty poor models of alignment (e.g. I don't know that any Anthropic staff are doing work to implement experiments on even Paul's ideas; I have also heard people who know them say that they expect engineering leadership would not feel very concerned if their systems showed signs of deception [EDIT: see bottom of comment for update]). More broadly, I have little reason to trust their judgment on these questions the way I trust a few other people like Eliezer or Paul or John or Gwern, given that Dario/Daniela have written exceedingly little about their models and plans publicly, and from my occasional conversations with insiders they literally don't have plans beyond something that sounds to me like "get as close to AGI as possible, then pause and do research with those AIs, and probably this will work out fine".
I might be wrong of course, but there's really no way to know, given the aforementioned lack of detailed explanations (or any, really) of their views on the alignment problem and how to solve it.
That said, cybersecurity does sound interesting and potentially something that will extend timelines if it prevents foreign actors from stealing data, so it could still be worth it (though Anthropic just got funding from Google, and if that relationship sustains, I presume they will have access to Google's cybersecurity folks? I honestly don't know how competent they will be, i.e. whether they're nailing it or have given up on state-actor-level security). Attempting to influence the leadership positively also sounds potentially worthwhile.
Added: I recently heard a third-hand story about someone working there on scaling who said "I just think Dario's plan is good". When asked what Dario's plan was, they said "Well, I think what I mean is that I just trust him to do the right thing when the time comes." If this is accurate, then from my perspective it is worse than Dario having a bad plan, because you can at least give a counter-argument to a bad plan; someone with no plan doesn't even give you something to discuss. If you don't have a plan, you can't notice when it doesn't match up with reality.
Edit: I PM’d with Evan, after his comment below, and it now seems to me that in the last 1.5 years Anthropic leadership has probably updated toward being more concerned about deceptive models.
massive staff size of just good engineers i.e. not the sort of x-risk-conscious people who would gladly stop their work if the leadership thought it was getting too close to AGI
From my interactions with engineers at Anthropic so far, I think this is a mischaracterization. I think the vast majority are in fact pretty x-risk-conscious, and my guess is that if leadership said stop, people would in fact be happy to stop.
engineering leadership would not feel very concerned if their systems showed signs of deception
I’ve had personal conversations with Anthropic leadership about this and can confirm that it is definitely false; I think Jared and Dario would be quite concerned if they saw deceptive alignment in their models.
That's good to hear you think that! I'd find it quite helpful to know the results of a survey to the former effect, given anonymously to the (40? 80?) ML engineers and researchers there, asking something like "Insofar as your job involves building large language models, if Dario asked you to stop your work for 2 years while still being paid your salary, how likely would you be to do so (assume the alternative is being fired)? (1-10, Extremely Unlikely to Extremely Likely)", and the same question but with "Condition on it looking to you like Anthropic and OpenAI are both 1-3 years from building AGI". I'd find that evidence quite informative. Hat tip to Habryka for suggesting roughly this question to me a year ago.
(I’m available and willing to iterate on a simple survey to that effect if you are too, and can do some iteration/user-testing with other people.)
(I’ll note that if the org doubles in size every year or two then… well, I don’t know how many x-risk conscious engineers you’ll get, or what sort of enculturation Anthropic will do in order to keep the answer to this up at 90%+.)
Regarding the latter, I’ve DM’d you about the specifics.
Consider writing up some of these impressions publicly. I would have talked to a couple of people at the org before joining, but as someone who is almost completely disconnected from the "rationalist scene" physically, all I have to go on is what people say about the org on the internet. I don't really have access to the second- or third-hand accounts that you probably have.
The signals I can remember updating on in the past were something like:
A few offhand tweets by EY (before I stopped reading Twitter) explaining that he was more impressed than expected with their alignment research.
Some comments on LW (which I can't now recall exactly) alleging that Anthropic was started via an exodus of OpenAI's most safety-conscious researchers.
Their website and general policy of not publishing capabilities research by default.
The identities and EA affiliations of the funders. Jaan Tallinn seems like a nice person. SBF was not a good person, but his involvement lets me infer certain things about their pitch/strategy that I couldn't otherwise.
This post by evhub, and just the fact that they have former MIRI researchers joining the team at all. I didn't even remember this part until he commented.
In retrospect maybe these are some pretty silly things to base an opinion of an AGI organization on. But I guess you could say their marketing campaign was successful, and my cautious opinion was that they were pretty sincere and effective.
There's an effect that works in the opposite direction, where the hiring bar gets lowered as headcount scales: key early hires may have a more stringent filter applied to them than later additions. But the bar can still be arbitrarily high; look at the profiles of people who have joined recently, e.g. Leaving Wave, joining Anthropic | benkuhn.net
It’s important to be clear about what the goal is: if it’s the instrumental careerist goal “increase status to maximize the probability of joining a prestigious organization”, then that strategy may look very different from the terminal scientist goal of “reduce x-risk by doing technical AGI alignment work”. The former seems much more competitive than the latter.
The following part will sound a little self-helpy, but hopefully it’ll be useful:
Concrete suggestion: this weekend, execute on some small tasks which satisfy the following constraints:
can’t be sold as being important or high impact.
won’t make it into the top 10 list of most impressive things you’ve ever done.
not necessarily aligned with your personal brand.
has relatively low value from an optics perspective.
high confidence of trivially low implementation complexity.
can be abandoned at zero reputational/relationship cost.
isn’t connected to a broader roadmap and high-level strategy.
requires minimal learning/overcoming insignificant levels of friction.
doesn’t feel intimidating or serious or psychologically uncomfortable.
Find the tasks in your notes after a period of physical exertion. Avoid searching the internet or digging deeply into your mind (anything you could characterize as paying constant attention to filtered noise to mitigate the risk that some decision-relevant information slipped past your cognitive systems). Decline anything that spurs an instinct of anxious perfectionism. Understand where you are first, and shift marginally towards your desired position.
You sound like someone who has a far larger max step size than ordinary people: you have the ability to get places by making one big leap. But go to the simulation in Why Momentum Really Works (distill.pub) and fix momentum at 0.99. What happens to the solution as you gradually move the step-size slider to the right?
Chaotic divergence and oscillation.
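To make the analogy concrete, here is a minimal sketch in Python (my own illustration, not code from the distill.pub article or this thread) of heavy-ball gradient descent on a simple quadratic with momentum pinned at 0.99; the `momentum_descent` helper and the particular step sizes swept are assumptions chosen for illustration. Sweeping the step size upward shows the behavior described above: slow, ringing convergence at first, then outright divergence once the step size gets large enough.

```python
import math

def momentum_descent(step_size, momentum=0.99, curvature=1.0, steps=500, w0=1.0):
    """Heavy-ball momentum on f(w) = 0.5 * curvature * w**2; returns the final |w|."""
    w, v = w0, 0.0
    for _ in range(steps):
        grad = curvature * w           # gradient of the quadratic
        v = momentum * v + grad        # accumulate velocity
        w = w - step_size * v          # take the step
        if not math.isfinite(w):       # iterates have blown up
            return math.inf
    return abs(w)

if __name__ == "__main__":
    # Momentum stays pinned at 0.99 while the "step size slider" moves right.
    for alpha in (0.01, 0.1, 1.0, 3.0, 4.5, 6.0):
        print(f"step size {alpha:>4}: final |w| after 500 steps = {momentum_descent(alpha):.3e}")
```

On a quadratic with curvature λ, the heavy-ball iterates stay bounded only while step_size is below roughly 2(1 + momentum)/λ, and with momentum at 0.99 the contraction per step in the oscillatory regime is only about sqrt(0.99), so the trajectory rings for a long time even at small step sizes and diverges outright past the threshold.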
Selling your startup to get into Anthropic seems, with all due respect, to be a plan with step count = 1. Recall Expecting Short Inferential Distances. Practicing adaptive dampening would let you more reliably plan and follow routes requiring step count > 1. To be fair, I can kind of see where you're coming from, and logically it can be broken down into independent subcomponents that you work on in parallel, but the best advice I can concisely offer without more context on the details of your situation would be this:
"Learn to walk".
It’s important to be clear about what the goal is: if it’s the instrumental careerist goal “increase status to maximize the probability of joining a prestigious organization”, then that strategy may look very different from the terminal scientist goal of “reduce x-risk by doing technical AGI alignment work”. The former seems much more competitive than the latter.
I have multiple goals. My major abstract long term root wants are probably something like (in no particular order):
help the world, reduce existential risk, do the altruism thing
be liked by my ingroup (rationalists, EAs)
have outgroup prestige (for my parents, strangers, etc.)
have some close friends/a nice bf/gf
keep most of my moral integrity/maintain my identity as a slightly edgy person
Finishing my startup before trying to work somewhere like Lightcone or RR or (portions of) Anthropic feels Pareto-optimal on those things, though I'm open to arguments that it's not, and I appreciated your comment.