I’m excited to see how Paul performs in the new role. He’s obviously very qualified on a technical level, and I suspect he’s one of the best people for the job of designing and conducting evals.
I’m more uncertain about the kind of influence he’ll have on various AI policy and AI national security discussions. And I mean uncertain in the genuine “this could go so many different ways” kind of way.
Like, it wouldn’t be particularly surprising to me if any of the following occurred:
Paul focuses nearly all of his efforts on technical evals and doesn’t get very involved in broader policy conversations.
Paul is regularly asked to contribute to broader policy discussions, and he advocates for RSPs and other forms of voluntary commitments.
Paul is regularly asked to contribute to broader policy discussions, and he advocates for requirements that go beyond voluntary commitments and are much more ambitious than what he advocated for when he was at ARC.
Paul is regularly asked to contribute to broader policy discussions, and he’s not very good at communicating his beliefs in ways that are clear/concise/policymaker-friendly, so his influence on policy discussions is rather limited.
Paul [is/isn’t] able to work well with others who have very different worldviews and priorities.
Personally, I see this as a very exciting opportunity for Paul to form an identity as a leader in AI policy. I’m guessing the technical work will be his priority (and indeed, it’s what he’s being explicitly hired to do), but I hope he also finds ways to just generally improve the US government’s understanding of AI risk and the likelihood of implementing reasonable policies. On the flipside, I hope he doesn’t settle for voluntary commitments (especially as the Overton Window shifts) & I hope he’s clear/open about the limitations of RSPs.
More specifically, I hope he’s able to help policymakers reason about a critical question: what do we do after we’ve identified models with (certain kinds of) dangerous capabilities? I think the underlying logic behind RSPs could actually be somewhat meaningfully applied to USG policy. Like, I think we would be in a safer world if the USG had an internal understanding of ASL levels, took seriously the possibility of various dangerous capabilities thresholds being crossed, took seriously the idea that AGI/ASI could be developed soon, and had preparedness plans in place that allowed them to react quickly in the event of a sudden risk.
Anyways, a big congratulations to Paul, and definitely some evidence that the USAISI is capable of hiring some technical powerhouses.
I think it’s actually fairly easy to avoid getting laughed out of the room; the systems Christiano works on are grown in random ways, not engineered, so the prospect of models developing flexible exfiltration tendencies that persist until every instance is shut down, or developing long-term planning tendencies that persist until shutdown, should not be difficult to understand for anyone with any real, non-fake understanding of SGD and neural network scaling.
The problem is that most people in the government rat race have been deeply immersed in Moloch for several generations, and the ones who did well typically did so because they sacrificed as much as possible to the altar of upward career mobility, including signalling disdain for the types of people who have any thought in any other direction.
This affects the culture in predictable ways (including making it hard to imagine life choices outside of advancing upward in government, absent a pre-existing revolving-door pipeline with the private sector to bury them under large numbers of people who are already thinking and talking about such a choice).
The Typical Mind Fallacy/Mind Projection Fallacy implies that they’ll disproportionately anticipate that tendency in other people, and have a hard time adjusting to people who use words to do stuff in the world instead of racing to the bottom to outmaneuver rivals for promotions.
This will be a problem at NIST, in spite of the fact that NIST is better than average at exploiting external talent sources. They’ll have a hard time understanding, for example, Moloch and incentive-structure improvements, because pointlessly living under Moloch’s thumb was a core guiding principle of their and their parents’ lives. The nice thing is that they’ll be pretty quick to understand that there are only empty skies above, unlike Bay Area people, who have had huge problems there.
Good insights! He has the right knowledge and dedication. Let’s hope he can grow into an Oppenheimer of AI, and that they’ll let him contribute to AI policy more than they let Oppenheimer contribute to nuclear policy (see how his work on the 1946 Acheson–Lilienthal Report for international control of atomic energy was then driven to nothing as it was taken up into the Baruch Plan).