I think it’s pretty realistic to have large-ish (say 20+ FTE at leading labs?) adversarial evaluation teams within 10 years, and much larger seems possible if it actually looks useful. To the extent the story is unrealistic, it’s mostly that it’s a kind of random and specific one, and in practice this work would more likely be mixed in a complicated way with other roles etc.
If AI is as exciting as you are forecasting, then it’s pretty likely that labs are receptive to building those teams and hiring a lot of people. So the main question is whether safety-concerned people do a good enough job of scaling up those efforts, getting good at doing the work, recruiting and training more folks, and arguing for / modeling why this is useful and can easily fit into a rapidly-scaling AI lab. (10 years is also a relatively long lead time to get hired and settle in at a lab.)
I think the most likely reason this doesn’t happen in the 10-year world is just that there are too many other appealing aspects of the ideal world, and people who care about alignment will focus their attention on making other ones happen (and some of those might just be much better ideas than this one). But if this were all we had to do, I would feel extremely optimistic about making it happen.
I feel like this is mostly about technical work rather than AI governance.
Nice. I’m tentatively excited about this… are there any backfire risks? My impression was that the AI governance people didn’t know what to push for because of massive strategic uncertainty. But this seems like a good candidate for something they can do that is pretty likely to be non-negative? Maybe the idea is that if we think more we’ll find even better interventions and political capital should be conserved until then?