I agree with most of this. A lot of our TAIS grantmaking over the last year was to evals grants solicited through this RFP. But I want to make a few points of clarification:
Not all the grants that have been or will be approved in 2024 are on our website. For starters, there are still two months left in the year. Also, some grants have been approved but haven’t been put on the website yet. So $28 million is a modest underestimate, and it isn’t directly comparable to the 2022/2023 numbers.
I agree that evals don’t create new technological approaches to making AIs safer, but I think there are lots of possible worlds where eval results create more willpower and enthusiasm for putting safeguards in place (especially when those safeguards take work, cost money, etc.). Specifically, I think evals can show people what a model is capable of and what the trend lines are over time, and these effects can (if AIs are becoming increasingly capable) get people to invest more effort in safeguards. So I don’t agree with the claim that evals won’t “cause any action if they are built”, and I also disagree that “very few TAIS grants are directly focused on making sure systems are aligned / controllable / built safely”.
I appreciate you examining our work and giving your takes!
(I work at Open Phil on TAIS grantmaking)