Brief analysis of OP Technical AI Safety Funding
TL;DR
I spent a few hours going through Open Philanthropy (OP)’s grant database. The main findings were:
Open Philanthropy has made $28 million in grants for Technical AI Safety (TAIS) in 2024
68% of these are focused on evaluations / benchmarking. The rest is split between interpretability, robustness, value alignment, forecasting, field building and other approaches.
OP funding for TAIS has fallen from a peak in 2022
Excluding funding for evaluations, TAIS funding has fallen by ~80% since 2022.
A majority of TAIS funding is focused on “meta” rather than “direct” safety approaches
My overall takeaway was that very few TAIS grants are directly focused on making sure systems are aligned / controllable / built safely.[1]
Method
I:
Downloaded OP’s list of grants
Filtered for “Potential Risks from Advanced AI”
Classified grants as either “Policy” or “Non-Policy”
Within “Non-Policy”, classified grants by focus area (e.g. evaluations, interpretability, field building, “Multiple” and “Other”)
In most cases I classified just from the grant name; occasionally I dug for a bit more info. There are definitely errors, and cases where grants could be more precisely specified.
Combined focus areas into “Clusters”—“Empirical”, “Theory”, “Forecasting & Evaluating”, “Fieldbuilding” and “Other”
Created charts (a rough sketch of this pipeline is included below)
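For readers who want to reproduce or extend this, here is a minimal sketch of the pipeline in pandas. The file name (`op_grants.csv`), column names (`Grant`, `Focus Area`, `Amount`, `Date`) and keyword rules are all assumptions for illustration; in practice I assigned the agenda labels by hand from the grant names, so the keyword matching is only a rough stand-in for that manual step.

```python
import pandas as pd

# Assumed file and column names ("Grant", "Focus Area", "Amount", "Date");
# the actual export from OP's grants database may differ.
grants = pd.read_csv("op_grants.csv", parse_dates=["Date"])

# Keep only the AI-risk focus area. (The Policy / Non-Policy split and the
# research-agenda labels were assigned by hand; the keyword rules below are
# just a rough stand-in for that manual step.)
tais = grants[grants["Focus Area"] == "Potential Risks from Advanced AI"].copy()

AGENDA_KEYWORDS = {
    "Evaluations": ["benchmark", "evaluation", "evals"],
    "Interpretability": ["interpretability"],
    "Robustness": ["robustness", "adversarial"],
    "Field building": ["fellowship", "scholars", "university"],
}

def classify(grant_name: str) -> str:
    """Assign a grant to an agenda via simple keyword matching."""
    lowered = grant_name.lower()
    for agenda, keywords in AGENDA_KEYWORDS.items():
        if any(k in lowered for k in keywords):
            return agenda
    return "Other"

tais["Agenda"] = tais["Grant"].apply(classify)
tais["Year"] = tais["Date"].dt.year

# Total funding per agenda per year, ready for charting.
summary = tais.pivot_table(index="Year", columns="Agenda",
                           values="Amount", aggfunc="sum", fill_value=0)
print(summary)
```

A manual pass over the output is still needed, since many grant names don't contain an obvious keyword.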
Results
Grants by Research Agenda
Grants by Cluster
Full data available here
Key Findings
(1) Evaluations & Benchmarking make up two-thirds of all OP TAIS funding in 2024
Most of these grants are related to the RFP on LLM Benchmarks.
(2) Excluding Evaluations & Benchmarking, OP grants for TAIS have fallen significantly
2022 Funding (excluding evaluations): $62,089,504
2023 Funding (excluding evaluations): $43,417,089
2024 Funding (projected, excluding evaluations): $10,808,390
An 82.6% reduction relative to 2022 (quick check below)
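For transparency, the headline figure is just one minus the ratio of the projected 2024 total to the 2022 total:

```python
funding_2022 = 62_089_504   # 2022 TAIS funding, excluding evaluations
funding_2024 = 10_808_390   # 2024 projected TAIS funding, excluding evaluations

reduction = 1 - funding_2024 / funding_2022
print(f"{reduction:.1%}")   # -> 82.6%
```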
(3) Most TAIS funding is focused on “investment” rather than “direct” approaches to AI safety
I classify grants into two broad buckets (a rough sketch of the mapping follows the definitions below):
“Direct”—grants for research agendas which aim to improve safety today (e.g. Interpretability, Control, Robustness, Value Alignment, Theory)
“Investment”—grants which pay off through future impact—e.g. Field building, Talent development, reducing uncertainty (via Forecasting & Evaluation)
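Continuing the pipeline sketch from the Method section, a mapping along the following lines (my own assignment, not a label in OP's data) turns the per-agenda totals into Direct vs Investment shares:

```python
# Assumed assignment of agendas to buckets, following the definitions above.
# Reuses the `tais` dataframe from the pipeline sketch in the Method section.
BUCKETS = {
    "Interpretability": "Direct",
    "Robustness": "Direct",
    "Value alignment": "Direct",
    "Theory": "Direct",
    "Evaluations": "Investment",
    "Field building": "Investment",
    "Forecasting": "Investment",
}

tais["Bucket"] = tais["Agenda"].map(BUCKETS).fillna("Other")

# Share of total TAIS funding in each bucket.
shares = tais.groupby("Bucket")["Amount"].sum() / tais["Amount"].sum()
print(shares)
```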
More cynically: I worry the heavy focus on evaluations will give us a great understanding of how and when scary AI systems emerge. But that won't (a) prevent scary AI systems from being built, or (b) prompt any action if they are built (see also “Would catching your AIs trying to escape convince AI developers to slow down or undeploy?”)