On the relevance for QURI:
QURI is focused on advancing forecasting and evaluation for altruistic ends. One of the largest challenges we’ve come across has been how to organize information at scale in ways that can be neatly predicted or evaluated. AI Safety Papers is in part a test of how easy it is to build usable custom web front ends that help organize large amounts of tabular data. Besides (hopefully) being directly useful, applications like AI Safety Papers might later serve to assist with forecasting or evaluation. This could be aided by things like per-paper comment threads, integrations with forecasting functionality (making forecasts, viewing them, or both), and integrations with evaluations (writing them, viewing them, or both).
One possible future project might look something like this:
1. We come up with a rubric for a subset of papers (say, “Quality”, “Novelty”, and “Importance”).
2. We find a set of respected researchers to rate a select subset of papers against that rubric.
3. We have a separate group of forecasters (more junior researchers) predict, for every paper, what the group in step (2) would say.
4. The resulting estimates from step (3) could be made available on the website.
5. Every so often, some papers would be randomly selected for evaluation by the respected team. Once that is done, the forecasters who best predicted these results would be rewarded accordingly.
This would be a setup similar to what has been discussed here.
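To make the incentive structure a bit more concrete, here is a minimal sketch of how the scoring in step (5) might work, assuming a simple numeric rubric and mean absolute error as the scoring rule. All names, data, and the choice of scoring rule are illustrative assumptions for this post, not part of any existing system.

```python
# Hypothetical sketch: rank forecasters by how closely their predictions match
# the expert panel's ratings on the papers that happen to get resolved.
import random
from statistics import mean

RUBRIC = ("quality", "novelty", "importance")

# Forecaster predictions: {forecaster: {paper_id: {criterion: score}}}
# Scores here use an assumed 1-10 scale.
predictions = {
    "forecaster_a": {"paper_1": {"quality": 7, "novelty": 5, "importance": 6},
                     "paper_2": {"quality": 4, "novelty": 6, "importance": 5}},
    "forecaster_b": {"paper_1": {"quality": 8, "novelty": 4, "importance": 7},
                     "paper_2": {"quality": 5, "novelty": 5, "importance": 4}},
}

def resolve_random_subset(paper_ids, k, seed=None):
    """Randomly pick k papers for the expert panel to evaluate (step 5)."""
    rng = random.Random(seed)
    return rng.sample(sorted(paper_ids), k)

def score_forecaster(preds, expert_ratings):
    """Mean absolute error against the expert ratings; lower is better."""
    errors = [abs(preds[paper][criterion] - expert_ratings[paper][criterion])
              for paper in expert_ratings
              for criterion in RUBRIC]
    return mean(errors)

# The expert panel only rates the randomly selected papers.
resolved = resolve_random_subset(["paper_1", "paper_2"], k=1, seed=0)
all_expert_ratings = {"paper_1": {"quality": 8, "novelty": 5, "importance": 7},
                      "paper_2": {"quality": 5, "novelty": 6, "importance": 5}}
expert_ratings = {paper: all_expert_ratings[paper] for paper in resolved}

# Best-calibrated forecaster first; rewards could follow this ordering.
ranking = sorted(predictions,
                 key=lambda f: score_forecaster(predictions[f], expert_ratings))
print(ranking)
```

In practice one would likely want a proper scoring rule and some handling of uncertainty rather than point estimates, but the basic loop of "predict everything, resolve a random subset, reward accuracy" would look roughly like this.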
If anyone reading this has suggestions or thoughts on these sorts of arrangements, please do leave comments.