Quick thoughts on a database for pre-registering empirical AI safety experiments
Keywords to help others searching for whether this has already been discussed: pre-register, negative results, null results, publication bias in AI alignment.
The basic idea:
Many scientific fields are plagued by publication bias: researchers only write up and publish “positive results,” where they find a significant effect or their method works. We might want to avoid this happening in empirical AI safety. We would do this with a twofold approach: a venue that purposefully accepts negative and neutral results, and a pre-registration process for submitting research protocols ahead of time, ideally linked to that venue so that researchers can get a guarantee that their results will be published regardless of the outcome.
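To make the registry half of this concrete, here is a minimal sketch of what a single pre-registration entry might record, assuming the database is little more than an append-only list of protocols and their eventual outcomes. Every name here (the Registration and Outcome types and their fields) is a hypothetical illustration, not a proposed standard.

```python
from dataclasses import dataclass, field
from enum import Enum


class Outcome(Enum):
    """Result status recorded once the pre-registered work is completed."""
    PENDING = "pending"    # protocol registered, experiment not yet run
    POSITIVE = "positive"  # the method worked / a significant effect was found
    NEGATIVE = "negative"  # the method failed / no effect was found
    NEUTRAL = "neutral"    # ambiguous or inconclusive results


@dataclass
class Registration:
    """One pre-registered empirical AI safety experiment (hypothetical schema)."""
    title: str
    hypothesis: str                   # what the researchers expect to find
    protocol: str                     # planned method, models, and analyses
    authors: list[str] = field(default_factory=list)
    outcome: Outcome = Outcome.PENDING
    writeup_url: str | None = None    # link to the eventual write-up, whatever the result
```

The key design point this is meant to illustrate is that the entry is created while the outcome is still pending, and the write-up link gets filled in later regardless of whether the result is positive, negative, or neutral.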
Some potential upsides:
Could allow better coordination by giving researchers more information about what to focus on based on what has already been investigated. Hypothetically, this should speed up research by avoiding redundancy.
Safely deploying AI systems may require complex forecasting of their behavior; while it would be intractable for a human to read and aggregate information across many thousands of studies, automated researchers may be able to consume and process information at this scale. Having access to negative results from minimal experiments may be helpful for this task. That’s a specific use case, but the general point is simply that publication bias makes it harder to figure out what is true than it would be if all results were published.
Drawbacks and challenges:
There is a decent chance the quality of submitted work would be too poor to be useful; we would need monitoring/review to avoid this. Specifically, a past project failing at a technique provides close to no evidence if you don’t think the project was executed competently, so you want a competence bar for accepting work.
For the people who might be in a good position to review research, this may be a bad use of their time.
Some AI safety work is going to be private and not captured by this registry.
This registry could increase the prominence of info-hazardous research. Either that research is included in the registry, or it’s omitted. If it’s included, the hazardous work becomes more visible directly. If it’s omitted, the omissions could end up looking like obvious holes in the research landscape, so a novice could find those research directions by filling in the gaps (effectively doing away with whatever security-through-obscurity the info-hazardous research had going for it). That said, this compares to the current state, where there isn’t a clear research landscape with obvious holes, so I expect the argument proves too much: it suggests any clarity on the research landscape would be bad. Info-hazards are potentially an issue for pre-registration as well, since researchers shouldn’t be locked into publishing dangerous results (and failing to publish after pre-registration may itself give too much information to others).
This registry going well requires considerable buy-in from the relevant researchers; it’s a coordination problem, and even the AI safety community seems to be getting eaten by the Moloch of publication bias.
Could cause more correlated research bets due to anchoring on what others are working on or explicitly following up on their work. On the other hand, it might lead to less correlated research bets because we can avoid all trying the same bad idea first.
It may be too costly to write up negative results, especially if they are being published in a venue that relevant researchers don’t regard highly. The cost could be too high in terms of an individual researcher’s effort, but it could also be too high at the community level if the journal / pre-registry doesn’t end up providing much value.
Overall, this doesn’t seem like a very good idea because of the costs and the low likelihood of success. There is plausibly a low-cost version that would still capture some of the benefit, such as higher-status researchers publicly advocating for publishing negative results and others in the community discussing the benefits of doing so. Another low-cost option would be small grants for researchers to write up negative results.
Thanks to Isaac Dunn and Lucia Quirke for discussion / feedback during SERI MATS 4.0