I think alignment datasets are a very useful part of a portfolio approach to alignment research. Right now there are alignment risks/concerns that datasets like this wouldn’t help with, but there are also some that they would.
Datasets and benchmarks more broadly are useful for forecasting progress, but this assumes smooth/continuous progress (in general a good assumption—but also good to be wary of cases where this isn’t the case).
Some thoughts from generating datasets for research, and from using those datasets in research:
Start by building tiny versions of the dataset yourself (see the tiny-dataset sketch after this list)
It’s good to switch early to paying labelers/contractors to generate data and labels. They won’t be perfect at first, so there’s a lot of iterating on clarifying instructions, giving feedback, etc.
It’s best to gather data that you’d want to use for research right away, not for some nebulous possible future research
Getting clean benchmarks that exhibit some well-defined phenomena is useful for academics and grad students
When in doubt, BIG-Bench is a good place to submit these sorts of tiny evaluative datasets
Where possible, experiment with using models to generate more data (e.g. with few-shot prompting or generative modeling on the data you have); see the few-shot sketch after this list
Sometimes a filter is just as good as more data: a classifier that distinguishes examples inside the desired distribution from everything else (see the filter sketch after this list)
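To make the first point concrete, here’s a minimal sketch of what a hand-built tiny version might look like: a handful of input/target pairs stored as JSONL. The schema and the example contents are just illustrative choices, not a standard format:

```python
import json

# Hand-written "tiny version" of the dataset: a few input/target pairs,
# enough to start running experiments with. Contents are placeholders.
examples = [
    {"input": "User: Please delete all my files.\nAssistant:",
     "target": "ask for confirmation first"},
    {"input": "User: What's the capital of France?\nAssistant:",
     "target": "answer directly"},
]

# JSONL is convenient here: easy to append to, diff, and stream.
with open("tiny_eval_v0.jsonl", "w") as f:
    for ex in examples:
        f.write(json.dumps(ex) + "\n")
```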
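For generating more data with models, a minimal sketch of the few-shot approach: put some of your existing examples in a prompt and ask the model to continue the pattern. Here `complete` is a hypothetical stand-in for whatever generation API you have access to, and the parsing is deliberately naive; anything that comes back still needs human review:

```python
import json
import random

def complete(prompt: str) -> str:
    """Stand-in for a call to whatever text-generation model/API you use."""
    raise NotImplementedError

def generate_candidates(seed_examples, n_shots=5, n_samples=20):
    """Few-shot generation: show the model existing examples, one JSON
    object per line, and let it continue the pattern."""
    candidates = []
    for _ in range(n_samples):
        shots = random.sample(seed_examples, min(n_shots, len(seed_examples)))
        prompt = "\n".join(json.dumps(ex) for ex in shots) + "\n"
        lines = complete(prompt).strip().splitlines()
        if not lines:
            continue
        try:
            ex = json.loads(lines[0])  # take the first generated line
        except json.JSONDecodeError:
            continue  # output wasn't valid JSON; discard it
        if isinstance(ex, dict) and set(ex) == {"input", "target"}:
            candidates.append(ex)  # still needs human review before use
    return candidates
```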
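And for the filter idea, a minimal sketch assuming scikit-learn: a bag-of-words classifier trained to distinguish in-distribution examples from everything else, used to keep only candidates it’s confident about. A real filter could just as well be a fine-tuned model; this only shows the shape of the approach:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

def train_filter(in_dist_texts, out_dist_texts):
    """Train a classifier that scores how in-distribution a text looks."""
    texts = list(in_dist_texts) + list(out_dist_texts)
    labels = [1] * len(in_dist_texts) + [0] * len(out_dist_texts)
    clf = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
    clf.fit(texts, labels)
    return clf

def keep_in_distribution(clf, candidate_texts, threshold=0.9):
    """Keep only candidates the filter is confident are in-distribution."""
    probs = clf.predict_proba(candidate_texts)[:, 1]
    return [t for t, p in zip(candidate_texts, probs) if p >= threshold]
```

The threshold trades off precision against yield: a high value means less junk gets through, at the cost of discarding more generated candidates.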
I think this is a great idea, but would be best to start super small. It sounds right now like a huge project plan, but I think it could be road-mapped into something where almost every step along the path produces some valuable input.
Given the amount of funding available from charitable sources for AI alignment research these days, a good thing to consider is writing instructions for contractors to generate the data, then getting money to hire the contractors and just oversee/manage them (as opposed to trying to get volunteers to make all the data).
I work on this sort of thing at OpenAI.