A few people recently have asked me for my take on ARC Evals, and so I’ve aggregated some of my responses here:
- I don’t have strong takes on ARC Evals, mostly on account of not having thought about it deeply.
- Part of my read is that they’re trying to get a small, dumb, minimal version of the thing up and running, so they can scale it into something real. This seems good to me.
- I am wary of people in our community inventing metrics that Really Should Not Be Optimized and handing them to a field that loves optimizing metrics.
- I expect there are all sorts of issues that would slip past them, and I’m skeptical that the orgs-considering-deployments would actually address those issues meaningfully if issues were detected (cf. [Carefully Bootstrapped Alignment Is Organizationally Hard](https://www.lesswrong.com/posts/thkAtqoQwN6DtaiGT/carefully-bootstrapped-alignment-is-organizationally-hard)).
- Nevertheless, I think that some issues can be caught, and attempting to catch them (and to integrate with leading labs, and make “do some basic checks for danger” part of their deployment process) is a step up from doing nothing.
- I have not tried to come up with better ideas myself.
Overall, I’m enthusiastic about the project of getting people who understand some of the dangers into the deployment-decision loop, looking for advance warning signs.