It seems a substantial drawback that it will be more costly for you to criticize Anthropic in the future.
Many of the people / orgs involved in evals research are also important figures in policy debates. With this incentive Anthropic may gain more ability to control the narrative around AI risks.
As in, if at some point I am currently a contractor with model access (or otherwise have model access via some relationship like this) it will at that point be more costly to criticize Anthropic?
I’m not sure what the confusion is exactly.
If any of the following apply:
- you have a fixed-length contract and you hope to have another contract again in the future
- you have an indefinite contract and you don't want them to terminate your relationship
- you are some other evals researcher and you hope to gain model access at some point
then you may refrain from criticizing Anthropic from now on.
Ok, so the concern is: AI labs may provide model access (or other goods), so people who might want to obtain model access might be incentivized to criticize AI labs less.
Is that accurate?
Notably, as described, this is not specifically a downside of anything I'm arguing for in my comment, or a downside of actually being a contractor. (Unless you think me being a contractor will make me more likely to want model access for whatever reason.)
I agree that this is a concern in general with researchers who could benefit from various things that AI labs might provide (such as model access). So, this is a downside of research agendas with a dependence on (e.g.) model access.
I think various approaches to mitigate this concern could be worthwhile. (Though I don’t think this is worth getting into in this comment.)
Yes that’s accurate.
In your comment you say:
For some safety research, it's helpful to have model access in ways that labs don't provide externally. Giving employee level access to researchers working at external organizations can allow these researchers to avoid potential conflicts of interest and undue influence from the lab. This might be particularly important for researchers working on RSPs, safety cases, and similar, because these researchers might naturally evolve into third-party evaluators.
Related to undue influence concerns, an unfortunate downside of doing safety research at a lab is that you give the lab the opportunity to control the narrative around the research and use it for their own purposes. This concern seems substantially addressed by getting model access through a lab as an external researcher.
I’m essentially disagreeing with this point. I expect that most of the conflict of interest concerns remain when a big lab is giving access to a smaller org / individual.
From my perspective, the main takeaway from your comment was "Anthropic gives internal model access to external safety researchers." I agree that once you have already updated on this information, the additional information "I am currently receiving access to Anthropic's internal models" does not change much. (Although I do expect that establishing the precedent / strengthening the relationships / enjoying the luxury of internal model access will in fact make you more likely to want model access again in the future.)
As in, there aren’t substantial reductions in COI from not being an employee and not having equity? I currently disagree.
Yeah that’s the crux I think. Or maybe we agree but are just using “substantial”/”most” differently.
It mostly comes down to intuitions, so I think there probably isn't a way to resolve the disagreement.