Yeah, I agree that releasing open-weights non-frontier models doesn’t seem like a frontier capabilities advance.
It does seem potentially like an open-source capabilities advance.
That can be bad in different ways.
Let me pose a couple hypotheticals.
What if frontier models were already capable of causing grave harms to the world if used by bad actors, and it is only the fact that they are kept safety-fine-tuned and restricted behind APIs that is preventing this? In such a case, it’s a dangerous thing to have open-weight models catching up.
What if there is some threshold beyond which a model would be capable of recursive self-improvement, given sufficient scaffolding and unwise pressure from an incautious user? Again, the frontier labs might well abstain from this course, especially if they weren’t sure they could trust the new model design created by the current AI. They would likely move slowly and cautiously, at least. I would not expect this of the open-source community. They seem focused on pushing the boundaries of agent-scaffolding and incautiously exploring whatever they can.
So, as we get closer to danger, open-weight models take on more safety significance.
Yeah, there are reasons for caution. I think it makes sense for both the concerned and the unconcerned to make numerical forecasts about the costs and benefits of such questions, rather than the current state of everyone just comparing vibes. This generalizes to other questions, like the benefits of interpretability, advances in safety fine-tuning, deep learning science, and agent foundations.
Obviously such numbers aren’t the end of the line, and, as with biorisk, sometimes they themselves should be kept secret. But it would still seem like a great advance over the status quo.
If anyone would like to collaborate on such a project, my DMs are open (not to say this topic is covered; it isn’t exactly my main wheelhouse).