The stakes of releasing open weights are much lower for current models than for hypothetical models capable of long-horizon tasks, where the ease of removing safety tuning becomes a stronger argument against release. For current models, the major effects are wide availability for post-training and interpretability research, and reinforcing a norm of publishing weights that might persist once future models are dangerous.