Vladimir_Nesov comments on Andrew Burns’s Shortform

Vladimir_Nesov 13 Jun 2024 16:17 UTC
2 points
The stakes of open weights are much lower for current models than for hypothetical long-horizon capable models, where the ease of removing safety tuning becomes a stronger argument against release. With current models, the major effects are wide availability for post-training and interpretability research, and reinforcing a norm of publishing weights that might persist with future dangerous models.