but I’m a bit disappointed that x-risk-motivated researchers seem to be taking the “safety”/“harm” framing of refusals seriously
I’d say a more charitable interpretation is that it is a useful framing, both as a concrete problem one could use as scaffolding for research progress on alignment-as-defined-by-Zack, and as something financially advantageous to focus on, since frontier labs are strongly incentivized to care about it.