What do you mean “hugely edited”? What other things would you like us to change? If I were starting from scratch I would of course write the post differently but I don’t think it would be worth my time to make major post hoc edits; I would like to focus on follow up posts.
If it’s spontaneous then yeah, I don’t expect it to happen ~ever really. I was mainly thinking about cases where people intentionally train models to scheme.
Agree with this hugely, though I could make a partial defense of the confidence given, but yes I’d like this post to be hugely edited.
What do you mean “hugely edited”? What other things would you like us to change? If I were starting from scratch I would of course write the post differently but I don’t think it would be worth my time to make major post hoc edits; I would like to focus on follow up posts.
Specifically, I wanted the edit to be a clarification that you only have a <0.1% probability on spontaneous scheming ending the world.
Well, I have <0.1% on spontaneous scheming, period. I suspect Nora is similar and just misspoke in that comment.
If it’s spontaneous then yeah, I don’t expect it to happen ~ever really. I was mainly thinking about cases where people intentionally train models to scheme.