I think… agree denotationally, and lean towards disagreeing connotationally? (like, this seems to imply “and because he doesn’t seem to obviously have coherent views on alignment-in-particular, it’s not worth arguing the object level?”)
(to be clear, I don’t super expect this post to affect Dario’s decisionmaking models, esp. directly. I do have at least some hope for Anthropic employees to engage with these sorts of models/arguments, and my sense from talking to them is that a lot of the LW-flavored arguments have often missed their cruxes)
No, I agree it’s worth arguing the object level. I just disagree that Dario seems to be “reasonably earnestly trying to do good things,” and I think this object-level consideration seems relevant (e.g., insofar as you take Anthropic’s safety strategy to rely on the good judgement of their staff).
I think (moderately likely, though not super confident) it makes more sense to model Dario as:
“a person who actually is quite worried about misuse, and is making significant strategic decisions around that (and doesn’t believe alignment is that hard)”
than as “a generic CEO who’s just generally following incentives and spinning narrative post-hoc rationalizations.”
Yeah, I buy that he cares about misuse. But I wouldn’t quite use the word “believe,” personally, about his acting as though alignment is easy—I think if he had actual models or arguments suggesting that, he probably would have mentioned them by now.
I don’t particularly disagree with the first half, but your second sentence isn’t really a crux for me on the first part.