I would guess it does somewhat exacerbate risk. I think it’s unlikely (~15%) that alignment is easy enough that prosaic techniques could even suffice, but in those worlds I expect things go well mostly because the behavior of powerful models is non-trivially influenced/constrained by their training. In that case, I expect there’s more room for things to go wrong the more that training is for lethality/adversariality.
Given the present state of atheoretical confusion about alignment, I feel wary of confidently dismissing these sorts of basic, obvious-at-first-glance arguments about risk (e.g., “all else equal, probably we should expect more killing-people-type problems from models trained to kill people”) without decently strong countervailing arguments.
I’m curious whether “trusted” in this sense basically just means “aligned” (or, like, the superset of that which also includes “unaligned yet too dumb to cause harm” and “unaligned yet prevented from causing harm”), or whether you mean something more specific? E.g., are you imagining that some powerful unconstrained systems are trusted yet unaligned, or vice versa?