I am sympathetic to “try to figure out how to use these models to make progress on alignment and even better control”, but that feels different from “reducing the risk associated with deploying these models” (though maybe it isn’t and that’s what you mean).
I think of “get use out of the models” and “ensure they can’t cause massive harm” as somewhat separate problems with somewhat overlapping techniques. I think they’re both worth working on.