kave comments on Anthropic’s Responsible Scaling Policy & Long-Term Benefit Trust

kave 19 Sep 2023 18:11 UTC
LW: 29 AF: 14
As a general matter, Anthropic has consistently found that working with frontier AI models is an essential ingredient in developing new methods to mitigate the risk of AI.
What are some examples of work that is most largeness-loaded and most risk-preventing? My understanding is that interpretability work doesn’t need large models (though I don’t know about things like influence functions). I imagine constitutional AI does. Is that the central example, or are there other pieces that go further in this direction?