I’d like to publicly preregister an opinion. It’s not worth making a full post because it doesn’t introduce any new arguments, so this seems like a fine place to put it.
I’m open to the possibility of short timelines on risks from language models. Language is a highly generalizable domain that has seen rapid progress for several years in a row now, repeatedly shattering expectations of slower timelines. The self-supervised pretraining objective means that data is not a constraint (though it could be for language agents, tbd), and the market seems optimistic about business applications of language models.
While I would bet against (~80%) language models pushing annual GDP growth above 20% in the next 10 years, I strongly expect (~80%) risks from AI persuasion to materialize (e.g. becoming a mainstream topic of discussion or influencing major political outcomes in the next 10 years), and I’m concerned (~20%) about tail risks from power-seeking LM agents (mainly hacking, but also financial trading, impersonation, or others). I’d be interested in (and should spend some time on) making clear falsifiable predictions here.
Credit to “What 2026 Looks Like” and “It Looks Like You’re Trying To Take Over The World” for making this case well before I believed it was possible. I’m also influenced by the widespread interest in LMs from AI safety grantmakers and researchers. This has been my belief for a few months, as I noted here, and I’ve taken action by working on LM truthfulness, which I expect to be most useful in scenarios of fast LM growth. (Though I don’t think it will substantially combat power-seeking LM agents, and I’m still learning about other research directions that might be more valuable.)