This is an empirical question, so I may be missing some key points. Anyway, here are a few:
My points above on Ajeya's anchors and on semi-informative priors
Or, put another way, why reject Daniel’s post?
Can deception precede economically transformative AI (TAI)?
Possibly offer a prize for formalizing and/or distilling the argument for deception (and its constituents, i.e. gradient hacking, situational awareness, and non-myopia)
How should we model software progress? In particular, what is the right functional form for the short-term return on investment in algorithmic progress? (One toy candidate is sketched below.)
My guess is that most researchers with short timelines think, as I do, that there's lots of low-hanging fruit here. Funders may underestimate how common this view is, since most safety researchers avoid discussing the details in order not to accelerate capabilities.
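Purely as an illustration of what an answer could look like (the post does not commit to any particular model), here is a toy sketch that assumes algorithmic efficiency, measured as an effective-compute multiplier, grows as a power law in cumulative research investment, so that marginal returns shrink as the low-hanging fruit gets picked. The function names and parameter values are hypothetical.

```python
# Toy sketch (not from the post) of one candidate functional form for returns
# to algorithmic progress. Assumption: the effective-compute multiplier grows
# as a power law in cumulative R&D investment, so marginal returns diminish.
# All names and parameter values here are hypothetical.

def efficiency_multiplier(cumulative_rnd: float, r: float = 0.5) -> float:
    """Effective-compute multiplier after `cumulative_rnd` units of investment.

    r < 1 encodes diminishing returns; r near 1 would mean the low-hanging
    fruit is not running out yet.
    """
    return (1.0 + cumulative_rnd) ** r


def marginal_return(cumulative_rnd: float, delta: float, r: float = 0.5) -> float:
    """Multiplier gained from the *next* `delta` units of investment."""
    return efficiency_multiplier(cumulative_rnd + delta, r) / efficiency_multiplier(cumulative_rnd, r)


if __name__ == "__main__":
    # Early investment buys a large multiplier; later investment buys less.
    for spent in (0.0, 10.0, 100.0, 1000.0):
        print(f"after {spent:6.0f} units, the next 10 units buy a "
              f"{marginal_return(spent, 10.0):.2f}x further speedup")
```

In this parameterization, the "lots of low-hanging fruit" view amounts to claiming that the effective exponent r is still high for current deep learning, i.e. that returns to algorithmic effort have not yet started to fall off.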