METR seems to imply that 167 hours, approximately one working month, is the relevant project length for getting a well-defined, non-messy research task done.
It’s interesting that their doubling time varies between 70 days and 7 months, depending on which tasks and which historical time window they look at.
For a lower bound estimate, I’d take a 70-day doubling time, a 167-hour target, and a current max task length of one hour. In that case, if I’m not mistaken,

2^(t/d) = 167 (t: elapsed time, d: doubling time, both in years)
t = d*log(167)/log(2) = (70/365)*log(167)/log(2) ≈ 1.4 yr, or October 2026
For an upper bound estimate, I’d take their 7-month doubling time and a task length of one year rather than one month (perhaps it’s optimistic to expect SOTA research work to finish in one month?). That means 167*12 = 2004 hrs.

t = d*log(2004)/log(2) = (7/12)*log(2004)/log(2) ≈ 6.4 yr, or August 2031
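As a sanity check, both extrapolations above can be computed directly. A minimal sketch; the 1-hour current horizon is the assumption stated above, and the function name is my own:

```python
import math

def years_to_horizon(target_hours, doubling_time_years, current_hours=1.0):
    """Years until the task horizon reaches target_hours,
    starting from current_hours and doubling every doubling_time_years."""
    doublings = math.log2(target_hours / current_hours)
    return doublings * doubling_time_years

# Lower bound: 70-day doubling time, 167-hour target.
lower = years_to_horizon(167, 70 / 365)
# Upper bound: 7-month doubling time, one-year (167*12 hr) target.
upper = years_to_horizon(167 * 12, 7 / 12)
print(round(lower, 1), round(upper, 1))  # ~1.4 and ~6.4 years
```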
Not unreasonable to expect AI that can autonomously do non-messy tasks, in domains with low penalties for wrong answers, somewhere between these two dates?
It’s also noteworthy, though, that in the current paradigm, timelines for what the paper calls messy work could be a lot longer, or such work could require architecture improvements.
Interesting and nice to play with a bit.