In my own experience, if-statements are an Achilles heel even for 3.5, and 3.7 is somehow worse. When the code is “almost” right, that is worse than useless: it’s like reviewing pull requests when you don’t know whether they are an adversarial attack or whether the author means well but is utterly incompetent in interesting, hypnotizing ways. METR’s baselines also resemble a Skinner box more than programming (many people do have that kind of job, but I don’t find gig-economy conditions “humane” or representative of how “value” is actually created). And there is a sheer disconnect between what I would call “productive”, “useful projects”, “bottlenecks”, and “what I love about my job and what parts I’d be happy to automate” and the completely different answers on How Much Are LLMs Actually Boosting Real-World Programmer Productivity?, even from people I know personally...
I find this graph indicative of how “value” is defined by the SF investment culture and disruptive economy… and I hope the AI investment bubble will collapse sooner rather than later...
But even if the bubble collapses, automating intelligence will not be undone and it won’t suddenly become “safe”; the incentives to create real AGI instead of overhyped LLMs will still exist. The danger is not in the presented economic curve going up; it’s in what economic actors see as potential, and in how incentivized corporations and governments are to search for the thing that is both powerful and dangerous, no?