An example that springs to mind: Abram wrote a blog post in 2018 mentioning the “easy problem of wireheading”. He described both the problem and its solution in about one sentence, and then immediately moved on to the harder problems.
Later on, DeepMind ran an experiment that (in my assessment) mostly just confirmed that what Abram had said was correct.
For the record, I don’t think that particular DeepMind experiment was zero value, for various reasons. But at the same time, I think Abram wins hands-down on the metric of “progress towards AI alignment per researcher-hour”, and that’s true on both the production and consumption ends (I can read Abram’s one sentence much, much faster than I can skim the DeepMind paper).
If we had a plausible-to-me plan that gets us to safe & beneficial AGI, I would be really enthusiastic about going back and checking all the assumptions with experiments. That’s how you shore up the foundations, flesh out the details, start developing working code and practical expertise, etc. etc. But I don’t think we have such a plan right now.
Also, there are times when you just can’t tell a priori what an algorithm will do by thinking about it, and then obviously experiments are super useful.
But at the end of the day, I feel like some experiments are happening not because they’re the optimal thing to do for AI alignment, but rather because there are very strong pro-experiment forces inside CS / ML / AI research in academia and academia-adjacent labs.
That’s a good example, thanks :)
EDIT: To be clear, I don’t agree with
but I do think this is a good example of what someone might mean when they say work is “predictable”.