But why exactly should we expect that the problems you describe will be exacerbated in a future with powerful AI, compared to the state of contemporary human societies?
To a large extent “ML” refers to a few particular technologies that have the form “try a bunch of things and do more of what works” or “consider a bunch of things and then do the one that is predicted to work.”
That is true but I think of this as a limitation of contemporary ML approaches rather than a fundamental property of advanced AI.
I’m mostly aiming to describe what I think is in fact most likely to go wrong; I agree it’s not a general or necessary feature of AI that its comparative advantage is optimizing easy-to-measure goals.
(I do think there is some real sense in which getting over this requires “solving alignment.”)
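The first pattern in the quoted description ("try a bunch of things and do more of what works") can be sketched as a bare random-search loop. The proxy function below is a made-up example of an easy-to-measure objective, not anything from the original discussion:

```python
import random

random.seed(0)  # for reproducibility of this toy example

def optimize_proxy(proxy, n_rounds=200, pop=20, sigma=0.1):
    """'Try a bunch of things and do more of what works':
    sample random perturbations of the current best candidate
    and keep whichever one the proxy scores highest."""
    best = [0.0, 0.0]  # arbitrary starting point
    for _ in range(n_rounds):
        candidates = [[x + random.gauss(0, sigma) for x in best]
                      for _ in range(pop)]
        best = max(candidates + [best], key=proxy)
    return best

# Hypothetical easy-to-measure proxy: highest at the point (1, 2).
proxy = lambda p: -((p[0] - 1) ** 2 + (p[1] - 2) ** 2)
result = optimize_proxy(proxy)
```

Note that the loop optimizes whatever the proxy measures and nothing else, which is the sense in which this family of methods has a comparative advantage at easy-to-measure goals.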
To a large extent “ML” refers to a few particular technologies that have the form “try a bunch of things and do more of what works” or “consider a bunch of things and then do the one that is predicted to work.”
Why not “try a bunch of measurements and figure out which one generalizes best” or “consider a bunch of things and then do the one that is predicted to work according to the broadest variety of ML-generated measurements”? (I expect there’s already some research corresponding to these suggestions, but more could be valuable?)
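One way to read the "broadest variety of measurements" suggestion is as selecting for worst-case performance across an ensemble of proxies rather than peak performance on any single one. A toy sketch, with all candidates and measurements invented for illustration:

```python
def select_robustly(candidates, measurements):
    """Pick the candidate whose *worst* score across all
    measurements is highest, rather than the one that looks
    best on any single measurement."""
    return max(candidates, key=lambda c: min(m(c) for m in measurements))

# Two made-up measurements that disagree about the ideal point.
measurements = [lambda x: -abs(x - 1), lambda x: -abs(x - 2)]
# 1.5 is mediocre on both measurements but terrible on neither,
# so it beats candidates that excel on only one.
choice = select_robustly([0.0, 1.5, 4.0], measurements)
```

This only helps to the extent the ensemble of measurements actually spans the ways a single proxy can fail, which is part of why research along these lines seems worth having more of.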