I’ve just now found my way to this post via links in several of your more recent posts, and I’m curious how it fits in with more recent concepts and thinking from you and others.
Firstly, in terms of Garrabrant’s taxonomy, I take it that the “evil AI” scenario could be considered a case of adversarial Goodhart, and the siren and marketing worlds without builders could be considered cases of regressional and/or extremal Goodhart. Does that sound right?
Secondly, would you still say that these scenarios demonstrate reasons to avoid optimising (and to instead opt for something like satisficing or constrained search)? It seems to me (though I’m fairly unsure about this) that your more recent writing on Goodhart-style problems suggests that you think we can deal with such problems to the best of our ability by just modelling everything we already know about our uncertainty and about our preferences (e.g., that they have diminishing returns). Is that roughly right? If so, would you now view these siren and marketing worlds not as arguments against optimisation, but rather as strong demonstrations that naively optimising could be disastrous, and that carefully modelling everything we know about our uncertainty and preferences is really important?
that your more recent writing on Goodhart-style problems suggests that you think we can deal with such problems to the best of our ability by just modelling everything we already know about our uncertainty and about our preferences (e.g., that they have diminishing returns).
To a large extent I do, but there may be some residual effects similar to the above, so some anti-optimising pressure might still be useful.
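As a concrete (and entirely toy) illustration of the options being contrasted in this exchange: the sketch below is my own construction, not anything from the post. It assumes candidates have true values V ~ N(0, 1), that the optimiser only sees a noisy proxy U = V + noise with candidate-specific noise levels it happens to know, and it compares (a) naively optimising the proxy, (b) optimising the posterior mean, i.e. explicitly modelling our uncertainty, and (c) a simple quantilizer as one form of anti-optimising pressure.

```python
import numpy as np

rng = np.random.default_rng(0)
trials, n = 2_000, 1_000
results = {"naive argmax of proxy": [], "argmax of posterior mean": [], "top-1% quantilizer": []}

for _ in range(trials):
    true_value = rng.normal(0.0, 1.0, n)         # V: what we actually care about
    sigma = rng.uniform(0.2, 3.0, n)             # candidate-specific proxy noise (known here)
    proxy = true_value + rng.normal(0.0, sigma)  # U: what the optimiser sees

    # (a) Naive optimisation: the top proxy score tends to come from a
    #     high-noise candidate, whose true value regresses toward the mean.
    results["naive argmax of proxy"].append(true_value[np.argmax(proxy)])

    # (b) Modelling our uncertainty: with V ~ N(0,1) and U | V ~ N(V, sigma^2),
    #     E[V | U] = U / (1 + sigma^2), so optimise the shrunken estimate instead.
    results["argmax of posterior mean"].append(true_value[np.argmax(proxy / (1.0 + sigma**2))])

    # (c) Anti-optimising pressure: pick uniformly at random from the top 1%
    #     of candidates by proxy score rather than the single argmax.
    top = np.argsort(proxy)[-n // 100:]
    results["top-1% quantilizer"].append(true_value[rng.choice(top)])

for name, picked_values in results.items():
    print(f"{name:26s} mean true value of selected candidate: {np.mean(picked_values):.2f}")
```

In a toy like this, where the noise structure is fully known, the posterior-mean rule should recover most of the value that naive proxy optimisation throws away, while the quantilizer mostly just sacrifices some proxy value. The case for keeping some anti-optimising pressure, as in the reply above, rests on residual error structure we have not managed to model (e.g. rare siren-like candidates whose proxy is wrong by a lot), which a fully specified toy deliberately leaves out.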