Response: If we don’t start now, then in the short term companies will deploy products that optimize simple objectives like revenue and engagement, outcomes that alignment work could improve. In the long term, it is plausible that alignment is very hard, such that we need many conceptual advances, and we need to start on them now to have them ready by the point at which we feel obligated to use powerful AI systems. In addition, empirically there seem to be many alignment approaches that aren’t bottlenecked by the capabilities of models—see for example this post (AN #141).
Does anyone remember the name of the story about the company that wasn’t aligned and the superfactories?