I think more engagement in this area is useful, and I mostly agree. I’ll point out that much of the issue with powerful agents and missed consequences is, I think, more usefully captured by work on Goodhart’s law. That is definitely my pet idea, but it seems relevant, so I’ll shamelessly self-promote here.
Technical-ish paper with Scott Garrabrant: https://arxiv.org/abs/1803.04585
A more qualitative argument about multi-agent cases, with some examples of how it’s already failing: https://www.mdpi.com/2504-2289/3/2/21/htm
A hopefully-someday-published article on ways to minimize these risks in non-AI systems: https://mpra.ub.uni-muenchen.de/98288/5/MPRA_paper_98288.pdf