How much are you thinking about stability under optimization? Most objective catastrophes are also human catastrophes. But if a powerful agent is trying to achieve some goal while avoiding objective catastrophes, it seems like it’s still incentivized to dethrone humans—to cause basically the most human-catastrophic thing that’s not objective-catastrophic.
I’m not thinking of optimizing for “not an objective catastrophe” directly—it’s just a useful concept. The next post covers this.