I think an important consideration is the degree of catastrophe. Even the asteroid strike, which is catastrophic to many agents on many metrics, is not catastrophic on every metric, not even on every metric humans actually care about. An easy example is the prevention of torture, which the asteroid impact accomplishes quite smoothly, along with almost every other negative goal. The asteroid strike is still very bad for most agents affected, but it could be much, much worse, as with the “evil” utility function you alluded to, which is very bad for humans on every metric, the negative ones included. Calling both of these things a “catastrophe” seems to sweep that difference under the rug.
With this in mind, “catastrophe” as defined here seems to be less about negative impact on utility and more about wresting control of the utility function away from humans. Which seems bound to happen even in the best case where a FAI takes over. It seems a useful concept if that is what you are getting at, but “catastrophe” seems to have confusing connotations, as if a “catastrophe” is necessarily the worst thing possible and should be avoided at all costs. If an antialigned “evil” AI were about to be released with high probability, and you had a paperclip maximizer in a box, releasing the paperclip maximizer would be the best option, even though that moves the chance of catastrophe from high probability to indistinguishable from certainty.
Calling both of these things a “catastrophe” seems to sweep that difference under the rug.
Sure, but just as it makes sense to say that a class of outcomes is “good” without every such outcome being maximally good, it makes sense to have a concept for catastrophes, even if they’re not literally the worst things possible.
Which seems bound to happen even in the best case where a FAI takes over.
Building a powerful agent that helps you get what you want doesn’t destroy your ability to get what you want. By my definition, that’s not a catastrophe.
as if a “catastrophe” is necessarily the worst thing possible and should be avoided at all costs. If an antialigned “evil” AI were about to be released with high probability, and you had a paperclip maximizer in a box, releasing the paperclip maximizer would be the best option, even though that moves the chance of catastrophe from high probability to indistinguishable from certainty.
Correct. Again, I don’t mean to say that any catastrophe is literally the worst outcome possible.