I’m using “catastrophic” in the technical sense of “unacceptably bad even if it happens very rarely, and even if the AI does what you wanted the rest of the time”, rather than “very bad thing that happens because of AI”; apologies if this was confusing.
My guess is that you will wildly disagree with the frame I’m going to use here, but I’ll just spell it out anyway: I’m interested in “catastrophes” as a remaining problem after you have solved the scalable oversight problem. If your AI is able to do one of these “positive-sum” pivotal acts in a single action, and you haven’t already lost control, then you can use your overseer to oversee the AI as it takes actions, and by assumption you only have to watch it for a small number of actions (maybe I want to say episodes rather than actions) before it’s done some crazy powerful stuff and saved the world. So I think I stand by the claim that those pivotal acts aren’t where much of the x-risk from AI catastrophic action (in the specific sense I’m using) comes from.
Thanks again for your thoughts here, they clarified several things for me.