Yeah, and then we also want system A to be able to make a system B one step smarter than itself, which remains aligned with system A and with us. This needs to continue safely and successfully until we have a system powerful enough to prevent the rise of unaligned recursively self-improving (RSI) AGI. That seems like a high level of capability to me, and I'm not sure getting there in small steps rather than big ones buys us much.
I think it does buy something. The AI one step after us might be roughly as aligned as us (or a bit less), but noticeably better at figuring out what the heck alignment is and how to ensure it on the next step.
I wonder if the following would help.
As the AI ecosystem self-improves, it will eventually start discovering new physics, and do so more and more rapidly. This will leave the AI ecosystem with existential safety issues of its own: if the new physics is radical enough, it's not difficult to imagine scenarios in which everything gets destroyed, including all the AIs.
So I wonder if early awareness that there are existential safety issues relevant to the well-being of the AIs themselves might improve the situation...