If you don’t think that AI researchers care that much about not destroying the world, what else makes you optimistic that there will be enough incentives to ensure alignment? Does it all come back to people in relevant positions of power generally caring about safety and taking it seriously?
Well, before you build superintelligent systems that could destroy the world, you probably build subhuman AI systems that do economically useful tasks (e.g. a personal assistant that schedules meetings, books flights, etc). There’s an economic incentive to ensure that those AI systems do what their users want, which in turn looks like it drives at least outer alignment work, and probably also inner alignment (to the extent that it’s a problem).