I think that the debate around the incentives to make aligned systems is very interesting, and I’m curious whether Buck and Rohin will formalize a bet around it afterwards.
I feel like Rohin’s point of view, compared to Buck’s, is that people and companies are in general more responsible, in that they are willing to pay extra costs to ensure safety, not necessarily as the result of solving a race-to-the-bottom situation. Is there another source of disagreement, conditional on convergence on the above?
I think there are more, though I don’t know exactly what they are. For example, I think that people will have incentives to ensure alignment (most obviously, AI researchers don’t want to destroy the world), whereas I would guess Buck is less optimistic about that.
If you didn’t think that AI researchers care that much about destroying the world, what else makes you optimistic that there will be enough incentives to ensure alignment? Does it all go back to people in positions of relevant power generally caring about safety and taking it seriously?
what else makes you optimistic that there will be enough incentives to ensure alignment?
Well, before you build superintelligent systems that could destroy the world, you probably build subhuman AI systems that do economically useful tasks (e.g. a personal assistant that schedules meetings, books flights, etc.). There’s an economic incentive to ensure that those AI systems are doing what their users want, which in turn looks like it incentivizes at least outer alignment work, and probably also inner alignment (to the extent that it’s a problem).