I think that the debate around the incentives to make aligned systems is very interesting, and I’m curious whether Buck and Rohin will formalize a bet around it afterwards.
I feel like Rohin’s point of view, compared to Buck’s, is that people and companies are in general more responsible, in that they are willing to pay extra costs to ensure safety, not necessarily as the result of solving a race-to-the-bottom situation. Is there another source of disagreement, conditional on convergence on the above?
I think there are more, though I don’t know exactly what they are. For example, I think that people will have incentives to ensure alignment (most obviously, AI researchers don’t want to destroy the world), whereas I would guess Buck is less optimistic about that.
If you didn’t think that AI researchers care that much about destroying the world, what else makes you optimistic that there will be enough incentives to ensure alignment? Does it all go back to people in positions of relevant power generally caring about safety and taking it seriously?
what else makes you optimistic that there will be enough incentives to ensure alignment?
Well, before you build superintelligent systems that could destroy the world, you probably build subhuman AI systems that do economically useful tasks (e.g. a personal assistant that schedules meetings, books flights, etc.). There’s an economic incentive to ensure that those AI systems are doing what their users want, which in turn looks like it incentivizes at least outer alignment work, and probably also inner alignment (to the extent that it’s a problem).