Peter Hroššo comments on How I think about alignment

Peter Hroššo 13 Aug 2022 14:40 UTC
LW: 4 AF: 2
1
AF

I expect there to be too much happenstance encoded in my values.

I believe this is a bug, not a feature that we would like to reproduce.

I think that the direction you described with the AI analysing how you acquired your values is important, because it shouldn’t be mimicking just your current values. It should be able to adapt the values to new situations the way you’d do (distributional shift). Think all the books / movies where people get to unusual situations and have to make tough moral calls. Like plane crashing in the middle of nowhere with 20 survivors who are gradually running out of food.. Superhuman AI will be running into unknown situations all the time because of different capabilities.

Human values are undefined for most situations a superhuman AI will encounter.
- Vladimir_Nesov 13 Aug 2022 16:10 UTC
  3 points
  0
  Parent
  
  Superhuman AI will be running into unknown situations all the time because of different capabilities.
  
  It might have some sort of perverse incentive to get there, but unless it has already extrapolated the values enough to make them robust to those situations, it really shouldn’t. It’s not clear how specifically it shouldn’t, what is a principled way of making such decisions.
  - Noosphere89 13 Aug 2022 16:29 UTC
    2 points
    1
    Parent
    I’d argue that’s how humanity has to deal with value extrapolation ever since the start of the industrial revolution. It happens every time you gain new technological capabilities. So there isn’t a perverse incentive here, it’s just how technological advancement works.
    
    Now from an EA/LW perspective this seems like unprincipled extrapolation of values, and it is, but what else do we have, really?
    - Vladimir_Nesov 13 Aug 2022 16:39 UTC
      2 points
      0
      Parent
      
      what else do we have, really?
      
      Patience. And cosmic wealth, some of which can be used to ponder these questions for however many tredecillions of years it takes.