Oh, I see. I’m not interested in “solving outer alignment” if that means “creating a real-world physical process that outputs numbers that reward good things and punish bad things in all possible situations” (because as you point out it seems far too stringent a requirement).
Then I was wondering whether they use a more refined notion of outer alignment, perhaps one that takes the agent's physical capabilities into account, and I was trying to ask whether something like that has already been written down anywhere.
You could look at ascription universality and ELK. The general mindset is roughly "ensure your reward signal captures everything that the agent knows"; I think it's well captured in "Mundane solutions to exotic problems".
Thanks a lot for these pointers!