Why would a hardcoded model-based RL agent probably self-modify or build successors this way, though?
Because picking a successor is like picking a policy, and risk aversion over policies can give different results than risk aversion over actions.
Like, suppose you go to a casino with $100, and there are two buttons you can push—one button does nothing, and the other gives you a 60% chance to win a dollar and a 40% chance to lose a dollar. If you’re risk averse you might choose to only ever press the first button (not gamble).
If there’s some action you could take to enact a policy of pressing the second button 100 times, that’s like a third button, which gives about $20 on average with a standard deviation of about $10. Maybe you’d prefer that button to doing nothing even if you’re risk averse.
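The arithmetic above can be checked directly. A minimal sketch: each play has mean $0.2 and variance 0.96, so 100 independent plays have mean $20 and standard deviation √96 ≈ $9.8. To illustrate how risk aversion over a whole policy can flip the decision, the sketch scores each option as mean minus k standard deviations—that scoring rule and the value of k are assumptions for illustration, not something from the dialogue.

```python
import math

# One play: win $1 with probability 0.6, lose $1 with probability 0.4.
p_win = 0.6
mean_1 = p_win * 1 + (1 - p_win) * (-1)                    # 0.2
var_1 = p_win * 1**2 + (1 - p_win) * (-1)**2 - mean_1**2   # 0.96
sd_1 = math.sqrt(var_1)                                    # ~0.98

# 100 independent plays: mean and variance both scale linearly with n.
n = 100
mean_n = n * mean_1                  # 20.0
sd_n = math.sqrt(n * var_1)          # ~9.8

# Toy risk-averse score: mean minus k standard deviations.
# (k = 1 is an illustrative assumption, not from the text.)
k = 1.0
score_nothing = 0.0                  # the do-nothing button
score_single = mean_1 - k * sd_1     # one press of the gamble button
score_batch = mean_n - k * sd_n      # the "press it 100 times" button

# The single gamble scores below doing nothing, but the batched
# policy scores above it: risk aversion applied per-action and
# per-policy give different answers.
print(score_single, score_nothing, score_batch)
```

The point is that mean grows linearly in n while the standard deviation grows only as √n, so bundling actions into one policy-level choice can make an otherwise-refused gamble attractive under the same risk attitude.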
I was already thinking the AI would be risk averse over whole policies and the aggregate value of its future, not locally/greedily/separately over individual actions and individual unaggregated rewards.
I’m confused about how to do that because I tend to think of self-modification as happening when the agent is limited and can’t foresee all the consequences of a policy, especially policies that involve making itself smarter. But I suspect that even if you figure out a non-confusing way to talk about risk aversion for limited agents that doesn’t look like actions on some level, you’ll get weird behavior under self-modification, like an update rule that privileges the probability distribution you had at the time you decided to self-modify.