Research project idea: formalize a setup with two reinforcement learners, each training the other. I think this is what's going on in baby care. Specifically, a baby is learning in part by reinforcement learning: they have various rewards they like getting (food, comfort, control over their environment, being around people), and some of those rewards are dispensed by you: food, and whether you're around them, smiling and/or mimicking them. You, too, are learning via RL: you want the baby to be happy, nourished, rested, and not crying (among other things), and the baby is dispensing some of those rewards.
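As a concrete starting point, here is a minimal sketch of one way such a formalization could look: the baby as a two-action bandit (cry / smile) and the caregiver as a contextual bandit that observes the baby's action and chooses to ignore or attend, with each agent getting part of its reward from the other's choice. Everything in it (action sets, reward numbers, epsilon-greedy Q-updates, hyperparameters) is an illustrative assumption of mine, not something fixed by the idea above.

```python
import random

class EpsGreedyQ:
    """Tabular epsilon-greedy bandit learner over (state, action) pairs."""
    def __init__(self, n_states, n_actions, lr=0.1, eps=0.1):
        self.q = [[0.0] * n_actions for _ in range(n_states)]
        self.lr, self.eps = lr, eps

    def act(self, state):
        if random.random() < self.eps:
            return random.randrange(len(self.q[state]))
        row = self.q[state]
        return max(range(len(row)), key=lambda a: row[a])

    def update(self, state, action, reward):
        self.q[state][action] += self.lr * (reward - self.q[state][action])

CRY, SMILE = 0, 1      # baby's actions (illustrative)
IGNORE, ATTEND = 0, 1  # caregiver's actions (illustrative)

baby = EpsGreedyQ(n_states=1, n_actions=2)       # baby conditions on nothing
caregiver = EpsGreedyQ(n_states=2, n_actions=2)  # caregiver sees baby's action

for step in range(50_000):
    b = baby.act(0)
    c = caregiver.act(b)
    # Baby's reward: it wants attention; crying is slightly unpleasant in itself.
    r_baby = (1.0 if c == ATTEND else 0.0) - (0.1 if b == CRY else 0.0)
    # Caregiver's reward: wants a non-crying baby; attending costs a little effort.
    r_caregiver = (1.0 if b == SMILE else 0.0) - (0.2 if c == ATTEND else 0.0)
    baby.update(0, b, r_baby)
    caregiver.update(b, c, r_caregiver)

print("baby Q(cry, smile):      ", baby.q[0])
print("caregiver Q after cry:   ", caregiver.q[CRY])
print("caregiver Q after smile: ", caregiver.q[SMILE])
```

One obvious limitation of this myopic version: it has no temporal structure, so it cannot capture dynamics like crying now being reinforced because attention arrives later. A real formalization would presumably want a Markov game or at least delayed rewards, which is exactly where the questions below start to bite.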
Questions:
What even happens? (I think in many setups you won’t get mutual wireheading)
Do you get a nice equilibrium?
Is there some good alignment property you can get?
Maybe a terrible alignment property?
This could also be a model of humans and advanced algorithms on AI-relevant timelines, or some such thing.
This will always multiply error, every time, until you have a society, at which point the agents aren't really doing naked RL anymore, because they need to be resilient enough not to get parasitized or Dutch-booked.