Ulisse Mini comments on Looking for an alignment tutor

Ulisse Mini 18 Dec 2022 15:39 UTC
2 points
EleutherAI’s #alignment channels are good places to ask questions. Some specific answers:

I understand that a reward maximiser would wire-head (take control over the reward provision mechanism), but I don’t see why training an RL agent would necessarily end up in a reward-maximising agent? Turntrout’s Reward is Not the Optimisation Target shed some clarity on this, but I definitely have remaining questions.

Leo Gao’s Toward Deconfusing Wireheading and Reward Maximization sheds some light on this.
- Kyle O’Brien 18 Dec 2022 23:25 UTC
  2 points
  I agree with this suggestion. EleutherAI’s alignment channels have been invaluable for my understanding of the alignment problem. I typically get insightful responses and explanations on the same day as posting. I’ve also been able to answer other folks’ questions to deepen my inside view.
  There is an alignment-beginners channel and an alignment-general channel. Your questions seem similar to what I see in alignment-general. For example, I received helpful answers when I asked this question about inverse reinforcement learning there yesterday.
```
Question: When I read Human Compatible a while back, my takeaway was that Stuart Russell was very bullish on Inverse Reinforcement Learning (IRL) as an important alignment research direction. However, I don’t see much mention of IRL on EleutherAI or the Alignment Forum. I see much more content about RLHF. Are IRL and RLHF the same thing? If not, what are folks’ thoughts on IRL?
```