EleutherAI’s #alignment channels are good to ask questions in. Some specific answers:
“I understand that a reward maximiser would wire-head (take control over the reward provision mechanism), but I don’t see why training an RL agent would necessarily result in a reward-maximising agent? TurnTrout’s Reward is Not the Optimisation Target shed some light on this, but I definitely have remaining questions.”
Leo Gao’s Toward Deconfusing Wireheading and Reward Maximization sheds some light on this.
I agree with this suggestion. EleutherAI’s alignment channels have been invaluable for my understanding of the alignment problem. I typically get insightful responses and explanations on the same day as posting. I’ve also been able to answer other folks’ questions to deepen my inside view.
There is an alignment-beginners channel and an alignment-general channel. Your questions seem similar to what I see in alignment-general. For example, I received helpful answers when I asked this question about inverse reinforcement learning there yesterday.
Question: When I read Human Compatible a while back, I had the takeaway that Stuart Russell was very bullish on Inverse Reinforcement Learning being an important alignment research direction. However, I don’t see much mention of IRL on EleutherAI and the Alignment Forum. I see much more content about RLHF. Are IRL and RLHF the same thing? If not, what are folks’ thoughts on IRL?