> Reward functions often are structured as objectives
What does this mean? By “structured as objectives”, do you mean something like “people try to express what they want with a reward function, by conferring more reward to more desirable states”? (I’m going to assume so for the rest of the comment, LMK if this is wrong.)
I agree that other people (especially my past self) think about reward functions this way. I think they’re generally wrong to do so, and it’s misleading as to the real nature of the alignment problem.
> I agree that this is not always the case, though, as in the discussion here.
I agree with that post, thanks for linking.
> if you had access to e.g. AIXI, you could directly build a “reward maximizer.” I agree that this is not always the case, though, as in the discussion here. That being said, I think it is often enough the case that it made sense to focus on that particular case in RFLO.
As far as I can tell, AIXI and other hardcoded planning agents are the known exceptions to the arguments in this post, and we will not get AGI via those approaches. In what other settings does this hold? I therefore still feel confused about why you think it made sense to focus on that case.
While I definitely appreciate the work you all did with RFLO, the framing of reward as a “base objective” seems like a misstep that set discourse in a weird direction, one I’m trying to push back on (from my POV!). I think the “base objective” is better described as a “cognitive-update-generator.” (This is not me trying to educate you on this specific point, but rather to argue that it really matters how we frame the problem in our day-to-day reasoning.)
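To make the distinction concrete, here is a minimal toy sketch (my own illustration, not code from RFLO or from this post; the environment, function names, and hyperparameters are all made up) of the two roles a reward function can play: an objective that a hardcoded planner evaluates and maximizes directly (the AIXI-like case), versus a “cognitive-update-generator” whose output only enters training as a multiplier on parameter updates, as in a REINFORCE-style policy gradient:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy one-step environment with two actions; action 1 is the "more desirable" one.
def reward(action: int) -> float:
    return 1.0 if action == 1 else 0.0

# (1) Reward as an objective: a hardcoded planner evaluates `reward` directly
#     and argmaxes over it -- the AIXI-like "reward maximizer" case.
def planner_act() -> int:
    return max(range(2), key=reward)

# (2) Reward as a "cognitive-update-generator": REINFORCE-style policy gradient.
#     `theta` is the logit for picking action 1 under a Bernoulli(sigmoid(theta)) policy.
def prob_action1(theta: float) -> float:
    return 1.0 / (1.0 + np.exp(-theta))

def reinforce_step(theta: float, lr: float = 0.5) -> float:
    p1 = prob_action1(theta)
    action = int(rng.random() < p1)   # sample from the current policy
    r = reward(action)                # scalar reward from the environment
    grad_logp = action - p1           # d/dtheta log pi(action | theta)
    # The reward only shows up here, scaling the update to the policy's parameters;
    # the resulting policy never represents or evaluates `reward` itself.
    return theta + lr * r * grad_logp

theta = 0.0
for _ in range(200):
    theta = reinforce_step(theta)

print("hardcoded planner picks:", planner_act())            # always 1
print("trained policy P(action=1):", prob_action1(theta))   # close to 1
```

In the second case, the reward function never appears inside the trained policy’s own computation; it only determines which parameter updates get made, which is the sense in which I’d describe it as an update-generator rather than as the agent’s objective.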