I don’t understand what you mean by “previous reward functions”.
I can’t tell if you’re being uncharitable or if there’s a way bigger inferential gap than I think, but I do literally just mean… reward functions used previously. Like, people did reinforcement learning before RLHF. They used reward functions for StarCraft and for Go and for Atari and for all sorts of random other things. In more complex environments, they used curiosity and empowerment reward functions. And none of these are the type of reward function that would withstand much optimization pressure (except insofar as they only applied to domains simple enough that it’s hard to actually achieve “bad outcomes”).
But I mean, people have used handcrafted rewards since forever. The human-feedback part of RLHF is nothing new either; it’s as old as all the handcrafted reward functions you mentioned (as evidenced by Eliezer referencing a reward button in this 10-year-old comment, and even back then the idea of a human-feedback-driven reward was nothing new), so I don’t understand what you mean by “previous”.
If you say “other” I would understand, since there are definitely many different ways to structure reward functions, but I do feel kind of aggressively gaslit by a bunch of people who keep trying to frame RLHF as some kind of novel advance when it’s literally just the most straightforward application of reinforcement learning that I can imagine (like, I think it really is more obvious, and was explored earlier, than basically any other way I can think of to train an AI system, since it is the standard way we do animal training).
The term “reinforcement learning” literally has its origin in animal training, where approximately all we do is what you would call modern RLHF (having a reward button, or a punishment button, usually in the form of food or via prior operant conditioning). It’s literally the oldest idea in the book of reinforcement learning. There are no “previous” reward functions; it’s literally one of the very first classes of reward functions we considered.
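To make the structural point concrete, here’s a toy sketch (all names made up, not taken from any real RLHF codebase): from the learning algorithm’s side, a human pressing a reward button is just one more reward function, interchangeable with a handcrafted one.

```python
import random


def handcrafted_reward(state: int, action: int) -> float:
    """A classic hand-written reward: +1 for reaching the goal value of 10."""
    return 1.0 if state + action >= 10 else 0.0


def reward_button(state: int, action: int) -> float:
    """Stand-in for a human pressing a reward button.

    The "human" is faked with a coin flip here just so the script runs;
    in real RLHF the signal comes from raters (or a model trained on them).
    """
    return 1.0 if random.random() < 0.5 else 0.0


def q_update(q, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """One tabular Q-learning update. It never inspects where the reward
    came from; a button press and a handcrafted function look identical."""
    best_next = max(q.get((next_state, a), 0.0) for a in range(3))
    old = q.get((state, action), 0.0)
    q[(state, action)] = old + alpha * (reward + gamma * best_next - old)


q = {}
state = 0
for _ in range(100):
    action = random.randrange(3)          # actions 0, 1, 2
    # Swap either reward source in here; the rest of the loop is unchanged.
    r = reward_button(state, action)      # or: handcrafted_reward(state, action)
    next_state = min(state + action, 10)
    q_update(q, state, action, r, next_state)
    state = 0 if next_state >= 10 else next_state
```

The only thing that differs between the handcrafted case and the human-feedback case is which function gets called on the reward line; the RL machinery around it is identical.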