Hi Jennifer!
Awesome, thank you for the thoughtful comment! The links are super interesting, reminds me of some of the research in empirical aesthetics I read forever ago.
On the topic of circular preferences: It turns out that the type of reward model I am training here handles non-transitive preferences in a “sensible” fashion. In particular, if you’re “non-circular on average” (i.e. you only make accidental “mistakes” in your rating), then the model averages that out. And if you consistently have a loopy utility function, then the reward model will map all the elements of the loop onto the same reward value.
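To make that concrete, here is a minimal sketch (not the actual training code, just an illustrative toy) of a Bradley-Terry-style reward model fit on pairwise comparisons. Each item gets a scalar reward, and a comparison "winner beat loser" contributes a log-likelihood term log sigmoid(r_winner − r_loser). On a perfectly circular loop the gradients cancel by symmetry, so all loop members end up with the same reward; a single accidental flip in otherwise consistent data just gets averaged out.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def fit_rewards(comparisons, n_items, lr=0.1, steps=2000):
    """Gradient ascent on the Bradley-Terry log-likelihood.

    comparisons: list of (winner, loser) index pairs.
    Returns one scalar reward per item (identified up to a shared offset).
    """
    r = [0.0] * n_items
    for _ in range(steps):
        grad = [0.0] * n_items
        for winner, loser in comparisons:
            p = sigmoid(r[winner] - r[loser])  # model's predicted win prob
            grad[winner] += 1.0 - p            # push winner's reward up
            grad[loser] -= 1.0 - p             # push loser's reward down
        r = [ri + lr * gi for ri, gi in zip(r, grad)]
    return r

# A perfectly circular loop: A > B, B > C, C > A.
# By symmetry the gradients cancel, so all three rewards stay equal.
rewards = fit_rewards([(0, 1), (1, 2), (2, 0)], 3)

# "Non-circular on average": consistent votes plus one accidental flip.
# The lone mistake is outvoted, and A still lands above B above C.
noisy = [(0, 1)] * 5 + [(1, 2)] * 5 + [(0, 2)] * 4 + [(2, 0)]
r_noisy = fit_rewards(noisy, 3)
```

(The learning rate, step count, and plain gradient ascent here are arbitrary toy choices; the real model presumably uses a neural network and a proper optimizer, but the loss has the same shape.)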
Finally: Yes, totally, feel free to send me the guest ID either here or via DM!
Interesting! I’m fascinated by the idea of a way to figure out the transitive relations via a “non-circular on average” assumption and might go hunt down the code to see how it works. I think humans (and likely dogs and maybe pigeons) have preference learning machinery that helps them remember and abstract early choices and early outcomes somehow, to bootstrap into skilled choosers pretty fast, but I’ve never really thought about the algorithms that might do this. It feels like stumbling across a whole microfield of cognitive science I’ve never heard of before, one that is potentially important to friendliness research!
(I have sent the DM. Thanks <3)