Finetuning LLMs with RL seems to make them more agentic. We will examine the changes RL makes to LLMs’ weights: we can measure how localized those changes are, get information about what sorts of computations make a model agentic, and make conjectures about selected systems, giving us a better understanding of agency.
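A minimal sketch of one way "how localized the changes are" could be quantified: compare the base and RL-finetuned checkpoints layer by layer and rank layers by the norm of the weight delta. The layer names and toy weight vectors below are hypothetical, and real use would iterate over a model's state dict rather than flat lists.

```python
import math

def layer_diff_norms(base_weights, tuned_weights):
    """Frobenius norm of the weight change for each named layer."""
    norms = {}
    for name, base in base_weights.items():
        tuned = tuned_weights[name]
        norms[name] = math.sqrt(sum((t - b) ** 2 for b, t in zip(base, tuned)))
    return norms

# Toy example: two "layers" as flat weight vectors (hypothetical names).
base  = {"attn.0": [0.1, 0.2, 0.3], "mlp.0": [0.5, 0.5, 0.5]}
tuned = {"attn.0": [0.1, 0.2, 0.3], "mlp.0": [0.9, 0.1, 0.5]}

norms = layer_diff_norms(base, tuned)
# Rank layers by how much finetuning moved them; a steep falloff
# across this ranking would suggest the changes are localized.
ranked = sorted(norms, key=norms.get, reverse=True)
```

On a real model one would compute this over every parameter tensor and look at the distribution: changes concentrated in a few layers point to localized circuits, while a flat distribution suggests diffuse updates.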
Could you elaborate on how you measure the “agenticness” of a model in this experiment? In case you don’t want to talk about it until you finish the project that’s also fine, just thought I’d ask.