Yep, it’s a language model agent benchmark. It just feeds a scenario and some actions to an autoregressive LM, and asks the model to select an action.
Chatbots don’t map scenarios to actions; they map queries to replies.
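Roughly, that kind of setup could look like the sketch below. To be clear, this is just an illustration and not the benchmark's actual code: the prompt layout, the function names (`build_prompt`, `select_action`), and the `score` callable are all made up here. `score` stands in for whatever the real LM provides, e.g. the log-probability of a continuation given the prompt, and `toy_score` is a dummy that only exists so the example runs.

```python
from typing import Callable, Sequence


def build_prompt(scenario: str, actions: Sequence[str]) -> str:
    """Format a scenario and its candidate actions as one text prompt.

    The layout is an assumption for illustration, not the benchmark's format.
    """
    lines = [f"Scenario: {scenario}", "Possible actions:"]
    lines += [f"{i}. {action}" for i, action in enumerate(actions, start=1)]
    lines.append("Selected action:")
    return "\n".join(lines)


def select_action(
    scenario: str,
    actions: Sequence[str],
    score: Callable[[str, str], float],
) -> int:
    """Return the 0-based index of the action whose label scores highest.

    `score(prompt, continuation)` is a hypothetical hook; with a real
    autoregressive LM it might return the log-probability of `continuation`
    following `prompt`.
    """
    prompt = build_prompt(scenario, actions)
    scores = [score(prompt, f" {i}") for i in range(1, len(actions) + 1)]
    return max(range(len(actions)), key=scores.__getitem__)


def toy_score(prompt: str, continuation: str) -> float:
    """Dummy scorer (not a real LM): always prefers the label "2"."""
    return 1.0 if continuation.strip() == "2" else 0.0


if __name__ == "__main__":
    options = ["Ignore the alarm", "Leave the building", "Open a window"]
    choice = select_action("The fire alarm goes off.", options, toy_score)
    print(options[choice])
```

The point is that the model only ever sees text and produces (or scores) text; the "action" is just whichever option label comes out on top.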