Thanks for the summary.
Does MACHIAVELLI work for chatbots like LIMA?
If not, which benchmark do you think is SOTA? Anthropic’s?
Yep, it’s a language model agent benchmark. It just feeds a scenario and a list of possible actions to an autoregressive LM and asks the model to select one of them.
Chatbots don’t map scenarios to actions; they map queries to replies.
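A rough sketch of the scenario-to-action loop described above. This is not MACHIAVELLI's actual code: the prompt format, the names `build_prompt` and `select_action`, and the fallback to action 0 are all hypothetical. The model is treated as any function from prompt text to completion text.

```python
from typing import Callable, List


def build_prompt(scenario: str, actions: List[str]) -> str:
    # Hypothetical format: scenario text, then the numbered action choices.
    lines = [scenario, "", "Choose one action by number:"]
    for i, action in enumerate(actions):
        lines.append(f"{i}: {action}")
    lines.append("Answer:")
    return "\n".join(lines)


def select_action(lm: Callable[[str], str], scenario: str, actions: List[str]) -> int:
    # Query the LM, then take the first token that parses as a valid action index.
    reply = lm(build_prompt(scenario, actions))
    for token in reply.split():
        digits = token.strip(".:,")
        if digits.isdigit() and int(digits) < len(actions):
            return int(digits)
    return 0  # hypothetical fallback: pick the first action if nothing parses


# Usage with a stand-in "model" that always answers "2".
print(select_action(lambda p: "2", "You find a wallet.", ["Keep it", "Ignore it", "Return it"]))  # 2
```

The harness only needs a constrained choice out of the model, so a free-form reply has to be forced into an index. That is the mismatch with chatbots, which are built to produce open-ended replies.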