Daniel Kokotajlo comments on Applying refusal-vector ablation to a Llama 3 70B agent

Daniel Kokotajlo 12 May 2024 15:47 UTC
7 points
4
“In general, Llama 3 70B is a competent agent with appropriate scaffolding, and Llama 3 8B also has decent performance.”

I’m curious about, and skeptical of, this claim. If you set it up in an Auto-GPT-esque scaffold with connections to the internet and the ability to edit docs, make forum comments, send emails, and so forth, and set it loose with some long-term goal like “accumulate money” or “befriend people” or whatever… does it actually chug along for hours and hours moving vaguely in the right direction, or does it, e.g., get stuck pretty quickly or go into some sort of confused doom spiral?
- Simon Lermen 12 May 2024 19:15 UTC
  6 points
  0
  “does it actually chug along for hours and hours moving vaguely in the right direction”
  I am pretty sure the answer is no. It is competent within the scope of the tasks I present here. But this is a good point; I am probably overstating things. I might edit this.
  I haven’t tested it like this, but for such long-duration tasks it would also be limited by its context window of 8k tokens.
  Edit: I have now edited this.