I would argue that every LLM since GPT-3 has been a mesa-optimizer, since they all do search/optimization/learning as described in Language Models are Few-Shot Learners.
interesting! do you mean the experiments in Sec 3.9.2?