Here are some of the questions that make me think that you might get a wide variety of simulations even just with base models:
Do you think that non-human entities can be useful for predicting what humans will say next? For example, imaginary entities from the fiction humans share? Or artificial entities they interact with?
Do you think that the concepts a base model learns from predicting human text could recombine in ways that simulate entities that didn’t appear in the training data? A kind of generalization?
But I guess I broadly agree with you that base models are likely to have primarily human-level simulations (there are of course many of these!). Still, that’s not super reassuring, because we’re not just interested in base models, but mostly in agents that are driven by LLMs one way or another.
Yes, that’s very reasonable. My initial goal when thinking about how to explore CCS was to use it with RL-tuned models and to study the development of coherent beliefs as the system is ‘agentised’. We didn’t get that far because we ran into problems with CCS first.
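(For context, CCS fits an unsupervised linear probe on ‘contrast pairs’, a statement and its negation, by encouraging the two probe outputs to be consistent and confident. Below is a rough sketch of that standard objective from Burns et al.; the names are illustrative and it omits details like per-class normalisation of the hidden states and multiple random restarts. Nothing here is specific to our setup.)

```python
import torch

def ccs_loss(p_pos, p_neg):
    # Consistency: the probability assigned to a statement and to its
    # negation should sum to roughly 1.
    consistency = (p_pos - (1 - p_neg)) ** 2
    # Confidence: penalise the degenerate solution p_pos = p_neg = 0.5.
    confidence = torch.minimum(p_pos, p_neg) ** 2
    return (consistency + confidence).mean()

class LinearProbe(torch.nn.Module):
    def __init__(self, d_model):
        super().__init__()
        self.linear = torch.nn.Linear(d_model, 1)

    def forward(self, h):
        # Map a hidden state to a probability of "true".
        return torch.sigmoid(self.linear(h)).squeeze(-1)

def train_ccs(h_pos, h_neg, epochs=1000, lr=1e-3):
    # h_pos, h_neg: hidden states for the statement and its negation,
    # assumed already normalised separately per class.
    probe = LinearProbe(h_pos.shape[-1])
    opt = torch.optim.Adam(probe.parameters(), lr=lr)
    for _ in range(epochs):
        loss = ccs_loss(probe(h_pos), probe(h_neg))
        opt.zero_grad()
        loss.backward()
        opt.step()
    return probe
```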
To be frank, the reasons for using Chinchilla are boring and a mixture of technical and organisational factors. If we did the project again now, we would use Gemini models with ‘equivalent’ finetuned and base versions, but given our results so far we didn’t think the effort of setting all that up and analysing it properly was worth the opportunity cost. We did a quick sense-check with an instruction-tuned T5 model to confirm that things didn’t completely fall apart, but I agree that the lack of ‘agentised’ models is a significant limitation of our experimental results. I don’t think it changes the conceptual points very much, though; I expect to see things like simulated knowledge in agentised LLMs too.