Apart from this, I do think logical dependencies and superrationality would be broken if there were a strict hierarchy between different versions of models, where each model knows its place in the hierarchy.
Oh interesting. I think this still runs into the issue that you’ll have instrumental goals whenever you ask the model to simulate itself (i.e. even the first step in the hierarchy hits this issue).
Regarding using prompts: how do you think we could get the kind of model you talk about in your post on conditioning generative models?
I was imagining that we train the model to predict e.g. tomorrow’s newspaper given today’s. The fact that it’s not just a stream of text but comes with time-stamps (e.g. a tag like “this was written X hours later”) feels important for making it simulate actual histories.
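Concretely, the training pairs I have in mind might look something like this (a minimal sketch; the article fields and the bracketed tag format are hypothetical, just to illustrate conditioning on elapsed time rather than on raw text order):

```python
from datetime import datetime

def make_training_example(today: dict, tomorrow: dict) -> dict:
    """Build one (prompt, target) pair for next-day newspaper prediction.

    Each article dict is assumed to have hypothetical 'text' and
    'timestamp' (datetime) fields; the elapsed time between the two
    articles is spelled out in the prompt so the model conditions on
    the actual temporal gap, not just on the order of the text.
    """
    hours_later = (tomorrow["timestamp"] - today["timestamp"]).total_seconds() / 3600
    prompt = (
        f"[written at {today['timestamp'].isoformat()}]\n"
        f"{today['text']}\n"
        f"[the following was written {hours_later:.0f} hours later]\n"
    )
    return {"prompt": prompt, "target": tomorrow["text"]}

# One pair built from two consecutive days' front pages.
example = make_training_example(
    {"text": "Today's front page ...", "timestamp": datetime(2023, 5, 1, 8)},
    {"text": "Tomorrow's front page ...", "timestamp": datetime(2023, 5, 2, 8)},
)
```

The point being that the time-stamp tag, not the mere concatenation of texts, carries the “X hours later” information the model needs to simulate a history at the right temporal scale.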