Thank you so much for the excellent and insightful post on mechanistic models, Evan!
My hypothesis is that the difficulty of finding mechanistic models that consistently make accurate predictions stems from the agent-environment system’s complexity and computational irreducibility. Such agent-environment interactions may be inherently unpredictable “because of the difficulty of pre-stating the relevant features of ecological niches, the complexity of ecological systems and [the fact that the agent-ecology interaction] can enable its own novel system states.”
Suppose that one wants to consistently make accurate predictions about a computationally irreducible agent-environment system. In general, the most efficient way to do so is to run the agent in the given environment. There are probably no shortcuts, even via mechanistic models.
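To make that intuition concrete, here is a minimal toy sketch (my own illustration, not anything from your post): a simple “agent” coupled to a Rule 30 cellular automaton, a standard example of a computationally irreducible system. As far as I know, there is no shortcut to the joint state after T steps other than simulating all T of them.

```python
# Toy illustration of computational irreducibility in an agent-environment
# loop, using Wolfram's Rule 30 as the environment dynamics. The agent and
# its update rule are hypothetical and chosen only for illustration.

def rule30_step(cells: list[int]) -> list[int]:
    """One synchronous Rule 30 update with wrap-around boundaries."""
    n = len(cells)
    return [
        cells[(i - 1) % n] ^ (cells[i] | cells[(i + 1) % n])
        for i in range(n)
    ]

def run(agent_pos: int, steps: int, width: int = 101) -> list[int]:
    """Run the coupled agent-environment system for `steps` updates."""
    cells = [0] * width
    cells[width // 2] = 1              # single seed cell in the environment
    for _ in range(steps):
        cells[agent_pos] ^= 1          # the agent perturbs the cell it occupies
        cells = rule30_step(cells)     # the environment evolves irreducibly
        agent_pos = (agent_pos + cells[agent_pos]) % width  # agent moves
    return cells

if __name__ == "__main__":
    # The only reliable way to answer "what does the system look like after
    # 1000 steps?" appears to be to run all 1000 steps.
    final = run(agent_pos=10, steps=1000)
    print(sum(final), "live cells after 1000 steps")
```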
For dangerous AI agents, an accurate simulation box of the deployment environment would be ideal for safe empiricism. Building such a box is probably intractable for many use cases of AI agents, but computational irreducibility implies that non-empirical methods would be even less tractable.
Please read my post “The limited upside of interpretability” for a detailed argument. It would be great to hear your thoughts!