I feel a lot of the problem relates to an Extremal Goodhart effect, where the popular imagination views simulations as not equivalent to reality.
That seems right, but aren’t all those heuristics prone to Goodharting? If your prior distribution is extremely sharp and you barely update from it, it seems likely that you run into all those various failure modes.
However, my guess is that simplicity priors, not speed or stability priors, are the default.
Not sure what you mean by "default" here. Likely to be used, effective, or something else?
I actually should focus on the circuit complexity prior, but my view is that because agents are so small compared to reality, they must generalize very well to new environments, which pushes in a simplicity direction.
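To make the contrast concrete, here is a minimal toy sketch (my own illustration, not anyone's actual proposal) of how a simplicity prior differs from a speed prior over candidate hypotheses. The candidate names, description lengths, and runtimes are made up; the speed prior here follows the rough spirit of penalizing runtime on top of description length.

```python
import math

# Hypothetical candidates: (name, description_length_bits, runtime_steps).
# These numbers are purely illustrative.
candidates = [
    ("short-but-slow", 20, 10_000),
    ("long-but-fast", 60, 50),
    ("balanced", 35, 500),
]

def simplicity_prior(length_bits: int, _runtime: int) -> float:
    """Weight ~ 2^-K: shorter descriptions get exponentially more mass."""
    return 2.0 ** (-length_bits)

def speed_prior(length_bits: int, runtime: int) -> float:
    """Weight ~ 2^-(K + log2(runtime)): also penalizes slow hypotheses."""
    return 2.0 ** (-(length_bits + math.log2(runtime)))

def normalize(weights):
    total = sum(weights)
    return [w / total for w in weights]

for name, prior in [("simplicity", simplicity_prior), ("speed", speed_prior)]:
    probs = normalize([prior(l, t) for _, l, t in candidates])
    print(name, {c[0]: round(p, 4) for c, p in zip(candidates, probs)})
```

Under the simplicity prior the short-but-slow hypothesis dominates regardless of runtime, whereas the speed prior shifts mass toward hypotheses that are cheap to run; my claim above is that the generalization pressure on small agents favors the former.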