(sorry I didn’t reply to this when you messaged it to me privately, this has been a low-brain-power week)

“To reason about whether machine learning will result in these mechanistic optimizers, we need to reason about the inductive biases of machine learning algorithms. We mostly don’t yet know how likely they are.”

I think Evan also indirectly appeals to ‘inductive biases’ in the parameter-to-function mapping of neural networks, e.g. the result Joar Skalse contributed to about properties of random nets.

Also, my biggest take-away was the argument for why we shouldn’t expect myopia by default. But perhaps this was already obvious to others.

So my understanding is that there are two arguments:

1. A myopic objective requires an extra distinction to say “don’t continue past the end of the episode”.
2. Something about online learning.

The online learning argument is actually super complicated and depends on a bunch of factors, so I’m not going to summarize it here; I’ve just added the other one:

“Even if training on a myopic base objective, we might expect the mesa objective to be non-myopic, since the non-myopic objective ‘pursue X’ is simpler than the myopic objective ‘pursue X until time T’.”

I did mean to include that; I’m going to delete the word “algorithms”, since that’s what’s causing the ambiguity.
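The simplicity claim above (that “pursue X” is cheaper to specify than “pursue X until time T”) can be made concrete with a toy sketch. This is my own illustration, not code from the discussion: the two reward functions and the condition-counting complexity proxy are assumptions, standing in for whatever simplicity prior the actual argument has in mind.

```python
def nonmyopic_reward(x):
    # "pursue X": reward pursuing X at every step, with no reference to time
    return 1.0 if x == "X" else 0.0

def myopic_reward(x, t, T=10):
    # "pursue X until time T": the same objective, plus an extra
    # episode-end check that the non-myopic objective never needs
    return (1.0 if x == "X" else 0.0) if t < T else 0.0

# A crude complexity proxy: count the distinctions each objective must encode.
NONMYOPIC_CONDITIONS = 1  # is the agent pursuing X?
MYOPIC_CONDITIONS = 2     # ...and is t still before the episode end T?

# Under this (very rough) measure, the myopic objective is strictly
# more complex, which is the direction the simplicity argument points.
assert MYOPIC_CONDITIONS > NONMYOPIC_CONDITIONS
```

The point is only directional: any reasonable description-length measure charges the myopic objective for representing the extra “until time T” distinction, so a simplicity-biased learner has some pressure toward the non-myopic version.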