Here on LW / AF, “mesa optimization” seems to only apply if there’s some sort of “general” learning algorithm, especially one that is “using search”, for reasons that have always been unclear to me. Some relevant posts taking the opposite perspective (which I endorse):
Here on LW / AF, “mesa optimization” seems to only apply if there’s some sort of “general” learning algorithm, especially one that is “using search”, for reasons that have always been unclear to me. Some relevant posts taking the opposite perspective (which I endorse):
Is the term mesa optimizer too narrow?
Why is pseudo-alignment “worse” than other ways ML can fail to generalize?