If it works out-of-distribution, that’s a huge deal for alignment! Especially if alignment generalizes farther than capabilities. Then you can just throw something like imitative amplification at it and it is probably aligned (assuming that “does well out-of-distribution” implies that the mesa-optimizers are tamed).
I have low confidence in that, but I guess it (OOD generalization by “liquid” networks) works well in differentiable continuous domains (like low-level motion planning) by exploiting the natural smoothness of the system. So I wouldn’t get my hopes up about its universal applicability.
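To be concrete about the “natural smoothness” point: a liquid time-constant cell evolves a continuous state with an input-dependent time constant, so small input changes give small trajectory changes, which is exactly the structure low-level motion planning has. Below is a rough sketch of a single fused-Euler step over the LTC ODE from Hasani et al.; the sigmoidal gate `f` and the parameter shapes are my simplifying assumptions, not the exact published cell.

```python
import numpy as np

def ltc_step(x, u, tau, A, W, U, b, dt=0.05):
    """One fused-Euler step of a liquid time-constant (LTC) cell:
    dx/dt = -(1/tau + f(x, u)) * x + f(x, u) * A,
    where f is a bounded sigmoidal gate over state x and input u.
    The form of f and the parameter shapes are simplified assumptions."""
    f = 1.0 / (1.0 + np.exp(-(W @ x + U @ u + b)))  # bounded "conductance"
    # Fused (semi-implicit) Euler keeps the update stable for stiff dynamics:
    return (x + dt * f * A) / (1.0 + dt * (1.0 / tau + f))

# Tiny rollout: a 4-unit cell driven by a smooth 2-d input signal.
rng = np.random.default_rng(0)
n, m = 4, 2
x, tau, A = np.zeros(n), np.ones(n), rng.normal(size=n)
W = 0.1 * rng.normal(size=(n, n))
U = 0.1 * rng.normal(size=(n, m))
b = np.zeros(n)
for t in np.linspace(0.0, 2.0 * np.pi, 100):
    x = ltc_step(x, np.array([np.sin(t), np.cos(t)]), tau, A, W, U, b)
print(x)  # final hidden state after tracing the smooth input trajectory
```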
It’s built out of an optimizer, so why would that tame inner optimizers? Perhaps it makes them explicit, since now the whole thing is a loss function, but the iterative inference can’t be shut off while still getting a functional model.
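Rough picture of what I mean by “the whole thing is a loss function”: the model defines an energy over candidate outputs, and inference is an inner gradient-descent loop on that energy, so turning the loop off leaves you with no output at all. The toy quadratic energy and the names below are placeholder assumptions for illustration, not any specific architecture.

```python
import numpy as np

def energy_grad(z, x, theta):
    """Gradient in z of the toy energy E(x, z) = 0.5 * ||z - theta @ x||**2."""
    return z - theta @ x

def infer(x, theta, steps=50, lr=0.2):
    """Inference here *is* an inner optimization loop over the output z.
    Skip the loop and there is no output at all, only an energy landscape,
    which is the sense in which the iterative inference can't be shut off."""
    z = np.zeros(theta.shape[0])
    for _ in range(steps):
        z -= lr * energy_grad(z, x, theta)
    return z

theta = np.array([[1.0, -0.5], [0.3, 2.0]])
print(infer(np.array([1.0, 2.0]), theta))  # converges toward theta @ x
```

The run-time optimizer is load-bearing: you can inspect the loss it is descending, but you can’t remove the descent and keep the behavior.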
That’s just part of the definition of “works out of distribution”. Scenarios where inner optimizers become AGI or something are out of distribution from training.