Motivated by our findings that attention layers are attempting to implicitly optimize internal objective functions, we introduce the mesa-layer, a novel attention layer that efficiently solves a least-squares optimization problem, instead of taking just a single gradient step towards an optimum. We show that a single mesa-layer outperforms deep linear and softmax self-attention Transformers on simple sequential tasks while offering more interpretability.
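For concreteness, here is a minimal NumPy sketch of the contrast the abstract is drawing: a plain causal linear-attention layer, whose output corresponds to roughly one gradient step on an in-context least-squares objective, versus a mesa-style layer that applies the full ridge least-squares solution at every step, maintained online with a Sherman-Morrison rank-1 update. The function names, the `lam` ridge parameter, and this exact formulation are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def linear_attention(Q, K, V):
    """Causal linear self-attention: out_t = (sum_{i<=t} v_i k_i^T) q_t.
    In the paper's framing this amounts to roughly a single gradient step
    on an in-context least-squares objective."""
    T, d_v = Q.shape[0], V.shape[1]
    Phi = np.zeros((d_v, Q.shape[1]))      # running sum of v_i k_i^T
    out = np.zeros((T, d_v))
    for t in range(T):
        Phi += np.outer(V[t], K[t])
        out[t] = Phi @ Q[t]
    return out

def mesa_layer(Q, K, V, lam=1.0):
    """Illustrative mesa-layer sketch (not the paper's exact implementation):
    at each step t, apply the ridge least-squares solution
    Phi_t = argmin_Phi sum_{i<=t} ||v_i - Phi k_i||^2 + lam*||Phi||^2
    to the current query, updated online via Sherman-Morrison
    (recursive least squares). `lam` is an assumed hyperparameter."""
    T, d = Q.shape
    d_v = V.shape[1]
    R = np.eye(d) / lam                    # running (sum_i k_i k_i^T + lam*I)^{-1}
    S = np.zeros((d_v, d))                 # running sum_i v_i k_i^T
    out = np.zeros((T, d_v))
    for t in range(T):
        k, v, q = K[t], V[t], Q[t]
        Rk = R @ k
        R -= np.outer(Rk, Rk) / (1.0 + k @ Rk)   # Sherman-Morrison rank-1 update
        S += np.outer(v, k)
        out[t] = S @ (R @ q)               # Phi_t @ q_t, with Phi_t = S @ R
    return out

# Tiny smoke test on random data
T, d, d_v = 16, 4, 4
rng = np.random.default_rng(0)
Q, K, V = rng.normal(size=(T, d)), rng.normal(size=(T, d)), rng.normal(size=(T, d_v))
print(linear_attention(Q, K, V).shape, mesa_layer(Q, K, V).shape)  # (16, 4) (16, 4)
```

The difference is only in what each step computes: the mesa-layer replaces the single implicit gradient step with the exact solution of the in-context least-squares problem described in the quote above.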
It looks like you can analyze transformers, discover the internal patterns that form emergently, figure out which ones work best, and then redesign your network architecture to start with an extra layer that has this pattern built in from the outset.
Not only is this closer to the human brain, but yes, it's adding a type of internal mesa-optimizer. Doing it deliberately, instead of letting one form emergently from the data, probably prevents the failure mode AI doomers are worried about: namely, such a layer letting the machine defect against humans.
Didn't they demonstrate that transformers could be mesa-optimizers? (I never properly understood the paper, so it's a genuine question.) See "Uncovering Mesaoptimization Algorithms in Transformers".