Fragment of an answer to 2: it seems weird to imagine a feedforward network implementing a for-loop layer by layer, if the weights aren’t tied. I guess it might involve layers with flexible behavior, with the activations carrying instructions for each layer. It’s definitely more natural for a recurrent network: each layer needs to be able to translate the instructions into the same computational behavior, and tied weights give you that for free. On the other hand (peeking at other answers) language models seem to be a natural fit for doing for-loop-like behavior across sequence position.
In general, feedforward networks are circuits, not programs, and I don’t have good intuition about what computationally nontrivial circuits are like.
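To make the tied-vs-untied point concrete, here’s a minimal numpy sketch (my own illustration, not from the exercise): the untied stack has to encode the loop body separately at every depth, while the tied version is literally a for-loop over one computation.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8          # hidden width
steps = 5      # loop iterations / network depth
x = rng.normal(size=d)

# Untied feedforward stack: every "iteration" has its own weights,
# so the loop body has to be re-encoded at each depth.
untied_W = [rng.normal(size=(d, d)) / np.sqrt(d) for _ in range(steps)]
h = x
for W in untied_W:
    h = np.tanh(W @ h)

# Tied / recurrent version: one weight matrix reused every step,
# which really is a single loop body applied repeatedly.
W_tied = rng.normal(size=(d, d)) / np.sqrt(d)
h_rec = x
for _ in range(steps):
    h_rec = np.tanh(W_tied @ h_rec)
```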
Yeah, it seems more natural to think of a recurrent network, or even better, a memory network such as a Neural Turing Machine. The question becomes similar to “what does a loop look like at the hardware level in a CPU?”.
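In that spirit, a toy sketch of what the analogy suggests (my own, not an actual Neural Turing Machine): a fixed transition function applied over and over to a controller state plus an external memory, so the “loop” is just the repeated application, much like a CPU re-running its fetch/decode/execute cycle. The read and write rules here are made-up placeholders.

```python
import numpy as np

def step(state, memory):
    """One tied 'controller' step: read from memory, update state, write back.
    Toy update rules, only meant to show the shape of the loop."""
    read = memory.mean(axis=0)                   # crude soft read
    state = np.tanh(state + read)                # update controller state
    memory = memory + 0.1 * np.outer(np.ones(len(memory)), state)  # crude write
    return state, memory

rng = np.random.default_rng(0)
state = rng.normal(size=4)
memory = rng.normal(size=(3, 4))

# The "program" lives in the loop: the same transition applied repeatedly.
for _ in range(10):
    state, memory = step(state, memory)
```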