On second thought, that’s not a big deal: we can fix it by interspersing random bits in the input. This way, the transformer would see a history that includes yand the random bits used to produce it (which encode z). More generally, such a setup can simulate any randomized RNN.
On second thought, that’s not a big deal: we can fix it by interspersing random bits in the input. This way, the transformer would see a history that includes y and the random bits used to produce it (which encode z). More generally, such a setup can simulate any randomized RNN.