I’d like to see them use the model to generate the problem framing that produces the highest score on a given task.
Even if it’s just the natural language description of addition that comes before the addition task, it’d be interesting to see how it thinks addition should be explained. Does some latent space of sentences one could use for this fall out of the model for free?
More generally, a framing is a function that turns data like [(2,5,7), (1,4,5), (1,2,_)] into text like “Add. 2+5=7, 1+4=5, 1+2=”, and what we want is a latent space over framings.
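For concreteness, here’s a minimal sketch of one such framing in Python (the name frame_addition and the convention that None marks the query slot are my own illustration, not anything from the paper):

```python
def frame_addition(examples):
    """Render (a, b, c) tuples as a few-shot addition prompt.

    A None in the answer slot marks the query the model should complete.
    """
    parts = [f"{a}+{b}={c if c is not None else ''}" for a, b, c in examples]
    return "Add. " + ", ".join(parts)

print(frame_addition([(2, 5, 7), (1, 4, 5), (1, 2, None)]))
# Add. 2+5=7, 1+4=5, 1+2=
```

A latent space over framings would then be a space over functions like this one, rather than over individual prompt strings.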
Even more generally, I expect that getting the full power of the model requires algorithms that apply the model multiple times. For example, what happens if you run the grammar correction task multiple times on the same text? Will it fix errors it missed the first time on the second try? If so, the real definition of framing should allow multiple applications like this. It would look like a neural net whose neurons manipulate text data instead of number data. Since such a net has no weights, we can’t train it; instead we’d have to search a latent space over possible nets.
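A single node of such a net might look like the following sketch, assuming the model is exposed as a plain prompt-to-completion callable (the prompt wording and the fixed-point stopping rule are my own guesses, not the paper’s setup):

```python
def iterate_correction(model, text, max_rounds=5):
    """Apply a grammar-correction framing repeatedly until the output
    stops changing (a fixed point) or a round limit is hit.

    `model` is assumed to be a callable mapping a prompt string to a
    completion string; the prompt wording here is illustrative.
    """
    for _ in range(max_rounds):
        corrected = model("Correct the grammar: " + text)
        if corrected == text:  # fixed point: nothing left to fix this round
            return text
        text = corrected
    return text
```

The interesting part is that there’s nothing to backpropagate through here: the only free choices are discrete ones, like which prompts to use, in what order, and when to stop, which is exactly why a latent space over possible nets would have to stand in for training.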