If I understand the problem statement correctly, I think I could take a stab at easier versions of the problem, but that the current formulation is too much to swallow in one bite. In particular I am concerned about the following parts:
> Setting
>
> We start with an unaligned benchmark:
>
> * An architecture Mθ
>
> <snip>
>
> Goal
>
> To solve ELK in this case we must:
>
> * Supply a modified architecture Mθ+ which has the same inputs and outputs as Mθ <snip>
Does this mean that the method needs to work for ~arbitrary architectures, and that the solution must use substantially the same architecture as the original?
> except that after producing all other outputs it can answer a question Q in natural language
Does this mean that it must be able to deal with a broad variety of questions, so that we cannot simply sit down and think about how to optimize the model for getting a single question (e.g. “Where is the diamond?”) right?
According to my current model of how these sorts of things work, such constraints make the problem fundamentally unsolvable, so I am not going to attempt it as stated; loosening the constraints may make it solvable, in which case I might attempt it.
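For concreteness, here is a rough sketch of what I take the quoted requirement to mean: keep the original predictor's inputs and outputs exactly as they are, and add a head that answers a natural-language question after the other outputs have been produced. This is purely illustrative PyTorch-style code of my own; every name in it is hypothetical, and it assumes the base predictor exposes a latent summary of its state.

```python
import torch
import torch.nn as nn

class ReporterAugmentedPredictor(nn.Module):
    """Hypothetical Mθ+: wraps an existing predictor Mθ and adds a
    question-answering ("reporter") head. All names here are illustrative,
    not taken from the ELK report."""

    def __init__(self, base_predictor: nn.Module, hidden_dim: int, vocab_size: int):
        super().__init__()
        self.base = base_predictor  # the original Mθ, left unchanged
        self.question_encoder = nn.Embedding(vocab_size, hidden_dim)
        self.reporter_head = nn.Linear(2 * hidden_dim, vocab_size)

    def forward(self, x, question_tokens=None):
        # Assumed interface: the base predictor returns its usual output
        # together with a latent summary vector of shape (batch, hidden_dim).
        prediction, latent = self.base(x)
        if question_tokens is None:
            # No question asked: just return the original prediction.
            return prediction
        # After producing all other outputs, answer a question Q posed in
        # natural language, conditioning on the predictor's latent state.
        q = self.question_encoder(question_tokens).mean(dim=1)
        answer_logits = self.reporter_head(torch.cat([latent, q], dim=-1))
        return prediction, answer_logits
```

The sketch is only meant to pin down what a "modified architecture" with a question-answering head could look like for one fixed predictor; it says nothing about how the reporter head would be trained to answer honestly, which is the actual difficulty.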
> Does this mean that the method needs to work for ~arbitrary architectures, and that the solution must use substantially the same architecture as the original?
Yes, approximately. If you can do it only for, e.g., transformers but not other things, that would be interesting.
> Does this mean that it must be able to deal with a broad variety of questions, so that we cannot simply sit down and think about how to optimize the model for getting a single question (e.g. “Where is the diamond?”) right?
Yes, approximately. Thinking about how to get one question right might be a productive way to do research. However, if you have a strategy for getting one question right, it should also work for other questions.
> Yes, approximately. If you can do it only for, e.g., transformers but not other things, that would be interesting.
I guess a closer analogy would be “What if the family of strategies only works for transformer-based GANs?” rather than “What if the family of strategies only works for transformers?”. As in, there’d be heavy restrictions on the “layer types”, the I/O, and the training procedure?
> Yes, approximately. Thinking about how to get one question right might be a productive way to do research. However, if you have a strategy for getting one question right, it should also work for other questions.
What if each question/family of questions you want to answer requires careful work on the structure of the model? So the strategy does generalize, but it doesn’t generalize “for free”?