Does this mean that the method needs to work for ~arbitrary architectures, and that the solution must use substantially the same architecture as the original?
Yes, approximately. If you can do it only for, e.g., transformers but not for other architectures, that would still be interesting.
Does this mean that it must be able to deal with a broad variety of questions, so that we cannot simply sit down and think about how to optimize the model for getting a single question (e.g. “Where is the diamond?”) right?
Yes, approximately. Thinking about how to get one question right might be a productive way to do research. However, a strategy that answers one question correctly should also work for other questions.
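To make the shape of these two requirements concrete, here is a minimal sketch in Python. It is one possible reading of the requirements, not an implementation from this exchange, and the names (`Model`, `build_reporter`) are illustrative: the point is only that the architecture and the question are both inputs, so the strategy cannot be specialized to either.

```python
from typing import Any, Callable, Protocol


class Model(Protocol):
    """Stand-in for an arbitrary trained model. The only assumptions are that
    we can run it on an observation and read off some internal state; nothing
    transformer-specific (attention heads, a residual stream, ...) is assumed."""

    def run(self, observation: Any) -> Any: ...
    def internal_state(self) -> Any: ...


def build_reporter(model: Model) -> Callable[[Any, str], str]:
    """Return a reporter answering arbitrary questions about what `model`
    'knows' after seeing an observation. The two requirements above appear
    as the two degrees of freedom in the signature: `model` is not assumed
    to be any particular architecture, and `question` is a parameter rather
    than being hard-coded to something like "Where is the diamond?"."""

    def reporter(observation: Any, question: str) -> str:
        model.run(observation)
        state = model.internal_state()
        # The proposed elicitation strategy goes here; it may inspect `state`,
        # but it must handle every architecture and every question uniformly.
        raise NotImplementedError

    return reporter
```

The follow-up questions below probe exactly where this clean separation might break down: what if the strategy constrains the architecture, or needs per-question work on the model?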
I guess a closer analogy would be “What if the family of strategies only works for transformer-based GANs?” rather than “What if the family of strategies only works for transformers?”. As in, there would be heavy restrictions on the “layer types”, the I/O, and the training procedure?
What if each question or family of questions you want to answer requires careful work on the structure of the model? So the strategy does generalize, but it doesn’t generalize “for free”?