The counterfactual depends on what other research people would have done and how successful it would have been. I don’t think you can observe it “by simply looking.”
That said, I’m not quite sure what counterfactual you are imagining. By the time transformers were developed, soft attention in combination with LSTMs was already popular. I assume that in your counterfactual soft attention didn’t ever catch on? Was it proposed in 2014 but languished in obscurity and no one picked it up? Or was sequence-to-sequence attention widely used, but no one ever considered self-attention? Or something else?
Depending on how you are defining the counterfactual, I may think that you are right about the consequences. But if you are talking about a counterfactual that I regard as implausible, then naturally it's not as interesting to me as things that actually happen. Things that actually happen are what I was looking for in the quoted part of the OP, and I was evaluating transformers in terms of their (large!) actual impact rather than an imagined hypothetical where they could lead to fast-takeoff-like consequences.