“why didn’t the first person to come up with the idea of using computers to predict the next element in a sequence patent that idea, in full generality”
Patents are valid for about 20 years. But Bengio et al. were already using neural networks to predict the next word back in 2000:
https://papers.nips.cc/paper_files/paper/2000/file/728f206c2a01bf572b5940d7d9a8fa4c-Paper.pdf
So this idea is old. Only some specific architectural aspects are new.