The original sparse coding paper[1] from 1997 was a major early advance in learned features for vision and also for neuroscience, with significant downstream influence on later DL.
Also, I can see from Juergen Schmidhuber's Google Scholar page that you are missing some of his lab's papers that fit your criteria, such as “Gradient flow in recurrent nets”. If he were here he would hate that. Schmidhuber claims that many of the key ideas in DL were discovered at his lab in 1990-1991. Even if that seems like a stretch, I do think they explored, early on, a wide range of foundational ideas that only became more important over time: vanishing gradients, distillation and compression, memory/attention, metalearning, artificial curiosity, and more.
[1] Olshausen, Bruno A., and David J. Field. “Sparse coding with an overcomplete basis set: A strategy employed by V1?” Vision Research 37.23 (1997): 3311-3325.