KvmanThinking comments on Have we seen any “ReLU instead of sigmoid-type improvements” recently

KvmanThinking 23 Nov 2024 14:15 UTC
1 point
0
How were these discovered? Slow, deliberate thinking, or someone trying some random thing to see what it does and suddenly the AI is a zillion times smarter?
- Marcus Williams 23 Nov 2024 15:58 UTC
  2 points
  0
  “We offer no explanation as to why these architectures seem to work; we attribute their success, as all else, to divine benevolence.” -SwiGLU paper.
  
  I think it varies. A few of these come from trying “random” things, but most are educated guesses that are then validated empirically. Often there is a specific problem we want to solve, e.g. exploding gradients or O(n^2) attention, and authors try things that may or may not solve or mitigate it.