Erik Garrison comments on "Adam Optimizer Causes Privileged Basis in Transformer LM Residual Stream"
Erik Garrison
8 Sep 2024 4:50 UTC
3 points
Could this affect distributed training schemes that might assume rotational invariance?