Erik Garrison
Karma: 2
Erik Garrison, 8 Sep 2024 4:50 UTC, 3 points
On: Adam Optimizer Causes Privileged Basis in Transformer LM Residual Stream
Could this affect distributed training methods that assume rotational invariance?
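The rotational-invariance premise can be illustrated with a toy sketch. It assumes plain SGD and the standard Adam update rule with its usual default hyperparameters, and it looks only at Adam's first step from zero moment estimates. The helper names (`rotate`, `sgd_step`, `adam_first_step`, `gap`) are hypothetical. The sketch compares an optimizer step taken in rotated coordinates with the rotated version of the step taken in the original coordinates. SGD's update commutes with the rotation. Adam's per-coordinate normalization does not.

```python
import math


def rotate(vec, theta):
    """Rotate a 2-D vector counterclockwise by theta radians."""
    c, s = math.cos(theta), math.sin(theta)
    x, y = vec
    return [c * x - s * y, s * x + c * y]


def sgd_step(grad, lr=0.1):
    """Plain SGD update: a scalar multiple of the gradient."""
    return [-lr * g for g in grad]


def adam_first_step(grad, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8):
    """First Adam update from zero moment estimates, with bias correction.

    Each coordinate is normalized independently, so the step is roughly
    -lr * sign(g) and depends on the chosen basis.
    """
    update = []
    for g in grad:
        m = (1 - beta1) * g          # first-moment estimate
        v = (1 - beta2) * g * g      # second-moment estimate
        m_hat = m / (1 - beta1)      # bias-corrected first moment
        v_hat = v / (1 - beta2)      # bias-corrected second moment
        update.append(-lr * m_hat / (math.sqrt(v_hat) + eps))
    return update


def gap(a, b):
    """Largest coordinate-wise difference between two vectors."""
    return max(abs(x - y) for x, y in zip(a, b))


grad = [1.0, 0.2]
theta = math.pi / 4

# Step in rotated coordinates vs. rotated step from original coordinates.
sgd_gap = gap(sgd_step(rotate(grad, theta)), rotate(sgd_step(grad), theta))
adam_gap = gap(adam_first_step(rotate(grad, theta)),
               rotate(adam_first_step(grad), theta))

print(f"SGD gap:  {sgd_gap:.2e}")
print(f"Adam gap: {adam_gap:.3f}")
```

SGD's gap is at floating-point noise level. Adam's gap is of the same order as the learning rate. A scheme that rotates or otherwise reparameterizes the parameter space would therefore not see equivalent Adam dynamics across the two bases.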