Off the top of my head: residual (skip) connections, improved ways of doing positional embeddings/encodings, and layer norm.
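For illustration, the three ideas can be sketched together in plain Python. This is a minimal sketch that assumes a "pre-norm" Transformer-style block, computing x + sublayer(LayerNorm(x)). The fixed sinusoidal encoding from the original Transformer paper stands in for positional encodings only because it is the simplest one to write down. Learned or rotary encodings would go in the same place. The sublayer passed in is a hypothetical toy function standing in for attention or an MLP. The layer norm here omits the learned gain and bias.

```python
import math


def layer_norm(x, eps=1e-5):
    """Normalize a vector to zero mean and unit variance (no learned gain/bias)."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def sinusoidal_encoding(pos, d_model):
    """Fixed sinusoidal positional encoding: sin on even dims, cos on odd dims."""
    enc = []
    for i in range(d_model):
        angle = pos / (10000 ** (2 * (i // 2) / d_model))
        enc.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
    return enc


def residual_block(x, sublayer):
    """Pre-norm residual (skip) connection: x + sublayer(layer_norm(x))."""
    y = sublayer(layer_norm(x))
    return [a + b for a, b in zip(x, y)]


# Usage: add position info to a (hypothetical) token embedding, then apply one block.
x = [0.1, -0.3, 0.5, 0.2]
x = [e + p for e, p in zip(x, sinusoidal_encoding(pos=3, d_model=4))]
out = residual_block(x, lambda h: [0.5 * v for v in h])  # toy stand-in sublayer
print(out)
```

Because of the skip path, the block's output is always the input plus a correction. If the sublayer learns to output zeros, the block reduces to the identity. This is a large part of why deep stacks of such blocks stay trainable.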