I think it’s worth taking a look at what’s out there:
SpanBERT
- Masks random contiguous spans of tokens during pre-training, rather than individual tokens (see the sketch below)
- Seems to indicate that predicting longer spans is especially difficult
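For intuition, here is a minimal toy sketch of span-style masking, not the paper's code: span lengths are drawn from a geometric distribution and clipped (the SpanBERT paper uses p=0.2 and a max length of 10, masking ~15% of tokens), and the whole span is replaced with a mask token. The function name `sample_span_masks` and the exact budgeting logic are my own.

```python
import random

def sample_span_masks(tokens, mask_ratio=0.15, p=0.2, max_span=10,
                      mask_token="[MASK]"):
    """Toy span masking: mask contiguous spans until ~mask_ratio of the
    tokens are covered. Span lengths follow a geometric distribution, so
    short spans are common and long ones rare (as in SpanBERT)."""
    n = len(tokens)
    budget = max(1, int(n * mask_ratio))
    masked = list(tokens)
    covered = set()
    while len(covered) < budget:
        # Sample a span length ~ Geometric(p), clipped to max_span.
        length = 1
        while random.random() >= p and length < max_span:
            length += 1
        # Never exceed the remaining budget or the sequence length.
        length = min(length, budget - len(covered), n)
        start = random.randrange(n - length + 1)
        for i in range(start, start + length):
            masked[i] = mask_token
            covered.add(i)
    return masked, sorted(covered)

tokens = "the quick brown fox jumps over the lazy dog".split()
masked, positions = sample_span_masks(tokens, mask_ratio=0.3)
print(masked)     # e.g. ['the', '[MASK]', '[MASK]', 'fox', ...]
print(positions)  # indices of the masked tokens
```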
Distillation of BERT Models
- BERT's embeddings are hierarchical (lower layers encode surface and syntactic features, upper layers encode semantics), which distillation schemes can exploit by matching student layers to teacher layers; see the sketch below
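As a concrete illustration, here is a minimal sketch of one common distillation recipe that leans on this hierarchy: soft-target distillation on the logits combined with layer-to-layer hidden-state matching (in the spirit of Patient-KD / TinyBERT, not any specific paper's code). It assumes student and teacher share the hidden dimension; the function name and the uniform layer mapping are my own choices.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits,
                      student_hidden, teacher_hidden,
                      T=2.0, alpha=0.5):
    """Soft-target distillation plus layer-to-layer hidden-state matching.

    student_hidden / teacher_hidden are lists of [batch, seq, dim] tensors,
    one per layer; assumes equal hidden dims (a real setup would add a
    learned projection if they differ)."""
    # Soft targets: KL between temperature-scaled distributions,
    # rescaled by T^2 to keep gradient magnitudes comparable.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)

    # Uniform layer mapping: student layer i mimics teacher layer
    # (i + 1) * k - 1, so the student's lower/upper layers track the
    # teacher's lower/upper layers and the hierarchy is preserved.
    k = len(teacher_hidden) // len(student_hidden)
    layer = sum(
        F.mse_loss(s, teacher_hidden[(i + 1) * k - 1])
        for i, s in enumerate(student_hidden)
    ) / len(student_hidden)

    return alpha * soft + (1 - alpha) * layer
```

With a 12-layer teacher and a 6-layer student, the mapping above matches student layers 1..6 to teacher layers 2, 4, 6, 8, 10, 12.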