Looks like all the followups to BEiT masked autoencoder image modeling are doing similar things: https://arxiv.org/abs/2202.03382 https://arxiv.org/abs/2202.03026 https://arxiv.org/abs/2202.04200
Looks like all the followups to BEiT masked autoencoder image modeling are doing similar things: https://arxiv.org/abs/2202.03382 https://arxiv.org/abs/2202.03026 https://arxiv.org/abs/2202.04200