Rodrigo Heck comments on Whisper’s Wild Implications

Rodrigo Heck 3 Jan 2023 21:12 UTC
2 points
A better approach, IMO, is to tokenize the audio directly and then find a clever way to align text tokens with audio tokens during training, instead of relying on transcribing 100% of the audio.
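
To make "tokenize audio" a bit more concrete, here is a rough toy sketch, not a real pipeline. It assumes the audio is a plain list of float samples, uses per-frame log-energy as the only feature, and maps each frame to the nearest entry in a hand-picked codebook. The names `frame_energies`, `tokenize_audio`, and the codebook values are all made up for illustration; a real system would presumably learn the codebook from data and use much richer features. The alignment with text tokens is the part that still needs the "clever way" and is not attempted here.

```python
import math


def frame_energies(samples, frame_size):
    """Split samples into non-overlapping frames and return each frame's log-energy."""
    starts = range(0, len(samples) - frame_size + 1, frame_size)
    frames = [samples[i:i + frame_size] for i in starts]
    # Small epsilon keeps log() defined for silent frames.
    return [math.log(sum(x * x for x in f) / frame_size + 1e-9) for f in frames]


def tokenize_audio(samples, codebook, frame_size=4):
    """Map each frame to the index of the nearest codebook entry (a discrete audio token)."""
    feats = frame_energies(samples, frame_size)
    return [
        min(range(len(codebook)), key=lambda k: abs(codebook[k] - f))
        for f in feats
    ]


# Hypothetical two-entry codebook: token 0 ~ "silence", token 1 ~ "loud".
codebook = [-20.0, 0.0]
samples = [0.0] * 4 + [1.0] * 4 + [0.0] * 4
tokens = tokenize_audio(samples, codebook)
```

The output is a sequence of integer IDs, one per frame, which is the same kind of object as a sequence of text tokens. That is what would make it possible, at least in principle, to train on both in one model.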