Aprillion comments on The Plan − 2024 Update

Aprillion Jan 2, 2025, 4:56 PM
7 points
Focus On Image Generators
How about audio? Is the speech-to-text domain “close to the metal” enough to deserve focus too, or did people hit roadblocks that made image generators more attractive? If the latter, where can I read about the lessons learned, please?
- Mateusz Bagiński Apr 14, 2025, 10:42 AM
  3 points
  I know almost nothing about audio ML, but I would expect one big inconvenience when doing audio-NN-interp to be that a lot of complexity in sound is difficult to represent visually. Images and text (/token strings) don’t have this problem.