It would be even more powerful if it generated a transcript automatically (though that’s currently difficult and expensive).
A few points on this:
Some YouTube videos already come with good captions.
For the rest, YouTube provides automatic captions. These are really bad and lack punctuation and capitalization, but even at that level of quality they could e.g. be used to pinpoint where something was said.
Transcription via OpenAI Whisper is cheap ($0.36 per hour) and quite decent if there’s only one speaker. For interviews and podcasts, the raw output is not good enough to serve as a transcript, because it e.g. doesn’t do speaker diarisation or insert paragraph breaks (to create this podcast transcript at the beginning of the year, I used Whisper as a base, but still had to put in many, many hours of editing). But I’m pretty sure that by now there are hybrid services out there which can do even the things Whisper is bad at. This still won’t yield a professional-level transcript, though doing an editing pass with GPT-4 might close the gap. My point is, these transcripts are not expensive, relative to labor costs.
Implementing automatic AI transcripts has become surprisingly simple. E.g. as I mentioned here, I now get automatic transcripts for my voice notes, based on following a step-by-step video guide. It’s not yet consumer-level simple (though for those purposes, one can just pay for an AI transcription service app), but it’s definitely already hobbyist-simple.
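To make the "not expensive, relative to labor costs" point concrete, here is a rough back-of-the-envelope sketch. The only number taken from above is the $0.36 per hour Whisper price; the function names, the 10 hours of editing, and the $20 hourly wage are made-up placeholders for illustration, not measurements.

```python
# Back-of-the-envelope comparison of transcription cost vs. editing labor.
# Only the Whisper price comes from the comment above; everything else
# (function names, editing hours, wage) is a hypothetical placeholder.

WHISPER_USD_PER_HOUR = 0.36  # price quoted above, per hour of audio


def whisper_cost(duration_minutes: float,
                 usd_per_hour: float = WHISPER_USD_PER_HOUR) -> float:
    """Estimated transcription cost in USD for audio of the given length."""
    return round(duration_minutes / 60 * usd_per_hour, 2)


def editing_cost(editing_hours: float, hourly_wage_usd: float) -> float:
    """Estimated labor cost in USD of cleaning up the raw transcript by hand."""
    return round(editing_hours * hourly_wage_usd, 2)


if __name__ == "__main__":
    api = whisper_cost(90)            # a 90-minute podcast episode
    labor = editing_cost(10, 20.0)    # assumed: 10 hours of editing at $20/hour
    print(f"Whisper: ${api:.2f}, manual editing: ${labor:.2f}")
```

Under these assumed numbers, the automatic transcription is a rounding error next to the manual editing pass, which is why improving the automated part (diarisation, paragraph breaks, an LLM editing pass) seems worth it even if it multiplies the API cost.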