Very interesting!
There are a few things in the calculation that seem wrong to me:
If I did things right, 15 years * (365 days/yr) * (24 hours/day) * (60 mins/hour) * (50 youtube!hours / min) * (60 youtube!mins / youtube!hour) ≈ 24B youtube!minutes, not 200B.
I’d expect much less than 100% of YouTube video time to contain speech. I don’t know what a reasonable discount for this would be, though.
In the opposite direction, 1% useful seems too low. IIRC, web scrape quality pruning discards less than 99%, and this data is less messy than a web scrape.
In any case, yeah, this does not seem like a huge amount of data. But there’s enough order-of-magnitude fuzziness in the estimate that it does seem worth someone’s time to look into it more seriously.
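For anyone who wants to check the arithmetic or play with the discounts, here is a rough sketch in Python. The function and parameter names are mine, not from the original calculation. The 15 years and 50 hours/minute come from the figures above. The speech and useful fractions are free parameters, since I don't have good values for either.

```python
# Back-of-the-envelope check of the upload estimate.
# YEARS and UPLOAD_HOURS_PER_MINUTE are the figures from the calculation above;
# everything else (names, the shape of the discount) is an illustrative assumption.
YEARS = 15
UPLOAD_HOURS_PER_MINUTE = 50  # hours of video uploaded per wall-clock minute


def youtube_minutes(years=YEARS, upload_hours_per_minute=UPLOAD_HOURS_PER_MINUTE):
    """Total minutes of video uploaded over `years` at a constant upload rate."""
    wall_clock_minutes = years * 365 * 24 * 60
    return wall_clock_minutes * upload_hours_per_minute * 60


def usable_minutes(total_minutes, speech_fraction, useful_fraction):
    """Apply two multiplicative discounts: share with speech, share worth keeping."""
    return total_minutes * speech_fraction * useful_fraction


total = youtube_minutes()
print(f"{total:,} youtube!minutes")  # 23,652,000,000

# The original assumptions as I read them: 100% speech, 1% useful.
print(f"{usable_minutes(total, 1.0, 0.01):,.0f} usable minutes")
```

Lowering `speech_fraction` and raising `useful_fraction` pull in opposite directions, which is why I don't think the final number is pinned down to better than an order of magnitude.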