I’m not seeing anything here about the costs of data collection (for licenced stuff) or curation (probably hundreds of thousands of cheap hours?), apart from one bullet on OAI’s combined costs. As a total outsider I would guess this could move your estimates by 20-100%.
I talk about this in the Granular Analysis subsection, but I’ll elaborate a bit here.
I think hundreds of thousands of cheap labor hours for curation is a reasonable guess, but this likely comes to under a million dollars, which is less than 1% of the total cost.
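To make that arithmetic concrete, here is a rough sketch. Every number in it is an illustrative assumption rather than a figure from the post: the hour count is one reading of "hundreds of thousands", the hourly rate is a guess at what "cheap" means, and the total is simply the smallest training cost for which "under a million dollars" would be under 1%.

```python
# Back-of-the-envelope check of the curation-cost claim.
# All three inputs are ASSUMED for illustration; none come from the post.
curation_hours = 300_000       # assumed: one reading of "hundreds of thousands"
hourly_rate_usd = 3.0          # assumed: a low contractor wage
total_cost_usd = 100_000_000   # assumed: floor implied by "$1M is under 1%"

# Total curation spend and its share of the assumed overall cost.
curation_cost_usd = curation_hours * hourly_rate_usd
share = curation_cost_usd / total_cost_usd

print(f"Curation cost: ${curation_cost_usd:,.0f}")
print(f"Share of total: {share:.2%}")
```

Under these assumptions curation stays below both the million-dollar mark and 1% of the total. The hourly rate would have to be several times higher, or the total cost much lower, for it to approach the 20-100% range suggested in the question.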
I have not seen any substantial evidence of OpenAI paying for licenses before the training of GPT-4, much less the sort of expenditures that would move the needle on the total cost.
After GPT-4 was trained, we do see things like a deal between OpenAI and the Associated Press (also see this article on it, which mentions a first-mover clause), with costs that look to be in the millions. That is more than 1% of the cost of GPT-4, but notably the deal seems to have come after GPT-4. I expect GPT-5, for which this sort of deal might be relevant, to cost substantially more. It’s possible I’m wrong about the timing and that substantial deals of this sort were in fact made before GPT-4, but I have not seen substantive evidence of this.