I completely agree with your post in almost every respect, and this is coming from someone who has also worked in the real world, with real problems, trying to collect and analyze real data (K-12 education, specifically: talk about a hard environment for data collection and analysis; the data is inherently very messy, and the analysis is very high-stakes).
But this part
For AI to make really serious economic impact, after we’ve exploited the low-hanging fruit around public Internet data, it needs to start learning from business data and making substantial improvements in the productivity of large companies.
If you’re imagining an “AI R&D researcher” inventing lots of new technologies, for instance, that means integrating it into corporate R&D, which primarily means big manufacturing firms with heavy investment into science/engineering innovation (semiconductors, pharmaceuticals, medical devices and scientific instruments, petrochemicals, automotive, aerospace, etc). You’d need to get enough access to private R&D data to train the AI, and build enough credibility through pilot programs to gradually convince companies to give the AI free rein, and you’d need to start virtually from scratch with each new client. This takes time, trial-and-error, gradual demonstration of capabilities, and lots and lots of high-paid labor, and it is barely being done yet at all.
I think undersells the extent to which:
A) the big companies have already started to understand that their data is everything, and that collecting, tracking, and analyzing every piece of business data they have is the most strategic move they can make, regardless of AI; and
B) even current levels of AI will speed up data integration efforts by orders of magnitude (automating just the low-hanging fruit of data cleaning could save a company thousands of person-hours).
Between those two things, I think it's a few years at most before the conduits for sharing and analyzing this core business data are set up at scale. I work in the big tech software industry and know for a fact that this is already happening in a big way. And more and more, businesses of all sizes are getting used to the SaaS model, where you pay a company for access to specific parts of your business (or all of it) so that it can provide a blanket service you trust will help you. Think of all the cloud security companies and how quickly that sector got stood up, or all the new POS platforms. I think those are more apt analogies than the massive hardware scaling that had to happen during the microchip and PC booms. (Of course, datacenter scaling must still happen, but that's a manifestly different, more centralized concern.)
TL;DR: I think you offer a lot of valuable insights about how organizations actually work with data under current paradigms. But I don't think this data integration dynamic will slow down takeoff as much as you imply.