If AI automates most, but not all, software engineering, moats of software dependencies could get more entrenched, because easier-to-use libraries have compounding first-mover advantages.
I don’t think the advantages would necessarily compound—quite the opposite: there are diminishing returns, and I expect ‘catchup’. The first-mover advantage neutralizes itself because a rising tide lifts all boats, and the additional data acts as a prior: you can define the advantage of a better model, due to any scaling factor, as equivalent to n additional datapoints. (See the finetuning transfer papers on this.) When a LLM can zero-shot a problem, that is conceptually equivalent to a dumber LLM which needs 3 shots, say. And so the advantages of a better model will plateau, and can be matched simply by some more data in-context—such as additional synthetic datapoints generated by self-play or inner-monologue etc. And the better the model gets, the more ‘data’ it can ‘transfer’ to a similar language to reach a given X% of coding performance. (Think about how you could easily transfer given access to an environment: just do self-play on translating any solved Python problem into the target language. You already, by stipulation, have an ‘oracle’ to check outputs of the target against, which can produce counterexamples.) To a sad degree, pretty much all programming languages are the same these days: ALGOL with C sugaring to various degrees and random ad hoc addons; a LLM which can master Python can master JavaScript can master TypeScript… The hard part is the non-programming-language parts—the algorithms, the reasoning, and being able to understand & model the implicit state updates—not memorizing the standard library of some obscure language.
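To make the self-play loop concrete, here is a minimal sketch. `translate_and_run` is a hypothetical stand-in for the LLM translation + execution step (here it just faithfully replays the Python solution); the point is the structure: the verified Python solution is the oracle, agreements become synthetic training data, and disagreements become counterexamples.

```python
import random

def translate_and_run(python_solution, target_lang, x):
    # Hypothetical stand-in: a real system would have an LLM translate the
    # solved Python function into target_lang, build it, and run it on x.
    # Here we pretend the translation is faithful and replay the oracle.
    return python_solution(x)

def self_play_round(python_solution, target_lang, n_tests=100, seed=0):
    # Use the verified Python solution as the 'oracle': generate random
    # inputs, run both versions, and split the results into new synthetic
    # training pairs (agreement) vs counterexamples (disagreement).
    rng = random.Random(seed)
    accepted, counterexamples = [], []
    for _ in range(n_tests):
        x = rng.randint(-10**6, 10**6)
        want = python_solution(x)                           # oracle output
        got = translate_and_run(python_solution, target_lang, x)
        (accepted if got == want else counterexamples).append((x, want, got))
    return accepted, counterexamples

# Any already-solved problem works as the oracle; with a faithful
# translation, all 100 random checks agree and become training data.
data, bad = self_play_round(lambda x: x * x + 1, "rust")
print(len(data), len(bad))  # → 100 0
```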
So at some point, even if you have a model which is god-like at Python (at which point each additional Python datapoint adds basically next to nothing), you will find it is completely acceptable at JavaScript, say, or even at your brand-new language with the 5 examples you already have on hand in the documentation. You don’t need ‘the best possible performance’, you just need some level of performance adequate to achieve your goal. If the Python is 99.99% on some benchmark, you are probably fine with 99.90% performance in your favorite language. (Presumably there is some absolute level, like 99%, at which point automated CUDA → ROCm becomes possible, and it is independent of whether some other language has even higher accuracy.) All you need is some minor reason to pay that slight non-Python tax. And that’s not hard to find.
If AI automates most, but not all, software engineering
Also, I suspect that the task of converting CUDA code to ROCm code might well fall into the ‘most’ category rather than among the holdout programming tasks. This is a category of code ripe for automation: you have, again by stipulation, correct working code which can be imitated and used as an oracle to autonomously brute-force the translation; the code usually consists of very narrow, specific algorithmic tasks (‘multiply this matrix by that matrix to get this third matrix; every number should be identical’); random test-cases are easy to generate (just big grids of numbers); and the non-algorithmic parts also have simple end-to-end metrics (‘loss go down per wallclock second’) to optimize. Compared to a lot of areas, like business logic or GUIs, this seems much more amenable to tasking LLMs with. geohot may lack the follow-through to make AMD GPUs work and plow through papercut after papercut, but there would be no such problem for a LLM.
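The oracle-and-random-grids check described above amounts to differential testing. A minimal sketch, with both kernels modeled in NumPy (`reference_kernel` standing in for the working CUDA code, `translated_kernel` for the candidate ROCm translation—both hypothetical names):

```python
import numpy as np

def reference_kernel(a, b):
    # Stand-in for the existing, known-correct CUDA kernel (the oracle).
    return a @ b

def translated_kernel(a, b):
    # Stand-in for the LLM's translated ROCm kernel under test.
    return a @ b

def differential_test(n_cases=20, size=64, seed=0, tol=1e-6):
    # Random test cases are just big grids of numbers; 'every number
    # should be identical'—up to floating-point tolerance.
    rng = np.random.default_rng(seed)
    for _ in range(n_cases):
        a = rng.standard_normal((size, size))
        b = rng.standard_normal((size, size))
        if not np.allclose(reference_kernel(a, b),
                           translated_kernel(a, b), atol=tol):
            return False  # a counterexample: feed it back to the translator
    return True

print(differential_test())  # → True when the translation matches the oracle
```

An automated loop would hand any failing `(a, b)` pair back to the model as a counterexample and retry, which is what makes this category so brute-forceable.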
So I agree with Wentworth that there seems to be a bit of a tricky transition here for Nvidia: it’s never been worth the time & hassle to try to use an AMD GPU (although a few claim to have made it work out financially for them), because of the skilled labor, wallclock time, residual technical risk, and loss of ecosystem flexibility; but if LLM coding works out well enough and intelligence becomes ‘too cheap to meter’, almost all of that goes away. Even ordinary, unsophisticated GPU buyers will be able to tell their LLM to ‘just make it work on my new GPU, OK? I don’t care about the details, just let me know when you’re done’. At this point, what is the value-add for Nvidia? If they cut down their fat margins and race to the bottom on the hardware, where do they go for the profits? The money all seems to be in the integration and services—none of which Nvidia is particularly good at. (They aren’t even all that good at training LLMs! The Megatron series was a disappointment: Megatron-NLG-530b is barely a footnote, and even the latest Nemo seems to barely match Llama-3-70b while being ~4× larger and thus more expensive to run.)
And this will be true of anyone who is relying on software lockin: if the lockin exists only because it would take a lot of software-engineer time to do a reverse-engineering rewrite and replacement, then it’s in serious danger in a world where LLMs reach human coding level. In a world where you can hypothetically spin up a thousand SWEs on a cloud service, tell them ‘write me an operating system like XYZ’, and they do so overnight while you sleep, durable software moats are going to require some sort of mysterious blackbox like a magic API; anything which is so modularized as to fit on your own computer is also sufficiently modularized as to be easily cloned & replaced...
This isn’t a pure software-engineering-time lockin; some of that money is going to go to legal action looking for a hint that big targets have done the license-noncompliant thing.
Edit: Additionally, I don’t think a world where “most but not all” software engineering is automated is one where it will be a simple matter to spin up a thousand effective SWEs of that capability; I think there’s first a world where that’s still relatively expensive even if most software engineering is being done by automated systems. Paying $8000 for overnight service of 1000 software engineers would be a rather fine deal, currently, but still too much for most people.
I don’t think that will be at all important. You are creating alternate reimplementations of the CUDA API, you aren’t ‘translating’ or decompiling it. And if you are buying billions of dollars of GPUs, you can afford to fend off some Nvidia probes and definitely can pay $0.000008b periodically for an overnighter. (Indeed, Nvidia needing to resort to such Oracle-like tactics is a bear sign.)
While there’s truth in what you say, I also think a market that’s running thousands of software engineers is likely to be hungry for as many good GPUs as the current manufacturers can make. NVIDIA not being able to sustain a relative monopoly forever still doesn’t put it in a bad position.
People will hunger for all the GPUs they can get, but then that means that the favored alternative GPU ‘manufacturer’ simply buys out the fab capacity and supplies them. Nvidia has no hardware moat: they do not own any chip fabs, they don’t own any wafer manufacturers, etc. All they do is design and write software and all the softer human-ish bits. They are not ‘the current manufacturer’ - that’s everyone else, like TSMC or the OEMs. Those are the guys who actually manufacture things, and they have no particular loyalty to Nvidia. If AMD goes to TSMC and asks for a billion GPU chips, TSMC will be thrilled to sell the fab capacity to AMD rather than Nvidia, no matter how angry Jensen is.
So in a scenario like mine, if everyone simply rewrites for AMD, AMD raises its prices a bit and buys out all of the chip fab capacity from TSMC/Intel/Samsung/etc—possibly even, in the most extreme case, buying capacity from Nvidia itself, as Nvidia suddenly finds itself unable to sell anything at the high prices it may be trying to defend, and is forced to resell its reserved chip fab capacity in the resulting liquidity crunch. (No point in spending chip fab capacity on chips you can’t sell at your target price when you aren’t sure what you’re going to do.) And if AMD doesn’t do so, then player #3 does, and everyone rewrites again (which will be easier the second time, as they will now have extensive test suites, two different implementations to check correctness against, documentation from the previous round, and AIs which have been further trained on the first wave of work).
It’s probably worth mentioning that there’s now a licensing barrier to running CUDA specifically through translation layers: https://www.tomshardware.com/pc-components/gpus/nvidia-bans-using-translation-layers-for-cuda-software-to-run-on-other-chips-new-restriction-apparently-targets-zluda-and-some-chinese-gpu-makers
But why would the profit go to NVIDIA, rather than TSMC? The money should go to the company with the scarce factor of production.