Technologies and Terminology: AI isn’t Software, it’s… Deepware?
A few weeks ago, I (David) tried to argue that AI wasn’t software. In retrospect, I think I misstated the case by skipping some inferential steps. Based on feedback on that article and on an earlier version of this one, and with a large assist from Abram Demski, I’m going to try again.
The best response to my initial post, by @gjm, explained the point more succinctly and better than I had: “An AI system is software in something like the same way a human being is chemistry.” And yes, of course the human body is chemistry. So in that sense, I was wrong: arguing that AI isn’t software is, in some sense, arguing that the human body isn’t chemistry. But the point was that we don’t think about humans in terms of chemistry.
“The Categories Were Made For Man, Not Man For The Categories.” That is, what we call things is based on inference about which categories should be used, and what features are definitive or not. And any categorization has multiple purposes, and so the question of whether to categorize things together or separately is collapsing many questions together.
The prior essay spent time arguing that software and AI are different. The way software is developed is different from the way AI is developed, and the way software behaves and fails is different from how AI behaves and fails. Here, I’ll add two more dimensions: that the people and expertise involved in AI are different from those involved in software, and that AI differs from software in ways similar to how software differs from hardware. In between those two, I’ll introduce a conceptual model from Abram Demski, which explains that AI is a different type of tool than software and which captures much more of the point.
Based on that, we’ll get to the key point that was obscured initially: if AI is a different type of thing, what does that imply about it, and, less importantly, what should we name the new category?
Who does what?
If we ask a stranger what they do, and they say chemistry, we would be surprised to learn that they were a medical doctor. On the other hand, if someone is a medical doctor, we expect them to know a reasonable amount about biochemistry.
I have a friend who was interested in business, but did a bachelor’s in scientific computational methods. He went on to get an MBA, and did research on real-time pricing in electricity markets, a field where his background was essential. He told me once that as an undergrad, he managed As in his classes, but got weird looks when he asked what a compiler was, and how he was supposed to run code on his computer. He wasn’t a computer scientist; he was just using computer science. Computational numerical methods were a great tool for him, and were useful for understanding financial markets, but he certainly wouldn’t tell people he was a computer scientist or mathematician. The two domains are connected, but not the same.
Returning to the earlier question, software and AI are connected. If someone says they do software development, we would be surprised if they mainly published AI research. And this goes both ways. The skills needed to do AI research or build AI systems sometimes include familiarity with software development, but other times they do not. There are people who do prompt engineering for language models who can’t write any code, and their contributions are nonetheless absolutely vital to making many AI systems work. There are people who do mathematical analysis of deep learning, and can explain how different activation functions and model structures affect convergence, and who also don’t write code. People who write code may or may not work with AI, but everyone who does prompt engineering for LLMs or mathematical analysis of deep learning is doing work with AI.
What Kind of Tool is AI?
Abram suggests that we can make a rough accounting of shifting technological paradigms as follows:
Tools
Machines
Electric
Electronic
Digital
Each of these is largely but not entirely a subset of the prior level. Yes, there are machines that aren’t really tools, say, because they are toys, and yes, there are electric or electronic systems that aren’t machines in the mechanical or similar senses. Despite this, we can see a progression—not that when machines were invented people stopped using tools, or that digital devices replaced earlier devices, but that they are different.
What makes each category conceptually different? Each shift in paradigm is somewhat different, but we do see a progression. We might still ask what defines this progression, or what changes between levels. A full account would need its own essay, or book, and the devil is in the details, but some common themes here are increasing complexity, increasing automation, (largely) diminishing size of functional components, and asking less from humans (first, less time and energy; later, less information).
The shift from “electric” to “electronic” seems complex, but as electrical components got smaller and more refined, there was a shift away from merely handling energy and toward using electricity for information processing. If I ask you to fill in the blank in “electric ____” you might think of an electric lightbulb, electric motor, or electric kettle, appliances which focus primarily on converting electricity to another form of energy. And if I ask you to fill in the blank in “electronic ____” you might think of an electronic calculator, electronic thermometer, or electronic watch. In each case, these devices are more about information than physical manipulation. However, this is not a shift from using electric current to using electrons, as one early reader suggested. Both use electricity, but we start to see a distinction or shift in conceptual approaches from “components” like resistors, magnets, motors, and transistors, to “circuits” which chain components together in order to implement some desired logic.
Shifting from the electronic paradigm to the digital one, we see the rise of a hardware/software distinction. Pong was (iirc) designed as a circuit, not programmed as software—but video games would soon make the switch. And “programming” emerges as an activity separate from electrical engineering, or circuit design. “Programmers” think about things like algorithms, logic, and variables with values. Obviously all of these are accomplished in ways logically equivalent to a circuit, but the conceptual model changed.
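As a toy illustration of that equivalence, here is a one-bit half adder written twice in Python: once as a chain of simulated NAND gates, roughly how a circuit designer might think of it, and once with a variable and arithmetic, how a programmer would. This is only a sketch; the function names are invented for the example, and simulating gates in software is of course not the same as building them.

```python
from itertools import product


def nand(a, b):
    """Simulate a single NAND gate on bits (0 or 1)."""
    return 1 - (a & b)


def half_adder_circuit(a, b):
    """Circuit-style view: wire NAND gates together to get (sum, carry)."""
    n1 = nand(a, b)
    sum_bit = nand(nand(a, n1), nand(b, n1))  # XOR built from four NANDs
    carry_bit = nand(n1, n1)                  # AND built from two NANDs
    return sum_bit, carry_bit


def half_adder_program(a, b):
    """Programmer-style view: a variable, addition, and arithmetic on it."""
    total = a + b
    return total % 2, total // 2


# The two descriptions agree on every possible input.
for a, b in product((0, 1), repeat=2):
    assert half_adder_circuit(a, b) == half_adder_program(a, b)
```

Both functions compute the same thing, but nobody maintaining the second one needs to think about gates at all, which is the sense in which the conceptual model changed.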
Hardware, software, and… deepware?
In a comment, Abram noted that a hardware enthusiast could argue against making a software/hardware distinction. The idea of “software” is misleading because it distracts from the physical reality. Even software is still present physically as magnetic states in the computer’s hard drive, or in the circuits. And obviously, software doesn’t do anything hardware can’t do, since software doing something is just hardware doing it. This could be considered different than previous distinctions between levels; a digital calculator is doing something an electric device can’t, while an electric kettle is just doing what another machine does by using electricity instead of some chemical fuel.
But Abram pointed out that thinking in this way will not be a very good way of predicting reality. The hypothetical hardware enthusiast would not be able to predict the rise of the “programmer” profession, or the great increase in complexity of things that machines can do thanks to “programming”.
The argument is that machine learning is a shift of comparable importance, such that it makes more sense to categorize generative AI models as “something else” in much the same way that software is not categorized as hardware (even though it is made of physical stuff).
It is more helpful to think of modern AI as a paradigm shift in the same way that the shift from “electronic” (hardware) to “digital” (software) was a paradigm shift. In other words: the digital age has led to the rise of generative AI, in much the same way that the electric age enabled the rise of electronics. One age doesn’t end, and we’re still using electricity for everything (indeed, for even more things), but “electric” stopped being the most interesting abstraction. Now, a shift to deep learning and AI means that things like “program”, “code”, and “algorithm” are starting to not be the best or most relevant abstractions either.
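A toy sketch can make the contrast concrete. Below, the same behavior (logical OR) is produced twice: once by a rule a programmer writes down, and once by a tiny perceptron whose weights are fitted to examples. Everything here is an assumption made for illustration: the function names are invented, and a three-parameter perceptron is nowhere near a deep learning model. The point is only that in the second case nobody writes the rule; it ends up encoded in numbers.

```python
def programmed_or(a, b):
    """Software: the rule is written down explicitly by a person."""
    return 1 if (a or b) else 0


def train_perceptron(examples, epochs=20, learning_rate=1.0):
    """Fit weights to labelled examples with the classic perceptron update."""
    weights = [0.0, 0.0]
    bias = 0.0
    for _ in range(epochs):
        for (x1, x2), target in examples:
            activation = weights[0] * x1 + weights[1] * x2 + bias
            prediction = 1 if activation > 0 else 0
            error = target - prediction  # 0 when the guess is already right
            weights[0] += learning_rate * error * x1
            weights[1] += learning_rate * error * x2
            bias += learning_rate * error
    return weights, bias


def learned_or(a, b, weights, bias):
    """The learned version: behavior comes from fitted numbers, not a rule."""
    return 1 if weights[0] * a + weights[1] * b + bias > 0 else 0


# The training data is the only place the intended behavior is specified.
EXAMPLES = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
weights, bias = train_perceptron(EXAMPLES)

for (a, b), _ in EXAMPLES:
    assert learned_or(a, b, weights, bias) == programmed_or(a, b)
```

Changing the first version means editing the rule; changing the second means changing the examples or the training procedure and then checking what came out. That is a miniature version of the different kind of work the argument is pointing at, at a vastly smaller scale.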
Is this really different?
When seeing the above explanation, @gjm commented that “I suppose you could say a complicated Excel spreadsheet monstrosity is ‘software’ but it’s quite an unusual kind of software and the things you do to improve or debug it aren’t the same as the ones you do with a conventional program. AI is kinda like these but more so.”
The question is whether “more so” is an evolutionary or revolutionary change. Yes, toasters are different from generators, and the types of things you do to improve or debug them are different, but there is no conceptual shift. You do not need new conceptual tools to understand and debug spreadsheets, even if they are the types of horrendous monstrosities I’ve worked with in finance. On the other hand, there have obviously been smaller paradigm shifts within the larger umbrella of software, from assembly to procedural programming to object-oriented programming and so on. And these did involve conceptual shifts and new classes of tools: type systems and type checking were new concepts when shifting from machine code programming in assembly to more abstract programming languages, even though bits, bytes, and words were already conceptually distinct in machine code.
It could be debated which shifts should be considered separate paradigms, but the shift to deep learning required a new set of conceptual tools. We need to go back to physics to understand why electronic circuits work; they don’t really work just as analogies to mechanical systems. @gjm explained this clearly: “We’ve got a bunch of things supervening on one another: laws of physics, principles of electronics, digital logic, lower-level software, higher-level software, neural network, currently-poorly-understood structures inside LLMs, something-like-understanding, something-like-meaning. Most of the time, in order to understand a higher-level thing it isn’t very useful to think in terms of the lower-level things.”
This seems to hit on the fundamental difference. When a new paradigm supervenes on a previous one, it doesn’t just add to it, or logically follow. Instead, the old conceptual models fail, and you need new concepts. So type theory is understood via concepts that are coherent in the terms we use to talk about debugging logic in earlier programming. On the other hand, programming instead of circuit design or electronics required a more fundamental regrounding. The new paradigm did not build further on physics and extend mathematical approaches previously used for analog circuit design. Instead, it required the development of new mathematical formalisms and approaches—finite state machines and Turing completeness for programs, first-order predicate logic for databases, and similar. The claim here is that deep learning requires a similar rethinking, not just building conceptual tools on top of those we already have.
What’s in a Name?
Terminology can be illuminating or obscuring, and naming the next step in technological progress is tricky. Electronics is used as a different word than electric, but it’s not as though electrons are more specifically involved; static electricity, resistors, PN junctions, and circuits all involve electrons. Similarly, “software” is not a great name to describe a change from electronic components to data, but both terms stuck. (I clearly recall trying to explain to younger students that they were called floppy disks because the old ones were actually floppy; now, the only thing remaining of that era is the icon that my kids don’t recognize as representing a physical object.)
Currently, we seem to have moved from calling these new methods and tools “machine learning” to calling them “AI,” and both indicate something about how this isn’t software, it’s something different, but neither term really captures the current transition. What matters about the product of machine learning isn’t that the machine was learning; it’s that the derived model can do certain things on the basis of what it infers from data. Many of those things are (better or different versions of) normal types of statistical inference, including categorization, but not all of them. And calling ML statistics misses the emergent capabilities of GANs, LLMs, diffusion models, and similar.
On the other hand, current “AI” is rightly considered neither artificial nor intelligent. It’s not completely artificial, because other than a few places like self-play training for GANs, it’s trained on human expertise and data. In that way, it’s more clearly machine learning (via imitating humans). It’s also not currently intelligent in many senses, because it’s non-agentic and very likely not conscious. And in either case, it’s definitely not what people envisioned decades ago when they spoke about “AI.”
The other critical thing that the current terms seem to miss is the deep and inscrutable nature of the systems. There are individuals who understand at least large sections of every large software project, which is necessary for development, but the same is not and never need be true for deep learning models. Even to the extent that interpretability or explainability is successful, the systems are far more complex than humans can fully understand. I think that “deepware” captures some of this, and am indebted to @Oliver Sourbut for the suggestion.
Conclusion
Deep learning models use electricity and run on computers that can be switched on and off, but are not best thought of as electric brains. Deep learning models run on hardware, but are not best thought of as electronic brains. Deep learning models are executed as software instructions, but are not best thought of as software brains. And in the same way, software built on top of these deep learning models to create “AI systems” provides a programming interface for the models. But these are not designed systems; they are inscrutable models grown on vast datasets.
It is tempting to think of the amalgam of a deep learning model and the software as a software product. Yes, it is accessed via API, run by software, on hardware, with electricity, but instead of thinking of software, hardware, or electrical systems, we need to see them as what they are. That doesn’t necessarily mean the best way of thinking about them is as inscrutable piles of linear algebra, or as shoggoths, or as artificial intelligence, but it does mean seeing them as something different, and not getting trapped in the wrong paradigm.
Thanks to @gjm for the initial comment and the resulting discussion, and to @zoop for his disagreements, and to both for their feedback on an earlier draft. Thanks to @Gerald Monroe and @noggin-scratcher for pushback and conversation on the original post. Finally, thanks to @Daniel Kokotajlo for initially suggesting “deepnets” and again thanks to @Oliver Sourbut for suggesting “deepware.”