What I expect to change quickly is that “programming languages” will go away completely. LLMs or similar tech will get us way closer to the DWIM (“do what I mean”) level. Directly translating from a spec to executables will be something AI can excel at. The missing piece is the feedback: writing and executing unit tests and changing the executable (not the code!) to pass the tests.
Note that a lot of current CS concepts are human-oriented and make no sense when the human is not a part of the process: “architecture” is a crutch for the limitations of our brains. “Design” is another crutch. This can all be streamlined into “Spec->Binary”.
Even further, there is no reason to have a general-purpose computer when it is easy for an AI to convert a spec into actual hardware, such as an FPGA.
Next on the list (or maybe even first on the list) is not needing the low-level executables at all: the LLM or equivalent just does what you ask of it.
Architecture is not JUST for brain limitations. Dividing a large task into separate, testable (and now formally provable, since we have AI to do the immense labor that takes) modules, interconnected by message passing through shared memory, with an eye toward performance: that is architecture.
It’s not simple either: performance, for example, requires someone or something to have a flow graph, in their head or represented somewhere, to know where the bottlenecks are.
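To make that concrete, here is a minimal sketch (an illustration added here, not from the original exchange, with made-up module names) of “separate testable modules interconnected by message passing”: each module is a pure function that can be tested on its own, and the architecture is the contract between them plus the channel.

```python
# Minimal sketch of "architecture as testable modules + message passing".
import queue

def parser(raw: str) -> dict:
    """Pure, testable module: turns a raw request into a structured message."""
    fields = dict(item.split("=", 1) for item in raw.split(";") if item)
    return {"command": fields.get("cmd", ""), "args": fields}

def executor(msg: dict) -> str:
    """Pure, testable module: acts on a structured message."""
    return f"executed {msg['command']} with {len(msg['args'])} field(s)"

# The channel between the modules (could be shared memory, a queue, etc.).
channel: "queue.Queue[dict]" = queue.Queue()
channel.put(parser("cmd=pay;amount=10"))
print(executor(channel.get()))          # -> executed pay with 2 field(s)

# Each module can be tested (or formally analyzed) in isolation:
assert parser("cmd=ping;")["command"] == "ping"
assert executor({"command": "ping", "args": {}}).startswith("executed ping")
```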
I agree with you on the idea of AI bytecode authors: once you have a program in a representation tight enough that one and only one binary truth table can be constructed to model the behavior of the program (a property all programming languages have and English doesn’t), a second AI could just write the bytecode or FPGA logic in an optimized form.
No need for compilers; the language could be Python or something even higher-level, and every language below that becomes pointless.
I would think that this part is one of the easier ones to automate. If you don’t see it that way, what do you feel are the impediments that would require human input?
Edit:
Note that the “one and only one binary truth table” property does not quite hold even now: different compilers, different versions of the same compiler, or different optimization levels of the same version output different binaries, even on the same platform, let alone on x86 vs ARM! There are best-effort promises of “one binary truth table”, but it is never a guarantee.
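A quick way to see this on your own machine (my own illustration, not from the thread; it assumes gcc is on the PATH): compile the same trivial C file at two optimization levels and compare the hashes of the resulting binaries.

```python
# Illustration: the same source, two optimization levels, two different binaries.
import hashlib, pathlib, subprocess, tempfile

src = "int add(int a, int b) { return a + b; } int main(void) { return add(2, 3); }"

with tempfile.TemporaryDirectory() as tmp:
    c_file = pathlib.Path(tmp) / "prog.c"
    c_file.write_text(src)
    hashes = {}
    for opt in ("-O0", "-O2"):
        out = pathlib.Path(tmp) / f"prog{opt}"
        subprocess.run(["gcc", opt, str(c_file), "-o", str(out)], check=True)
        hashes[opt] = hashlib.sha256(out.read_bytes()).hexdigest()
    # The two digests will almost certainly differ, even though both binaries
    # compute the same function.
    print(hashes)
```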
The point is that the top level of the program (what it will output given binary input I) produces the same behavior, and this is true for all languages.
Output = f(I) can be represented by a truth table with one row for every permutation of I.
Implementation details don’t matter.
Multithreading / runtime delays can change the sequence things output but not the possibility space of outputs.
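As a toy illustration of that claim (mine, not the commenter’s): any function over a finite binary input can, in principle, be written out as a truth table with one row per input permutation, after which “running the program” is just a lookup, independent of how the function was implemented.

```python
# Toy illustration of "Output = f(I) as a truth table".
from itertools import product

def f(i: tuple[int, ...]) -> int:
    # Stand-in program: output 1 iff an odd number of input bits are set.
    return sum(i) % 2

N_BITS = 3
truth_table = {bits: f(bits) for bits in product((0, 1), repeat=N_BITS)}

for row, output in truth_table.items():
    print(row, "->", output)

# Once the table exists, evaluation is a lookup, not execution of the code.
assert truth_table[(1, 0, 1)] == f((1, 0, 1))
```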
This is definitely not my experience, having worked with C in embedded systems for some decades. Every new port has at least slightly different behavior, which sometimes matters and sometimes does not. Some of the things that tend to break: timing, especially in a multitasking system, which creates or exacerbates race conditions; values left uninitialized in memory (probably less of an issue for more modern languages, though I have seen it in Java as well); garbage collection (say, in JS when switching browsers); implementations of markup (which break all the time for no reason).
We must be living in different worlds...
I have a substantial amount of embedded systems experience. More than 10 years.
Note that what you are describing is almost always, in fact, faulty system behavior, which is exactly why you need better architectures. Many systems shipping today have faulty behavior: it passes acceptance tests but is undefined for the reasons you mention.
Determinism and timing guarantees are almost always part of correct software behavior (RNGs being isolated exceptions).
For many systems we could not fix the architecture for the obvious reason: the cost in labor to rewrite it. That cost disappears if you can ask an AI to do it in a series of well-defined steps.
Just to specify my claim a little more precisely: for individual threads/ISRs/processes/microservices, every actor has an initial input state, I.
And then the truth table rule above applies.
If a thread hits a mutex, and then either waits or proceeds, that mutex state is part of the truth table row.
Depending on when it got the lock and got to read a variable, the value it actually read is still part of the same row.
Any internal variables it has are part of the row.
Ultimately for each actor, it is still a table lookup in practice.
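Here is a toy sketch (my illustration, not the commenter’s code) of that “table lookup in practice” claim: everything the step depends on, including whether the actor got the lock, the shared value it read, and its own internal state, is part of the input row, and the step itself is a pure function of that row.

```python
# Toy sketch: an actor's step as a pure function of its full input row.
from typing import NamedTuple

class Row(NamedTuple):
    message: int           # external input this step
    got_lock: bool         # mutex state when the actor reached it
    shared_value: int      # value of the shared variable it read (if any)
    internal_counter: int  # actor's own state going into the step

def actor_step(row: Row) -> tuple[str, int]:
    """(full input row) -> (output, next internal state)."""
    if not row.got_lock:
        return ("wait", row.internal_counter)
    return (f"processed {row.message + row.shared_value}",
            row.internal_counter + 1)

# Same row in -> same result out, regardless of scheduling history.
assert actor_step(Row(5, True, 10, 0)) == actor_step(Row(5, True, 10, 0))
print(actor_step(Row(5, True, 10, 0)))   # ('processed 15', 1)
print(actor_step(Row(5, False, 10, 0)))  # ('wait', 0)
```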
Now, as I mentioned earlier, this is generally bad design. Pure functional programming has become the modern standard at all levels, from embedded systems up to hyperscaler systems, and even low-level embedded code should be pure functional. That means, for example, holding a reader and writer lock on the system state being modified, or using other methods, so that the entire state the code operates on is atomic and coherent.
For example, I’ve written a 10 kHz motor controller, which is a functional system of
PWM_outputs = f(phaseA, phaseB, resolver, PID_state, speed_filter[], current_filter[]) and a few other things. My actual implementation wasn’t so clean, and the system had bugs.
The system above is atomic: I need all variables to be from the same timestep, and if my code loop is too slow to do all the work before the next timestep, I need to release control of the motor (open all the gates) and shut down with an error.
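A structural sketch of that pattern, in Python for readability (the real thing would be C on a microcontroller, and names like read_sensors, apply_pwm, open_all_gates and the gains are placeholders, not the commenter’s code): one pure control step over an atomic snapshot, with a hard shutdown if the loop overruns its timestep.

```python
# Sketch: pure control step over an atomic sensor snapshot, with overrun shutdown.
import time

TIMESTEP_S = 1.0 / 10_000   # 10 kHz control loop

def control_step(snapshot: dict, state: dict) -> tuple[dict, dict]:
    """(sensor snapshot, controller state) -> (PWM outputs, new state). Pure."""
    error = snapshot["speed_setpoint"] - snapshot["resolver_speed"]
    integral = state["pid_integral"] + error * TIMESTEP_S
    duty = 0.5 * error + 0.1 * integral            # toy PID gains
    return ({"phaseA": duty, "phaseB": -duty}, {"pid_integral": integral})

def run_loop(read_sensors, apply_pwm, open_all_gates):
    state = {"pid_integral": 0.0}
    while True:
        deadline = time.monotonic() + TIMESTEP_S
        snapshot = read_sensors()                  # atomic: all from the same timestep
        outputs, state = control_step(snapshot, state)
        if time.monotonic() > deadline:            # overran the timestep
            open_all_gates()                       # release control of the motor
            raise RuntimeError("control loop overrun")
        apply_pwm(outputs)
```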
If I had an AI to do the work for me, I would have asked it to do some fairly massive refactors, add more wrapper layers, etc., and then self-review its own code against rubrics to make something clean and human-readable.
All things that GPT-4 can do right now, especially if it gets a fine-tune on coding.
There are architectural problems with LLMs that I think prevent the future you are describing: they can only output so many tokens, and actual binaries are often up to thousands of times the token count of the source code they were compiled from, especially for high-level languages. The compilation process is thus a useful compression step, and I don’t expect designed-for-LLM programming languages, because the terabyte-scale datasets currently necessary to train LLMs to use them won’t be available. In addition, at least for the next few years, it will be important for humans to be able to inspect and reason directly about the software the LLM has made.
But it’s possible these problems get solved soon.
I can foresee a near future where people can “program” by passing pseudo-code in plain English to LLMs, but I still think that we are nowhere near the point where “programmer” truly becomes an unskilled job. “Writing correct algorithms in plain English” is not a common skill, and you can’t have LLMs writing perfect code 100% of the time. At the very least, you will still need a competent human to find bugs in the machine-generated code… my best guess is that the software industry will hire fewer programmers rather than less competent programmers.
...But why? I understand that LLMs produce code with bugs right now, but why expect that they will continue to do so to an economically meaningful degree in the future? I don’t expect that.
For the same reason we don’t have self-driving cars yet: you cannot expect those systems to be perfectly reliable 100% of the time (well, actually you can, but I don’t expect such improvements in the near future just from scaling).
Humans are much better drivers than they are programmers.
Indeed, “writing correct algorithms in plain English” is not a common skill! But it is easier to automate than “create a requirements spec”. Here are plausible automation steps (a rough sketch of the loop, with placeholder function names, follows the list):
1. Get a feature request from a human, in plain English (e.g. a mockup of the UI, or even “the app must take one-click payments from the shopping-cart screen”).
2. AI converts it into a series of screen flows etc. Repeat from step 1 until happy.
3. AI generates an internal test set. (Optionally reviewed by a human to check that it matches the requirements; go back to step 1 and adjust until happy.)
4. AI generates a mockup of all external APIs, matching the existing API specs.
5. AI generates the product (e.g. an app, an ecosystem, an FPGA, or whatever else).
6. AI validates the product against the test set and adjusts the product until the tests pass.
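As promised above, a rough sketch of the loop. Every callable here is a hypothetical placeholder supplied by the caller (no such AI API exists today); the point is only the shape of the generate/test/fix feedback loop, not a real implementation.

```python
# Sketch of the spec -> tests -> product -> validate-and-adjust loop.
from typing import Any, Callable

def build_from_request(
    feature_request: str,
    ai_generate_spec: Callable[[str], Any],     # steps 1-2: request -> screen flows
    ai_generate_tests: Callable[[Any], Any],    # step 3: internal test set
    ai_generate_product: Callable[[Any], Any],  # steps 4-5: API mocks + product
    run_tests: Callable[[Any, Any], list],      # step 6: returns the failing tests
    ai_fix_product: Callable[[Any, list], Any], # feed failures back to the AI
    max_iterations: int = 10,
) -> Any:
    spec = ai_generate_spec(feature_request)
    tests = ai_generate_tests(spec)
    product = ai_generate_product(spec)
    for _ in range(max_iterations):
        failures = run_tests(product, tests)
        if not failures:
            return product                      # all tests pass -> done
        product = ai_fix_product(product, failures)
    raise RuntimeError("could not satisfy the test set; go back to step 1")
```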
I don’t think you’ll need that competent human: humans are already worse than AI at finding the bugs. All you need is to close the feedback loop, and from what I see online of the GPT-4 demos, giving an error message back to the AI already prompts it to correct the issue. Those are of course syntax errors rather than semantic errors, but that is what the test suite is for: to obsolete the distinction between syntax and semantics, and current LLMs are already pretty good at that.
Yes, the industry will hire fewer of them, and they will not be “programmers”; they will be “AI shepherds” or something.
I suspect that we are thinking about different use cases here.
For very standard things without complicated logic, like an e-commerce app or a showcase site, I can concede that an automated workflow could work without anyone ever looking at the code. This is (sort of) already possible without LLMs: there are several Full Site Editing apps for building standard websites without ever touching the code.
But suppose that your customer needs a program that solves a complicated scheduling or routing problem tailored to some specific needs. Maybe our non-programmer knows the theoretical structure of routing problems and can direct the LLM to write the correct algorithms, but in that case it is definitely not an unskilled job (I suspect that <1% of the general population could describe a routing problem in formal terms).
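For a sense of what “describing a routing problem in formal terms” involves, here is a toy example (mine, not from the thread, with made-up travel times): the smallest vehicle-routing-style problem, stated as data plus an objective and solved by brute force.

```python
# Toy routing problem: one depot (0), three stops, minimize total travel time.
from itertools import permutations

# Symmetric travel times between locations (made-up numbers).
travel = {
    (0, 1): 4, (0, 2): 7, (0, 3): 3,
    (1, 2): 2, (1, 3): 6, (2, 3): 5,
}

def cost(a: int, b: int) -> int:
    return travel[(min(a, b), max(a, b))]

def route_cost(route: tuple[int, ...]) -> int:
    """Total time for depot -> stops in the given order -> back to depot."""
    path = (0, *route, 0)
    return sum(cost(a, b) for a, b in zip(path, path[1:]))

best = min(permutations([1, 2, 3]), key=route_cost)
print(best, route_cost(best))   # (1, 2, 3) with total travel time 14
```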
If our non-programmer is actually unskilled and has no clue about routing problems… what are we supposed to do? Throw vague specs at the AI and hope for the best?
The person can… ask the AI about routing algorithms and related problems? Already the bots are pretty good at describing the current state of the field. The person could then come up with a workable approach interactively, before instructing the bot to spawn a specialized router app. That is to say, it will not be an unskilled job; it still requires someone who can learn, understand, and make sensible decisions, which is in many ways harder than implementing a given algorithm. They just won’t be doing any “programming” as the term is understood now.