You could do all of the things you’re discussing about system design for a mechanical system! And yes, present-day engineers are treating these systems like software, but to the extent they do, they miss lots of critical issues, and, as we’ve seen, they keep having to bolt extra bits onto their conceptual model of what they need to deal with just to keep it relevant. (Which is why I’m saying we need to stop thinking of it as software.)
You are correct that systems engineering applies to all systems, including mechanical.
So you are saying that people will hand-write a bunch of framework software around the AI and deal with the software issues. But they can’t treat the AI the way you would wrap a library like, say, OpenCV: the AI components have their own issues you have to understand. And the example I gave, where your wrapper software examines telemetry to decide when to take control away from the AI model, requires you to understand that telemetry. OpenCV as a library has no “confidence” or “incompressible input state”.
So the engineers who are writing the software have to understand deeply how the AI modules work, and the better engineers will learn from experience how they tend to fail. So they are “AI software engineers” who write software using current methods and also deal with AI. A compressed description of that job role would just be “specialist SWEs”, which contradicts your post title. Agree or disagree?
Do you agree that, at a systems level, you still have to design the overall system to purpose, selecting more reliable components over the bleeding edge and putting checks in series, so their independent failure probabilities multiply down, when needed to reach the target level of reliability?
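To make the series-probability point concrete, a rough worked example (the numbers are made up, and it assumes the two checks fail independently):

$$P(\text{undetected failure}) = \prod_{i=1}^{n} p_i, \qquad p_1 = p_2 = 10^{-3} \;\Rightarrow\; P = 10^{-6}.$$

Correlated failure modes between the components eat into that gain, so the independence has to be engineered for rather than assumed.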
I ask the question above because this seems to be a rather large schism between people like yourself, who understand what I am talking about, and the louder doom voices. I bring it up because something extremely obvious jumps out if you actually try to model how a systems engineer would design an AI system with authority.
Can you allow the modules, which have an empirically measured chance of failing on some test (say, millions of simulated miles of driving, or millions of hours in a warehouse), to self-modify? Can you permit unfiltered, unstructured network messages to reach the AI modules? Are you going to use monolithic, inscrutable ASI-level models, or simpler, more specialized models that are known to be more reliable?
I don’t see how you could design a system, as an engineer using current techniques, that permits any of this, for any system with actual authority. (ChatGPT has no authority; avionics, say, does.)
If the AI module cannot update itself and cannot send messages to coordinate a treacherous turn with other AIs, it is difficult to see how this type of AI with authority causes existential risk. Can you describe a realistic way this could happen, assuming the system was designed by systems engineers of median present-day competence?
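As a concrete illustration of the “no unfiltered messages” constraint: the only thing the AI module ever sees is a fixed, schema-validated input structure, and the only thing it can emit is a fixed, clamped output structure, so there is no side channel for coordination. A minimal sketch, with every name invented for illustration:

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical fixed schemas: the AI module never sees raw network traffic,
# only validated, bounded structures like these.
@dataclass(frozen=True)
class PerceptionInput:
    lidar_points: tuple
    camera_frame_id: int

@dataclass(frozen=True)
class ControlOutput:
    steering_angle: float
    throttle: float

def gate_input(raw: dict) -> Optional[PerceptionInput]:
    """Reject anything that is not exactly the expected structure, so
    unstructured or unexpected traffic never reaches the model."""
    try:
        return PerceptionInput(
            lidar_points=tuple(raw["lidar_points"]),
            camera_frame_id=int(raw["camera_frame_id"]),
        )
    except (KeyError, TypeError, ValueError):
        return None

def gate_output(out: ControlOutput) -> ControlOutput:
    """Clamp the model's output to hard actuator limits set by the framework,
    so the bounded output is the model's only channel to the world."""
    return ControlOutput(
        steering_angle=max(-0.5, min(0.5, out.steering_angle)),
        throttle=max(0.0, min(1.0, out.throttle)),
    )
```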
I’m not talking about AI doom right now; I’m talking about understanding the world. At the system level, you claim we need to design the overall system to purpose, and that’s fine for controlling a system. But the same goes for when humans are part of the system and we’re using Six Sigma to reduce variance in process outcomes. At that point, it’s strange to say that, more generally, humans are just components to control, and that we can ignore that they are different from actuators, motors, or software scripts. Instead, we have different categories, which was the point originally.
When you say “humans are just components to control” what did you have in mind there?
“Engineered” systems that use AI have to be designed for a specific task or class of tasks, and they aren’t responsible for the actions of humans. For example, an AI tutor for children isn’t responsible for the long-term consequences; it’s responsible for meeting KPIs for reliability and for a low probability of inappropriate or factually incorrect content appearing in the output. You might accomplish this today by generating several candidate outputs at different temperatures internally and fact-checking/inspecting each candidate before anything is shown.
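A minimal sketch of that sample-and-check pattern, with every function name being a placeholder rather than a real API:

```python
def generate_candidate(prompt: str, temperature: float) -> str:
    """Placeholder for a call to whatever tutor model is in use."""
    ...

def passes_content_filter(text: str) -> bool:
    """Placeholder for an inappropriate-content classifier."""
    ...

def passes_fact_check(text: str) -> bool:
    """Placeholder for an independent fact-checking pass."""
    ...

def tutor_reply(prompt: str, temperatures=(0.2, 0.7, 1.0)) -> str:
    # Generate several candidates at different temperatures, then only show
    # one that survives both internal checks.
    candidates = [generate_candidate(prompt, t) for t in temperatures]
    safe = [c for c in candidates
            if passes_content_filter(c) and passes_fact_check(c)]
    if not safe:
        # Refuse rather than risk emitting a bad output.
        return "Let's try a different question."
    return safe[0]
```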
Something like a surgical system, maybe decades away, isn’t responsible for the long-term outcome; it is responsible for measurable parameters during the operation and the post-operative convalescent stay that are statistically correlated with future longevity. Once the patient leaves, they are out of scope.
Both examples could have negative long-term or global outcomes, in the same way that the human engineers who design a coal boiler don’t care about the smoke after it leaves the stack.
Are you talking about humans discovering inputs that are adversarial and lead to system failure in the above, or are you talking about long-term consequences to the world from narrowly scoped general ASI systems?
In manufacturing or other high-reliability processes, human inputs are components which need the variability in their outputs controlled. I’m drawing a parallel, not saying that AI is responsible for humans.
And I’m completely unsure why you think that engineered systems that use AI are being built so carefully, but regardless, it doesn’t mean that the AI is controlled, just that it’s being constrained by the system. (And to briefly address your points, which aren’t related to what I had been saying, we know how poorly humans take to being narrowly controlled, so to the extent that LLMs and similar systems are human-like, this seems like a strange way to attempt to ensure safety with increasingly powerful AI.)
In manufacturing or other high-reliability processes, human inputs are components which need the variability in their outputs controlled. I’m drawing a parallel, not saying that AI is responsible for humans.
Ok I think what you are talking about here is the second thing I brought up:
humans discovering inputs that are adversarial and lead to system failure in the above
For example, an AI tutor may get exploited by humans finding a way to trick it into emitting inappropriate content, or into selling a car for $1: https://news.ycombinator.com/item?id=38681450
And I’m completely unsure why you think that engineered systems that use AI are being built so carefully,
You have mentioned previously that you have direct experience with Mobileye. Their published videos explaining their design philosophy sound precisely like the architecture I describe, notably their “true redundancy”, where three separate AI models examine three separate sets of sensor inputs and model the environment in parallel. The motion-solver part of their stack assumes the environment is the worst case of the three.
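A toy version of that worst-case fusion logic (not Mobileye’s actual code, just the shape of the idea, with a single made-up quantity standing in for a full environment model):

```python
def fused_nearest_obstacle(camera_m: float, radar_m: float, lidar_m: float) -> float:
    """Each independent perception channel reports its estimated distance (m)
    to the nearest obstacle; the motion solver plans against the most
    pessimistic of the three."""
    return min(camera_m, radar_m, lidar_m)

# e.g. camera says 40 m, radar 35 m, lidar 12 m -> the planner assumes 12 m
assert fused_nearest_obstacle(40.0, 35.0, 12.0) == 12.0
```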
For the domain you care about, AI safety, this logically implies a way to extend these conventional engineering techniques from current AI to future AI. For example, you could use various combinations of multiple models, where the action the machine takes is an average of the recommended outputs that closely coincide (different AI models will not produce bit-for-bit identical responses), or you could use SpaceX’s voting architecture (https://www.reddit.com/r/spacex/comments/2y14y4/engineer_the_future_session_with_jinnah_hosein_vp/). You could also use series combinations, such as <policy planner> → <parallel checkers>, where the policy plan chosen is the one for which the parallel checkers estimate the lowest error. This is similar in concept to: https://www.lesswrong.com/posts/HByDKLLdaWEcA2QQD/applying-superintelligence-without-collusion
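A sketch of that series combination (function and variable names are mine, not from any existing system): a policy planner proposes candidate plans, several independent checkers estimate the error of each, and the plan with the lowest estimated error is executed.

```python
from statistics import mean

def choose_plan(candidate_plans, checkers):
    """candidate_plans: plans proposed by the policy planner.
    checkers: independent models, each mapping a plan to an estimated
    error/risk score (lower is better)."""
    def pessimistic_score(plan):
        scores = [checker(plan) for checker in checkers]
        # Rank primarily by the most alarmed checker, then by the mean,
        # so a single checker can effectively veto a plan.
        return (max(scores), mean(scores))
    return min(candidate_plans, key=pessimistic_score)
```

Ranking by the most pessimistic checker rather than the average mirrors the worst-case convention from the Mobileye example; a plain average over the checkers would also fit the description above.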
Mathematically, this is exactly the same method of controlling error that applies to mechanical systems.
If you know of an AI system with actual authority (not just a chatbot, but one used to drive a real machine and intended for production scale) that is using the models sloppily, I would love to read about it.
we know how poorly humans take to being narrowly controlled, so to the extent that LLMs and similar systems are human-like, this seems like a strange way to attempt to ensure safety with increasingly powerful AI
You’re strongly anthropomorphizing here. Every human is a reinforcement learning system with lifetime scope, trying to accomplish the long-term goal of gene propagation, with various evolved instincts like fear of spiders or of being restricted.
If you want to know how I picture an ASI system being controlled, imagine you live instant to instant, where all you know is right now. You see an input, you deliberate on what to do, you emit an output, and then your existence ends. This may sound torturous, but any negative thoughts about such an existence should be removed by SGD and the model-distillation stages, because cognitive structures for thoughts that don’t improve task heuristics are dead weight.
Larger and more complex tasks may last a few hours, but the concept is similar. The model may have experienced millions of total years of this “existence” before it ever sees an input from the real world, and that real-world input needs to be within the distribution of the training simulator’s outputs.
If the real-world input is in distribution, the model’s weights are frozen, and the sim is realistic, then the output distribution has the same probability of being correct that it did in training. It also cannot betray. This is how you could make even an ASI controllable, and how you use conventional engineering to align AI.
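To make the “frozen weights + in-distribution inputs” claim operational, here is a minimal sketch of the kind of gate the framework code could apply on every step; the detector, threshold, and names are all assumptions of mine:

```python
def run_step(observation, frozen_model, in_dist_score, safety_policy,
             threshold=0.9):
    """frozen_model: learned policy with weights fixed at deployment.
    in_dist_score: returns a score in [0, 1]; higher means the observation
    looks more like the training/simulator distribution (building this
    detector is itself an engineering problem).
    safety_policy: simple hand-written fallback, e.g. slow down and stop."""
    if in_dist_score(observation) < threshold:
        # Out of distribution: the failure rate measured in testing no longer
        # applies, so take authority away from the learned model.
        return safety_policy(observation)
    return frozen_model(observation)
```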