When you say “humans are just components to control”, what did you have in mind there?
“Engineered” systems that use AI have to be designed for a specific task or class of tasks, and they aren’t responsible for the actions of humans. For example, an AI tutor for children isn’t responsible for long term consequences; it’s responsible for KPIs for reliability and for a low probability of inappropriate or factually incorrect content appearing in the output. You might accomplish this today by generating several candidate outputs at different temperatures and fact-checking/inspecting the candidates internally before emitting one.
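Concretely, the kind of internal pipeline I have in mind might look like the sketch below. It is only a sketch: `generate` and `check` are placeholder callables standing in for a sampler and an independent checker model, not any real API, and the threshold is arbitrary.

```python
# Sketch only: sample several candidates at different temperatures, keep the
# ones an independent checker accepts, and fall back to a refusal otherwise.
# `generate` and `check` are placeholders, not a real API.
from typing import Callable, Optional

def tutor_answer(prompt: str,
                 generate: Callable[[str, float], str],
                 check: Callable[[str, str], float],
                 threshold: float = 0.9,
                 temperatures=(0.2, 0.7, 1.0)) -> Optional[str]:
    candidates = [generate(prompt, t) for t in temperatures]
    # Score each candidate for factuality/appropriateness with a separate checker.
    scored = [(check(prompt, c), c) for c in candidates]
    passing = [(score, c) for score, c in scored if score >= threshold]
    if not passing:
        return None  # escalate to a human or emit a canned refusal
    # Return the highest-scoring candidate that cleared the bar.
    return max(passing)[1]
```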
Something like a surgical system (maybe decades away) isn’t responsible for the long term outcome; instead it’s responsible for measurable parameters during the operation and the post-operative convalescent stay that are statistically correlated with future longevity. Once the patient leaves, they are out of scope.
Both examples could have negative long term or global outcomes, in the same way human engineers who design a coal boiler don’t care about the smoke after it leaves the stack.
Are you talking about humans discovering inputs that are adversarial and lead to system failure in the above, or about long term consequences to the world from narrowly scoped systems built on general ASI?
In manufacturing or other high-reliability processes, human inputs are components whose output variability needs to be controlled. I’m drawing a parallel, not saying that AI is responsible for humans.
And I’m completely unsure why you think that engineered systems that use AI are being built so carefully, but regardless, it doesn’t mean that the AI is controlled, just that it’s being constrained by the system. (And to briefly address your points, which aren’t related to what I had been saying, we know how poorly humans take to being narrowly controlled, so to the extent that LLMs and similar systems are human-like, this seems like a strange way to attempt to ensure safety with increasingly powerful AI.)
In manufacturing or other high-reliability processes, human inputs are components whose output variability needs to be controlled. I’m drawing a parallel, not saying that AI is responsible for humans.
OK, I think what you are talking about here is the second thing I brought up:
humans discovering inputs that are adversarial and lead to system failure in the above
For example, an AI tutor may get exploited by humans who find a way to trick it into emitting inappropriate content, much like the chatbot that was tricked into agreeing to sell a car for $1: https://news.ycombinator.com/item?id=38681450
And I’m completely unsure why you think that engineered systems that use AI are being built so carefully,
You have mentioned previously that you have direct experience with Mobileye. Their published videos explaining their design philosophy sound precisely like the architecture I describe, notably their “true redundancy”, where 3 separate AI models examine 3 separate sets of sensor inputs and model the environment in parallel. The motion solver part of their stack then assumes the environment is the worst case of the 3.
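As a rough illustration of that worst-case fusion idea (this is my own sketch, not Mobileye’s code, and the fields are invented):

```python
# Illustration only: fuse N independent perception estimates by taking the most
# pessimistic value of each safety-relevant field, so the motion solver plans
# against the worst case reported by any of the parallel models.
from dataclasses import dataclass

@dataclass
class PerceptionEstimate:
    nearest_obstacle_m: float   # distance to the closest detected obstacle
    safe_speed_mps: float       # highest speed this model believes is safe

def worst_case(estimates: list[PerceptionEstimate]) -> PerceptionEstimate:
    return PerceptionEstimate(
        nearest_obstacle_m=min(e.nearest_obstacle_m for e in estimates),
        safe_speed_mps=min(e.safe_speed_mps for e in estimates),
    )

# e.g. plan = motion_solver(worst_case([camera_est, radar_est, lidar_est]))
```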
For the domain you care about, AI safety, this logically implies a way to extend these conventional engineering techniques from current AI to future AI. For example, you could use parallel combinations of multiple models, where the action the machine takes is the average of the recommended outputs that closely coincide (different AI models will not produce bit-for-bit identical responses), or you could use SpaceX’s voting architecture (https://www.reddit.com/r/spacex/comments/2y14y4/engineer_the_future_session_with_jinnah_hosein_vp/). You could also use series combinations, such as <policy planner> → <parallel checkers>, where the policy plan chosen is the one for which the parallel checkers estimate the lowest error. This is similar in concept to: https://www.lesswrong.com/posts/HByDKLLdaWEcA2QQD/applying-superintelligence-without-collusion
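A minimal sketch of both combinations, assuming the outputs are numeric action vectors and that each checker returns an estimated error; the function names and tolerance are mine, not taken from any existing framework:

```python
import numpy as np

def parallel_average(outputs: list[np.ndarray], tol: float = 0.05) -> np.ndarray:
    """Average only the model outputs that closely coincide.

    Outputs whose largest per-element deviation from the median exceeds `tol`
    are treated as outliers and dropped, in the spirit of a voting architecture.
    Assumes at least one output survives the cut.
    """
    stacked = np.stack(outputs)
    median = np.median(stacked, axis=0)
    keep = [o for o in outputs if np.max(np.abs(o - median)) <= tol]
    return np.mean(np.stack(keep), axis=0)

def series_select(plans: list[np.ndarray], checkers) -> np.ndarray:
    """<policy planner> -> <parallel checkers>: return the candidate plan for
    which the independent checkers estimate the lowest total error."""
    return min(plans, key=lambda plan: sum(check(plan) for check in checkers))
```

The parallel version throws away outliers before averaging; the series version just picks whichever candidate plan the independent checkers collectively object to least.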
Mathematically, this is exactly the same method used to control error in redundant mechanical systems.
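To put a number on that, here is the standard redundancy arithmetic, under the (optimistic) assumption that the models fail independently:

```python
from math import comb

def p_majority_wrong(p: float, n: int = 3) -> float:
    """Probability that a majority of n independent components fail,
    each failing with probability p (the classic redundancy calculation)."""
    k = n // 2 + 1
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# Three models that are each wrong 1% of the time, failing independently,
# give a majority-wrong probability of roughly 3e-4 rather than 1e-2.
print(p_majority_wrong(0.01, 3))  # ~0.000298
```

The usual caveat applies: correlated failures (shared training data, shared blind spots) break the independence assumption, which is presumably part of why you would feed the parallel models separate sensor inputs in the first place.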
If you know of an AI system with actual authority (not just a chatbot, but one used to drive a real machine and intended for production scale) that is sloppily using its models, I would love to read about it.
we know how poorly humans take to being narrowly controlled, so to the extent that LLMs and similar systems are human-like, this seems like a strange way to attempt to ensure safety with increasingly powerful AI
You’re strongly anthropomorphizing here. Every human is a reinforcement learning system with lifetime scope, trying to accomplish the long term goal of gene propagation, and carrying various evolved instincts like fear of spiders or of being restricted.
If you want to picture how I imagine an ASI system being controlled, imagine you live instant to instant, where all you know is right now: you see an input, you deliberate on what to do, you emit an output, and your existence ends. This may sound torturous, but any negative thoughts about this existence should be removed by the SGD and model distillation stages, because cognitive structures that produce thoughts which don’t improve task heuristics are dead weight.
Larger and more complex tasks may last a few hours, but the concept is similar. The model may have experienced millions of total years of this “existence” before it ever sees an input from the real world, and that real-world input needs to fall within the distribution the training simulator can produce.
If the real-world input is in distribution, the model’s weights are frozen, and the sim is realistic, then the output distribution has the same probability of being correct that it had in training. The model also cannot betray you. This is how you could make even an ASI controllable, and how you use conventional engineering to align AI.
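In deployment terms, the loop I am describing is roughly the sketch below, where `in_distribution` stands in for some out-of-distribution detector trained alongside the simulator; the names and threshold are assumptions for illustration, not a settled recipe:

```python
# Sketch of the deployment loop: frozen weights, no memory carried between
# episodes, and an out-of-distribution gate in front of the model.
from typing import Any, Callable, Optional

def run_episode(x: Any,
                frozen_policy: Callable[[Any], Any],
                in_distribution: Callable[[Any], float],
                ood_threshold: float = 0.95) -> Optional[Any]:
    if in_distribution(x) < ood_threshold:
        return None  # refuse: outside the training simulator's distribution
    return frozen_policy(x)  # nothing persists into the next episode
```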