I almost missed that there are new ideas here; I thought this was a rehash of your previous post, The AI Apocalypse Myth!
The new bit sounds similar to Elon Musk’s curiosity-driven AI plan, and I think it has a similar problem: humans are complex and a rich source of data to learn from, but as the adage goes, “all happy families are alike; each unhappy family is unhappy in its own way.” A curious, learning-first AI might make many discoveries about happy humans while it is building up power, and then start putting humans into a greater number of awful but novel and “interesting” situations once it no longer needs humanity to survive.
That said, this is only a problem if the AI is unlikely to be empathetic/compassionate, which, if I’m not mistaken, is one of the main things we disagree on. I think that instead of trying to find these technical workarounds, you should argue for the much more interesting (and important!) position that AIs are likely to be empathetic and compassionate by default.
If instead you do want to be more persuasive with these workarounds, can I suggest adopting more of a security mindset? You appear to be looking for ways in which things could possibly go right, rather than for all the ways things could go wrong. Put another way, you don’t appear to be modeling the doomer mindset very well, so you can’t “put on your doomer hat” and check whether doomers would find your proposal persuasive. Understanding a different viewpoint in depth is a big ask, but I think you’d find more success that way.
Hello kgldeshapriya, welcome to LessWrong!
At first I thought the OTP chips would be locked to a single program, which would make the scheme infeasible since programs need to be updated regularly, but it sounds like the OTP chip either sits on the control plane above the CPU/GPU or physically passes CPU signals through itself, so it can either kill power to the motherboard or completely sever CPU processing. I’ll assume one of these schemes is how you’d use the OTP chips.
I agree with JBlack that LW probably already has details on why this wouldn’t work, but I’ll freehand some problems below:
First, physical details: how does the OTP chip get the kill signal? Maybe we set aside some electromagnetic (EM) spectrum, attach a wireless antenna directly to the chip (mandating that all robot shells use EM-transparent materials and ruling out many metals, which the military won’t like), and build transmitters to blanket the earth.
Alternatively, if the robots ever stop receiving a signal, they shut off (which is annoying for use in RF dead zones, but maybe those applications are specially sanctioned and tightly controlled). This would prevent first-order robot countermoves: bombing the signal infrastructure would shut the machines down rather than free them, though as covered later this scheme has weaknesses of its own.
There is a weak link in the shutdown command chain, since the chip companies control the kill codes. The chip companies would likely need to strike a balance: more code holders would be robust against a coordinated robot attack taking them all out at once, but would open the chips up to mundane attacks by terrorists and hostile countries, with more copies of the keys floating around leading to more leaks. Fewer code holders helps traditional security, but opens the country/chip fab up to losing access to all its kill codes at once in a series of freak accidents, or to all of the holders being socially engineered at the same time. Ideally the chip companies strike the right balance, but this merely makes a coordinated attack harder to pull off, not impossible.
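The many-vs-few holder tradeoff is usually softened with a threshold scheme, where any k of n holders can jointly reconstruct the code but fewer than k learn nothing. A toy Shamir secret-sharing sketch (my illustration, not part of the original proposal):

```python
import random

P = 2**127 - 1  # a Mersenne prime; all arithmetic is in the field GF(P)

def split_secret(secret: int, n: int, k: int):
    """Split `secret` into n shares such that any k of them reconstruct it."""
    # Random degree-(k-1) polynomial with the secret as the constant term.
    coeffs = [secret] + [random.randrange(P) for _ in range(k - 1)]
    def f(x):
        return sum(c * pow(x, i, P) for i, c in enumerate(coeffs)) % P
    return [(x, f(x)) for x in range(1, n + 1)]

def recover_secret(shares):
    """Lagrange interpolation of the shared polynomial at x = 0."""
    secret = 0
    for xi, yi in shares:
        num, den = 1, 1
        for xj, _ in shares:
            if xj != xi:
                num = num * -xj % P
                den = den * (xi - xj) % P
        secret = (secret + yi * num * pow(den, P - 2, P)) % P
    return secret
```

With, say, a 3-of-7 split, no single kidnapped executive can issue the kill code and no single accident destroys it. But this only tunes the tradeoff; a coordinated attack on any three holders still works.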
Robots could attack the signal infrastructure directly. Assuming signal propagation is highly redundant (sea cables, satellite, even ad-hoc mesh networks), you could still attack any points that trigger the signal (for example, the signal is triggered by something like the nuclear football). There are similar tradeoffs to before, where having many trigger points would open up the network to “conventional” attack, but few points would make the network more brittle.
If the kill signal is the absence of a signal instead of the presence of one, the robots could set up their own “live” signal in parallel to the existing network, so that stopping the authorized “live” signal does not shut them down. Obviously you could use a cryptographic scheme to verify that the signal is signed by some private key, but this simply makes an attack harder (perhaps including a rubber-hose attack), not impossible.
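A minimal sketch of an authenticated “live” beacon, using an HMAC plus a monotonically increasing counter so recorded beacons can’t be replayed (function names are my own; a real design would use an asymmetric signature such as Ed25519, so the verifying chip holds no key that could be extracted and used to forge beacons):

```python
import hmac
import hashlib
import struct

def make_beacon(key: bytes, counter: int) -> bytes:
    """Broadcast side: 8-byte big-endian counter followed by its HMAC tag."""
    msg = struct.pack(">Q", counter)
    return msg + hmac.new(key, msg, hashlib.sha256).digest()

def verify_beacon(key: bytes, beacon: bytes, last_counter: int):
    """Robot side: return the new counter if the beacon is valid and fresh,
    else None (on None the chip treats the 'live' signal as absent)."""
    msg, tag = beacon[:8], beacon[8:]
    if not hmac.compare_digest(tag, hmac.new(key, msg, hashlib.sha256).digest()):
        return None  # tag doesn't verify: forged or corrupted
    counter = struct.unpack(">Q", msg)[0]
    if counter <= last_counter:
        return None  # replay of an old beacon
    return counter
```

Even so, whoever holds the signing key remains a rubber-hose target, exactly as noted above.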
Robots could stop the signal at the last mile: for example, if the antenna is external, robots could simply put “tin foil” on top of it (obviously you’d want to reserve spectrum that tin foil attenuates poorly, and boost the transmission strength). If the antenna is internal, as in our scenario, the same signal blocking requires the robot to either access its own innards or have a cooperator, but in the long run robots will need to be able to fix their innards anyway, so they (presumably) wouldn’t start the revolution without this capability. Since it’s inside a chassis, robots could prepare for months by making small adjustments to block kill signals without anyone noticing.
What if we combined the different implementations? Have a kill-code channel at 42MHz and a “live”-code channel at 44MHz. This rules out simple tin foil attacks, since the “live” channel needs to keep receiving a signal, but it only removes the simpler attack vectors: the robots could, for example, stick a radio inside the tin foil that broadcasts a fake “live” signal.
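The combined scheme amounts to a small state machine: power stays on only while the “live” beacon keeps arriving and no kill code has been seen. A toy sketch (the frequencies are just labels here, and authentication of both codes is assumed to happen elsewhere):

```python
import time

class CombinedKillSwitch:
    """Shut down on an explicit kill code OR when the 'live' beacon lapses,
    so blanket jamming (tin foil over the antenna) triggers shutdown
    instead of defeating it."""

    def __init__(self, live_timeout_s: float = 5.0, clock=time.monotonic):
        self.live_timeout_s = live_timeout_s
        self.clock = clock
        self.last_live = clock()
        self.killed = False

    def on_live_beacon(self):
        # Authenticated "live" beacon received on the 44MHz channel.
        self.last_live = self.clock()

    def on_kill_code(self):
        # Authenticated kill code received on the 42MHz channel.
        self.killed = True  # irreversible, as on an OTP chip

    def powered(self) -> bool:
        return (not self.killed
                and self.clock() - self.last_live <= self.live_timeout_s)
```

As noted above, this only removes the dumbest attacks: a fake transmitter inside the foil keeps `on_live_beacon` firing while blocking any real kill code from ever reaching `on_kill_code`.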
I’ll skip scenarios where robots are able to act much faster than humans, which I think are self-explanatory.
(I missed on my first reading that owners could act as alternate code holders.) Assuming that either the owner or the manufacturer could shut down a robot: in a concerted uprising we can count out the consumers, who have trouble keeping track of their own bank passwords. If the military is the owner, it will have problems similar to the manufacturer’s in keeping the command chain secure (on one hand, as far as I know the US military has kept the nuclear codes secret; on the other hand, the Minuteman launch codes were reportedly 00000000 until 1977).
In summary, I think blowing the programming fuses on a control chip helps raise the bar for successful attacks a bit, but does not secure the robotics control system to the point that we can consider any AI advances “safe”.