The assumption here is that we can implement a system that deterministically computes mathematical functions with no side effects. We can get much closer to this than we can to preventing information from leaking outward, but we still fail to do it perfectly. There are real-world exploits that have broken this abstraction to cause undesired behaviour, and which could be used to gather information about some properties of the real world.
For example, cosmic rays cause bit errors, and we can already write software that observes them with a high probability of not crashing. We can harden our hardware, for example by adding error correction, but this reduces the error rate without eliminating it. There are also CPU bugs, heat-related errors, and DRAM charge levels that have been exploited in real attacks, and no doubt many other potential vectors that we haven’t discovered yet.
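As a rough sketch of what I mean by software that observes bit errors without crashing (my own illustration, not a specific exploit from the literature): keep a large buffer of known contents and periodically rescan it; any byte that changed was flipped by something outside the program’s own logic.

```python
import time

# A minimal sketch of observing stray bit flips without crashing: the
# program keeps a large zero-filled "canary" buffer and periodically
# rescans it. Because a corrupted byte is only ever read and reported,
# never used as a pointer or instruction, a flip is observed rather
# than fatal. (On ECC hardware, flips corrected by the memory controller
# will of course never show up here.)

CANARY_BYTES = 256 * 1024 * 1024   # 256 MiB of canary memory
SCAN_INTERVAL_SECONDS = 60

canary = bytearray(CANARY_BYTES)   # initialised to all zeros

def scan_for_flips(buf: bytearray) -> list[int]:
    """Return offsets of bytes that are no longer zero."""
    return [i for i, b in enumerate(buf) if b != 0]

while True:
    for offset in scan_for_flips(canary):
        print(f"bit flip observed at offset {offset}: value {canary[offset]:#04x}")
        canary[offset] = 0         # reset so the same flip is reported only once
    time.sleep(SCAN_INTERVAL_SECONDS)
```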
It would certainly be extremely difficult to discern much about the real world purely via such channels, but it only takes one exploitable bug that opens a completely unintended inward channel for the game to be over.
I agree that, even for side-channels exposing external information to a mathematical attacker, we cannot get this absolutely perfect. Error-correction in microelectronics is an engineering problem and engineering is never absolutely fault-free.
However, per this recent US government study, RAM error rates in high-performance compute clusters range from 0.2 to 20 faults per billion device-hours. For comparison, training GPT-3 (175B parameters) from scratch takes roughly 1-3 million device-hours. An attacker inside a deep learning training run probably gets zero bits of information via the RAM-error channel.
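To make that arithmetic explicit, here is a quick back-of-the-envelope calculation in Python using the figures above (the fault rate and device-hour numbers are just the rough estimates quoted, not precise measurements):

```python
# Expected RAM faults over a GPT-3-scale training run, using the numbers above:
# 0.2-20 faults per billion device-hours, and roughly 1-3 million device-hours.

low_rate, high_rate = 0.2e-9, 20e-9     # faults per device-hour
low_hours, high_hours = 1e6, 3e6        # device-hours for the whole run

low_expected = low_rate * low_hours     # optimistic end: ~0.0002 expected faults
high_expected = high_rate * high_hours  # pessimistic end: ~0.06 expected faults

print(f"Expected faults over the run: {low_expected:.4f} to {high_expected:.2f}")
```

Even at the pessimistic end, the expected number of faults over the whole run is well below one.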
But suppose they get a few bits. Those bits are about as random as they come, and there is nothing clever an algorithm can do from the inside to amplify the extent to which cosmic rays reflect useful information about life on Earth.
I disbelieve that your claims about real-world exploits, if cashed out, would break the abstraction of deterministic execution as it is implemented in practice for blockchain smart contracts.
I do think it’s prudent to use strong hardware and software error-correction techniques in high-stakes situations, such as advanced AI, but mostly because errors are generally bad for reliability and for our ability to reason about systems and their behaviours. The absolute worst would be if the sign bit got flipped somewhere in a mesa-optimiser’s utility function. So I’m not saying we can just completely neglect concerns about cosmic rays in an AI safety context. But I am prepared to bet the farm on task-specific AIs being completely unable to learn any virology via side-channels if the AI lab training them musters a decent effort to be careful about deterministic execution (which, I stress again, is not something I think happens by default—I hope this post has some causal influence towards making it more likely).
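For concreteness, here is a toy example (my own sketch, nothing the post prescribes) of the software side of such error correction: evaluate a deterministic, side-effect-free function redundantly and majority-vote the result, so that a single transient flip during one evaluation cannot silently change the output, for instance by flipping a sign bit.

```python
from collections import Counter

def vote(fn, *args, copies: int = 3):
    """Run a pure function several times and return the majority result.

    A single transient fault during one evaluation is outvoted by the
    other copies; if no majority exists, fail loudly instead of silently
    propagating a corrupted value. (This does not protect the voter
    itself, or the result once it is stored elsewhere.)
    """
    results = [fn(*args) for _ in range(copies)]
    value, count = Counter(results).most_common(1)[0]
    if count <= copies // 2:
        raise RuntimeError("no majority result: fault or non-determinism detected")
    return value

# Usage: wrap any deterministic computation with hashable results.
assert vote(lambda x: x * x, 12) == 144
```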
Yeah, it reduces the probability from a massive problem down to Pascal’s-mugging levels. (Or it reduces it below the noise floor.)
Arguably, they reduce the probability to literally 0, i.e. it is impossible for the AI to break out of the box.