Steve_Omohundro

Karma: 312

Steve_Omohundro Aug 20, 2024, 9:56 PM
1 point
0
in reply to: Steve_Omohundro’s comment on: Limitations on Formal Verification for AI Safety
While dark energy and dark matter have a big effect on the evolution of the universe as a whole, they don’t interact in any measurable way with systems here on earth. Ethan Siegel has some great posts narrowing down their properties based on what we definitively know, eg. https://bigthink.com/starts-with-a-bang/dark-matter-bullet-cluster/ So it’s important on large scales but not, say, on the scale of earth. Of course, if we consider the evolution of AI and humanity over much longer timescales, then we will likely need a detailed theory. That again shows that we need to work with precise models which may expand their regimes of applicability.

Steve_Omohundro Aug 20, 2024, 9:45 PM
13 points
7
in reply to: Seth Herd’s comment on: Limitations on Formal Verification for AI Safety
I’m focused on making sure our infrastructure is safe against AI attacks. This will require that our software not have security holes, that our cryptography not be vulnerable to mathematical attacks, our hardware not leak signals to adversarial sensors, and our social mechanisms not be vulnerable to manipulative interactions. I believe the only practical way to have these assurances is to model our systems formally using tools like Lean and to design them so that adversaries in a specified class provably cannot break specified safety criteria (eg. not leak cryptographic keys). Humans can do these tasks today but it is laborious and usually only attempted when the stakes are high. We are likely to soon have AI systems which can synthesize systems with verified proofs of safety properties rapidly and cheaply. I believe this will be a game changer for safety and am arguing that we need to prepare for that opportunity. One important piece of infrastructure is the hardware that AI runs on. When we have provable protections for that hardware, we can put hard controls on the compute available to AIs, its ability to replicate, etc..

Steve_Omohundro Aug 20, 2024, 9:13 PM
7 points
2
in reply to: Seth Herd’s comment on: Limitations on Formal Verification for AI Safety
Simulation of the time evolution of models from their dynamical equations is only one way of proving properties about them. For example, a harmonic oscillator https://en.wikipedia.org/wiki/Harmonic_oscillator has dynamical equations m d^2x/dt^2= -kx. You can simulate that but you can also prove that the kinetic plus potential energy is conserved and get limits on its behavior arbitrarily far into the future.

Steve_Omohundro Aug 20, 2024, 9:04 PM
6 points
2
in reply to: habryka’s comment on: Limitations on Formal Verification for AI Safety
It’s very hard to get large gravitational fields. The closest known black hole to Earth is Gaia BH1 which is 1560 light-years away: https://www.space.com/closest-massive-black-hole-earth-hubble The strongest gravitational waves come from the collision of two black holes but by the time they reach Earth they are so weak it takes huge effort to measure them and they are in the weak curvature regime where standard quantum field theory is fine: https://www.ligo.caltech.edu/page/what-are-gw#:~:text=The%20strongest%20gravitational%20waves%20are,)%2C%20and%20colliding%20neutron%20stars.
It’s also quite challenging to create high energy particles, they tend to rapidly collide and dissipate their energy. The CERN “Large Hadron Collider” is the most powerful particle accelerator that humans have built: https://home.cern/resources/faqs/facts-and-figures-about-lhc It involves 27 kilometers of superconducting magnets and produces proton collisions of 1.3*10^13eV.
Most cosmic rays are in the range of 10^6 eV to 10^9 eV https://news.uchicago.edu/explainer/what-are-cosmic-rays But there have been a few very powerful cosmic rays detected. Betwen 2004 and 2007, the Pierre Auger Observatory detected 27 events with energies above 5.7 * 10^19 eV and the “Oh-My-God” particle detected in 1991 had an energy of 3.2 * 10^20 eV.
So they can happen but would be extremely difficult for an adversary to generate. The only reason he put 10^11 as a limit is that’s the highest we’ve been able to definitively explore with accelerators. There may be more unexpected particles up there, but I don’t think they would make much of a difference to the kinds of devices we’re talking about.
But we certainly have to be vigilant! ASIs will likely explore every avenue and may very well be able to discover the “Theory of Everything”. We need to design our systems so that we can update them with new knowledge. Ideally we would also have confidence that our infrastructure could detect attempts to subvert it by pushing outside the domain of validity of our models.

Steve_Omohundro Aug 20, 2024, 8:33 PM
11 points
2
on: Limitations on Formal Verification for AI Safety
On one hand, while we should recognize that modeling techniques like discrete element analysis can produce quantitative estimates of real-world behavior – for example, how likely a drone is to crash, or how likely a bridge is to fail – we should not lose sight of the fact that such estimates are invariably just estimates and not guarantees. Additionally, from a practical standpoint, estimates of this sort for real-world systems most-often tend to be based on empirical studies around past results rather than prospective modeling. And to the extent that estimates are ever given prospectively about real-world systems, they are invariably considered to be estimates, not guarantees.
I think this is a critical point. Every engineering discipline has precise models of their domain. For example, I’m looking at Peter Childs’ “Mechanical Design Engineering Handbook” which surveys the field and is full of detailed partial differential equations for the behavior of different components like bearings, gears, clutches, seals, springs, fasteners, etc. It also has many tables showing the values (with significant figures) of parameters like viscosity of lubricants and the fatigue load of ball bearings.
How did they get these models and parameters? It’s a combination of inference from fundamental physics and experimental tests. Given more detailed physics understanding, fewer experiments are needed. And even with no physics, probabilistic models with PAC-like guarantees can be built purely from experiments if the test distribution is the same as the training distribution.
To build safe systems, engineers do computations on these models and bound the probability of system failure (hopefully to a low value!). These computations should be purely mathematical with precise answers verified by mathematical proofs. Unfortunately, that has not been the culture in “scientific computing” or “numerical methods” in the United States and often suggested the need for “numerical analysts” to look over your code and ensure that it was “numerically stable”!
In my book, that culture has been a disaster, leading to many failures and deaths. The Europeans developed a culture of “verified scientific computing” (for example, I was enamored of this book: “Numerical Toolbox for Verified Computing I: Basic Numerical Problems Theory, Algorithms, and Pascal-XSC Programs (Springer Series in Computational Mathematics)” which shows how to produce provably correct enclosures around solutions to equations, finding global optima, etc. These days these models can be formalized in a system like Lean and properties deduced using mathematical proof.
How much accuracy do we need and what should we do about incorrect models? For safety, the answer depends on the strengths of the adversary. If your adversary is a typical human, then often weak models suffice. Powerful AI adversaries, however, will look for weaknesses in the desing model and try to exploit them. To have any hope of providing actual security, these processes must be modeled and accounted for.
Most engineering disciplines actually have a kind of “stack” of models, each layer justified by the layer below. For example, in chip design there is a digital circuit layer on top of a physical layout layer on top of detailed electrodynamics and solid state materials, etc. Today’s chips do a pretty good job of isolating the layers, so a digital circuit laid out according to the chip process node’s design rules will actually implement the intended digital behavior.
But attacks like “Rowhammer” show what can happen when an adversary can affect a layer below the digital layer. By repeatedly accessing certain memory cells, an attacker can flip nearby bits and use this to extract cryptographic keys and violate security “guarantees”. As DRAMs have gotten denser, the problem has gotten worse. The human-designed mitigation solutions have led to a cat and mouse game that the attackers are winning. I believe that’s a great example of a critically important design problem which can only be solved by using AI and formal methods. The ideal solution would be to design chips which provably implement the intended digital model. But we have a lot of bad DRAM chips out there! I believe we need a formal electrodynamic model of the phenomenon and a provable representation of all access patterns which lead to violations. Then we need verified AI program synthesis to resynthesize programs to provably not generate any of the bad access patterns. I don’t think any other technique has a chance of dealing with that kind of issue. And you can be sure that advanced AI’s will be hammering away at weak computational infrastructure.

Steve_Omohundro Aug 20, 2024, 7:39 PM
6 points
0
in reply to: sanxiyn’s comment on: Limitations on Formal Verification for AI Safety
Thanks! His work looks very interesting! He recently did this nice talk which is very relevant: “Formal Verification in Scientific Computing”

Steve_Omohundro Aug 20, 2024, 7:34 PM
9 points
−5
in reply to: habryka’s comment on: Limitations on Formal Verification for AI Safety
Figure 1 in Carroll’s paper shows what is going on. At the base is the fundamental “Underlying reality” which we don’t yet understand (eg. it might be string theory or cellular automata, etc.):
Above that is the “Quantum Field Theory” level which includes the “Core Theory” which he explicitly shows in the paper and also possibly “Unknown particles and forces”. Above that is the “Macro Level” which includes both “Everyday life” which he is focusing on and also “Astrophysics and Cosmology”. His claim is that everything we experience in the “Everyday life” level depends on the “Underlying reality” level only through the “Core Theory” (ie. it is an “effective theory” kind of like fluid mechanics doesn’t depend on the details of particle interactions).
In particular, for energies less than 10^11 electron volts and for gravitational fields weaker than those around black holes, neutron stars, and the early universe, the results of every experiment is predicted by the Core theory to very high accuracy. If anything in this regime were not predicted to high accuracy, it would be front page news, the biggest development in physics in 50 years, etc. Part of this confidence arises from fundamental aspects of physics: locality of interaction, conservation of mass/energy, and symmetry under the Poincare group. These have been validated in every experiment ever conducted. Of course, as you say, physics isn’t finished and quantum theory in high gravitational curvature is still not understood.
Here’s a list of other unsolved problems in physics: https://en.wikipedia.org/wiki/List_of_unsolved_problems_in_physics But the key point is that none of these impact AI safety (at least in the nearterm!). Certainly, powerful adversarial AI will look for flaws in our model of the universe as a potential opportunity for exploitation. Fortunately, we have a very strong current theory and we can use it to put bounds on the time and energy an AI would require to violate the conditions of validity (eg. create black holes, etc.) For long term safety and stability, humanity will certainly have to put restrictions on those capabilities, at least until the underlying physics is fully understood.

Steve_Omohundro Aug 20, 2024, 1:46 AM
51 points
3
on: Limitations on Formal Verification for AI Safety
Andrew, thanks for the post. Here’s a first response, I’ll eventually post more. I agree with most of your specific comments but come to a different set of conclusions (some of which Ben Goldhaber and I spelled out in this LessWrong post: “Provably Safe AI: Worldview and Projects” and some which I discuss in this video “VAISU Provably Safe AI”) I agree that formal methods are not yet widely used and that provides some evidence that there might be big challenges to doing so. But these are extraordinary times, both because we are likely to face much more powerful adversaries over the next decade and because our AI systems are likely to become much more mathematically sophisticated over the same time period. It is certainly valuable to understand why formal methods haven’t yet been adopted widely but it appears that both the driving necessity and the enabling technology may be rapidly changing soon.
I also agree that the transition period from our current technology to provably safe replacements is likely to be very challenging and may involve great hardship and loss of life. I believe we have a great opportunity right now to solve the problems you mention and others before they are critical. One overriding perspective in Max and my approach is that we need to design our systems so that their safety *can be formally verified*. In software, people often bring up the halting problem as an argument that general software can’t be verified. But we don’t need to verify general software, we are *designing* our systems so that they can be verified. That issue is also critically important for physical systems. And we are likely to have a challenging period where older systems that haven’t been designed for verified safety are vulnerable to attack.
You say:
In particular, physics, chemistry and biology are all complex sciences which do not have anything like complete symbolic rule sets.
We fortunately live in an era where we *do* have a complete formal understanding of the fundamental laws of physics. Sean Carrol summarizes this nicely in this paper: “The Quantum Field Theory on Which the Everyday World Supervenes” in which he argues that the Standard Model of particle physics plus Einstein’s general relativity completely describes all physical phenomena in ordinary human experience (ie. away from black holes, neutron stars, and the early universe and at energies less than 10^11eV, for comparison chemical bonds are less than 10eV and reactions in the sun are 10^8eV). Much of applied physics is about precise formal models of higher level phenomena (eg. fluids, elastic materials, plasmas, etc.) as beautifully summarized in Kip Thorne’s book: “Modern Classical Physics: Optics, Fluids, Plasmas, Elasticity, Relativity, and Statistical Physics”. Each of the engineering disciplines (eg. Mechanical Engineering, Electrical Engineering, Chemical Engineering, etc.) has their own formal models of their domain along with design rules for safe systems. They have extensive experiments and formal arguments which ground their models in fundamental physics. Fortunately, many of these disciplines are currently working to represent their fundamental models in formal proof assistants like Lean. For example, here is a chemical physics group who is doing that: “Formalizing chemical physics using the Lean theorem prover”
The Lean theorem prover is one of several powerful powerful proof assistants (others include Coq, Isabelle, MetaMath, HOL Light, and others) which are rich enough to encode all of mathematics including all the foundations of physics, engineering, computer science, economics, etc. The Lean library “Mathlib 4”: contains most of an undergraduate mathematics curriculum, some frontier mathematical areas, and is beginning to include probability theory, physics, and other disciplines. Many AI theorem provers are training on this library and are rapidly improving in generating Lean proofs. For example, “DeepSeek-Prover-V1.5″ is SOTA for open source provers: “DeepSeek-Prover-V1.5: Harnessing Proof Assistant Feedback for Reinforcement Learning and Monte-Carlo Tree Search” and commercial AI companies like Harmonic appear to be making rapid progress: “One month in—A new SOTA on MiniF2F and more” Several experts working on this area say that we should expect LLM-based AI systems (including “agentic” add-ons like Monte Carlo Tree Search, etc.) to reach human PhD level at theorem proving by 2026. I believe that will open the floodgates of safety and other other applications.

Steve_Omohundro Aug 13, 2024, 5:31 PM
2 points
0
in reply to: Steve_Omohundro’s comment on: Provably Safe AI: Worldview and Projects
For example, this came out yesterday showing the sorry state of today’s voting machines: “The nation’s best hackers found vulnerabilities in voting machines — but no time to fix them” https://www.politico.com/news/2024/08/12/hackers-vulnerabilities-voting-machines-elections-00173668 And they’ve been working on making voting machines secure for decades!

Steve_Omohundro Aug 12, 2024, 10:52 PM
2 points
1
in reply to: Wei Dai’s comment on: Provably Safe AI: Worldview and Projects
Yes, I think there are many, many more possibilities as these systems get more advanced. A number of us are working to flesh out possible stable endpoints. One class of world states are comprised of humans, AIs, and provable infrastructure (including manufacturing and datacenters). In that world, humans have total freedom and abundance as long as they don’t harm other humans or the infrastructure. In those worlds, it appears possible to put absolute limits on the total world compute, on AI agency, on Moloch-like economic competition, on harmful evolution, on uncontrolled self-improvement, on uncontrolled nanotech, and on many other of today’s risks.
But everything depends on how we get from the present state to that future state. I think the Aschenbrenner timelines are plausible for purely technological development. But the social reality may change them drastically. For example, if the social upheaval leads to extensive warfare, I would imagine that GPU datacenters and chip fabs would be likely targets. That could limit the available compute sufficiently to slow down AI development dramatically.
I’ve spoken about some of the future possibilities in different talks like this one:
but I think it’s easier for most people to think about the more immediate issues before considering the longer term consequences.
Given this lower level of ambition, I fail to see how this approach or agenda can be positioned as an alternative to pausing AI.
Oh, any low level of ambition is just for this initial stage, without which the more advanced solutions can’t be created. For example, without a provably unbreakable hardware layer supporting provably correct software, you can’t have social contracts which everyone can absolutely trust to be carried out as specified.
And I think pausing would be great if it were possible! I signed the Pause letter and I think it generated good and important discussion, but I don’t think it slowed down AI development. And the instability of pausing means that at best we get a short extension of the AI development timeline. For that to be of long term value, humanity needs to use the extra time to do something! What should it do? It should flesh out all the pieces needed to effectively implement provable safety!

Steve_Omohundro Aug 12, 2024, 4:40 AM
3 points
−3
in reply to: ryan_greenblatt’s comment on: Provably Safe AI: Worldview and Projects
Intervals are often a great simple form of “enclosure” in continuous domains. For simple functions there is also “interval arithmetic” which cheaply produces a bounding interval on the output of a function given intervals on its inputs: https://en.wikipedia.org/wiki/Interval_arithmetic But, as you say, for complex functions it can blow up. For a simple example of why, consider the function “f(x)=x-x” evaluated on the input interval [0,1]. In the simplest interval arithmetic, the interval for subtraction has to bound the worst possible members of the intervals of its two inputs. In this case that would be a lower bound of “0-1″ and an upper bound of “1-0” producing the resulting interval: [-1,1]. But, of course, “x-x” is always 0, so this is huge over-approximation. People have developed all kinds of techniques for capturing the correlations between variables in evaluating circuits on intervals. And you can always shrink the error by splitting the input intervals and doing “branch and bound”. But all of those are just particular implementation choices in proving bounds on the output of the function. Advanced AI theorem provers (like AlphaProof) can use very sophisticated techniques to accurately get the true bound on the output of a function.
But, it may be that it’s not a fruitful approach to try to bound the behavior of complex neural nets such as transformers. In our approach, we don’t need to understand or constrain a complex AI generating a solution or a control policy. Rather, we require the AI to generate a program, control policy, or simple network for taking actions in the situation of interest. And we force it to generate a proof that it satisfies given safety requirements. If it can’t do that, then it has no business taking actions in a dangerous setting.

Steve_Omohundro Aug 12, 2024, 2:50 AM
3 points
0
in reply to: Bogdan Ionut Cirstea’s comment on: Provably Safe AI: Worldview and Projects
I agree that the human responses to all of this are the great unknown. The good news is that human programmers don’t need to learn to do formal verification, they just need to realize that it’s important and to get the AIs to do it. My suspicion is that these big shifts won’t happen until we start having AI-caused disasters. It’s interesting to look at the human response to the CrowdStrike disaster and other recent security issues to get some sense of how the responses might go. It’s also interesting to see the unbelievable speed with which Microsoft, Meta, Google, etc. have been building huge new GPU datacenters in response to the projected advances in AI.

Steve_Omohundro Aug 12, 2024, 2:38 AM
3 points
0
in reply to: Zac Hatfield-Dodds’s comment on: Provably Safe AI: Worldview and Projects
I am a big fan of verification in principle, but don’t think it’s feasible in such broad contexts in practice. Most of these proposals are far too vague for me to have an opinion on, but I’d be happy to make bets that
- (3) won’t happen (with nontrivial bounds) due to the difficulty of modeling nondeterministic GPUs, varying floating-point formats, etc.
- (8) won’t be attempted, or will fail at some combination of design, manufacture, or just-being-pickable.
- with some mutually agreed operationalization against the success of other listed ideas with a physical component
I’m afraid I’m not into betting and I’m not saying these projects will happen, but rather that they should happen. And I’m not saying they are easy but rather that they are important and I don’t see any huge barriers to doing them.
In general, non-deterministic hardware isn’t a problem for formalization. We’re not trying to simulate these systems, we’re trying to put bounds on their possible behaviors. In general, proofs work with enclosures that are guaranteed to contain the actual behavior. As far as I know, the various floating point formats are all precisely defined, so there’s no problem there. I know that some processor specs stated that their implementation of functions like sin and cos were only guaranteed up the last bit or something. That’s an annoyance but not a real problem for formalization. A simple approach is to maintain intervals which are guaranteed to contain the actual values and to prove that output intervals don’t overlap the unsafe region.
Probabilistic programs do provide greater challenges. It’s not a problem to formally model and prove properties of probability distributions. But it’s often no longer possible to provide 100% guarantees of desired safety properties. The approach of precise probabilistic theories like “Probably approximately correct learning” (https://en.wikipedia.org/wiki/Probably_approximately_correct_learning) is to provide a provable bound on the probability of failure.
There is subtlety in adversarial situations where there is a difference between uncertain actions which the adversary can’t affect for which an ordinary probabilistic analysis suffices and those where the adversary can influence apparently random variables. For example, I believe we need to create formal models of all of the engineering disciplines (eg. mechanical engineering, electrical engineering, chemical engineering, etc.). The rules that those engineers use are already quite formal and there are heuristic arguments as to why they are grounded in fundamental physics. But in an adversarial context, an adversary may be able to influence a deeper level of physics in a way that can break the assumptions of the model. For example, mechanical engineers have models of the elastic properties of steel beams but if an adversary can alter the manufacture of those beams in a way that makes them vulnerable to certain kinds of failure, then the mechanical engineer’s assumptions can be invalidated. So the formal models also need to model potential vulnerabilities in the assumptions of the higher level models.
I believe that these developments are not optional. It’s just a question of whether humanity prepares now for what’s coming or waits until after the basic assumptions of our infrastructure and society are upended.

Steve_Omohundro Aug 12, 2024, 2:15 AM
5 points
−2
in reply to: Wei Dai’s comment on: Provably Safe AI: Worldview and Projects
Do you envision being able to formalize social systems and desirable properties for them, based on current philosophical understanding of topics like human values/goals and agency / decision theory? I don’t, and also think philosophical progress on these topics is not happening fast enough to plausibly solve the problems in time.
I think there are both short-term and long-term issues. In the short-term society is going to have to govern a much wider range of behaviors than we have today. In the next few years we are likely to have powerful AIs taking actions online and in the physical world both autonomously or at the behest of humans. I think that the most immediate issue is that today’s infrastructure is likely to be very vulnerable to AI-powered attacks. The basic mechanisms of society are likely to be very vulnerable: voting, communication, banking, etc. Already we are seeing AI-driven hacking, impersonation, fake news, spam, etc. Next year’s more powerful AI agents will likely take these to the next level.
Social media sites are already getting overwhelmed by spam, fake images, fake videos, blackmail attempts, phishing, etc. The only way to counteract the speed and volume of massive AI-driven attacks is with AI-powered defenses. These defenses need rules. If those rules aren’t formal and proven robust, then they will likely be hacked and exploited by adversarial AIs. So at the most basic level, we need infrastructure rules which are provably robust against classes of attacks. What those attack classes are and what properties those rules guarantee is part of what I’m arguing we need to be working on right now. It looks like the upheaval is coming whether we want it or not. And the better prepared we are for it, the better the outcome.
Over the longer term (which may not be that long if major societal transformation comes around 2030), I totally agree that there are many important philosophical issues that we will need to address. And I agree that we shouldn’t try to solve all those problems right now. But we should create a path along which we can make the choices and which is implementable with the technology which appears imminent. Today, democratic societies use processes like voting to choose governance rules. The blockchain world has explored various precise variants of those procedures. I think a good path forward might involve precisely formalizing effective mechanisms like prediction markets, quadratic voting, etc. so that we have confidence that future social infrastructure actually implements it.
It appears that rather simple provable hardware (what Max and I called “provable contracts”) can enable the solution of many social dilemmas to produce better outcomes for all. Provable hardware can impose joint constraints on the choices of multiple parties and can include externalities in the execution of contracts. If implemented correctly, these could dramatically improve the efficiency, fairness, and effectiveness of human societies.

Steve_Omohundro Aug 11, 2024, 5:29 AM
5 points
0
in reply to: Bogdan Ionut Cirstea’s comment on: Provably Safe AI: Worldview and Projects
Timelines are certainly short. But if AI systems progress as rapidly as Aschenbrenner and others are predicting, then most of the work will be done by AIs. Of course, we have to know what we want them to accomplish! And that’s something we can work on before the AIs are dangerous or powerful enough to build the safe infrastructure.
Max and I argued that systems which are not built with security proofs will likely have vulnerabilities and that powerful AI’s can search form them using formal models and are likely to find them. So if things unfold in an adversarial way, any software, hardware, or social systems not built with provable safety will likely be exploited.

Steve_Omohundro Aug 11, 2024, 5:22 AM
2 points
0
in reply to: David James’s comment on: Provably Safe AI: Worldview and Projects
Absolutely! In different circumstances different classes of adversarial capabilities are appropriate. Any formal security claim will always be relative to a specified class of attackers. For the most secure infrastructure which must be secure against any physical agent (including the most powerful AGIs), you would show security properties against any agent which obeys the laws of physics. Max and I talk about “provable contracts” which sense any form of physical tampering and erase internal cryptographic keys to guarantee that contained information cannot be extracted.
Today, many residential front doors are vulnerable to “bump key” attacks which involve inserting a key with all the key bittings filed down and tapping it with a mallet to rapidly open the door with no skill. There would be tremendous benefit to mechanical locks which are not vulnerable to this or more sophisticated purely mechanical attacks.
It would also be great to create designs which are provably invulnerable to the kind of sophisticated key you suggest. A general approach is to physically isolate the part of the lock which senses the pattern on the key from the part which checks that that pattern is correct. In today’s locks, those two functions happen together and lockpickers exploit that. In the kinds of designs I’d prefer to see, no information about the correct key is accessible from the part of the lock accessible to the attacker.

Steve_Omohundro Aug 11, 2024, 5:09 AM
5 points
0
in reply to: Steve_Omohundro’s comment on: Provably Safe AI: Worldview and Projects
“‘Sinkclose’ Flaw in Hundreds of Millions of AMD Chips Allows Deep, Virtually Unfixable Infections” https://archive.ph/MsA0m

Steve_Omohundro Aug 11, 2024, 5:08 AM
5 points
0
in reply to: mishka’s comment on: Provably Safe AI: Worldview and Projects
Thanks for the links! Yes, the cost of errors is high for chips, so chip companies have been very supportive of formal methods. But they’ve primarily focused on the correctness of digital circuits. There are many other physical attack surfaces.
For example, most DRAM chips are vulnerable to the “Rowhammer” attack https://en.wikipedia.org/wiki/Row_hammer and as memories get denser, they become more vulnerable. This involves memory access patterns which flip bits in neighboring rows which can be exploited to discover cryptographic keys and break security assumptions in processors. The solutions so far are fairly hacky and attackers have found work arounds. A full solution likely will involve a formal model including the electromagnetic and material physics of the memory chip. This might be used to provably characterize all problematic access patterns and to provably translate software into equivalent versions which provably don’t produce the dangerous accesses.
Another example from yesterday is the AMD “Sinkclose” flaw which affects all AMD chips back to 2006 and makes them vulnerable to attacks after which “You basically have to throw your computer away.” In a world of powerful AI-enabled attackers, that kind of flaw is unlikely to be acceptable.
With the recent developments in drones, humanoid robots, and autonomous vehicles, we are likely to need provable constraints on the behavior of a wide variety of physical systems. In the paper with Max we argue that this is possible and that the laws of physics provide an absolute constraint on the behavior of AIs. But it’s not easy! And the time is likely short.

Provably Safe AI: Worldview and Projects

Ben Goldhaber and Steve_Omohundro

Aug 9, 2024, 11:21 PM

54 points

44 comments7 min readLW link

Steve_Omohundro Oct 10, 2023, 4:49 AM
4 points
0
in reply to: PeterMcCluskey’s comment on: Provably Safe AI
Fortunately, for coarse “guardrails” the specs are pretty simple and can often be reused in many contexts. For example, all software we want to run should have proofs that: 1) there aren’t memory leaks, 2) there aren’t out-of-bounds memory accesses, 3) there aren’t race conditions, 4) there aren’t type violations, 5) there aren’t buffer overflows, 6) private information is not exposed by the program, 7) there aren’t infinite loops, etc. There should be a widely used “metaspec” for those criteria which most program synthesis AI will have to prove their generated code satisfies. Similarly, there are common constraints for many physical systems: eg. robots, cars, planes, boats, etc. shouldn’t crash into things or harm humans, etc. The more refined the rules are, the more subtle they become. To prevent existentially bad outcomes, I believe coarse constraints suffice. But certainly we eventually want much more refined models of the world and of the outcomes we seek. I’m a fan of “Digital Twins” of physical systems which allow rules and constraints to be run in simulation which can help in choosing specifications. We certainly want those simulations to be trusted. which can be achieved by proving the code actually simulates the systems it claims to. Eventually it would be great to have fully trusted AI as well! Mechanistic Interpretability should be great for that! I’m just reading Anthropic’s recent nice advances in that. If that continues to make progress then it makes our lives much easier but it doesn’t eliminate the need to ensure that misaligned AGI and malicious AGI don’t cause harm. The big win with the proof checking and the cryptographic hardware we propose is that we can ensure that even powerful systems will obey rules that humanity selects. If we don’t implement that kind of system (or something functionally equivalent), then there will be dangerous pathways which malicious AGI can exploit to cause great harm to humans.

Steve_Omohundro

Prov­ably Safe AI: Wor­ld­view and Projects

Provably Safe AI: Worldview and Projects