> Do you envision being able to formalize social systems and desirable properties for them, based on current philosophical understanding of topics like human values/goals and agency/decision theory? I don’t, and I also think philosophical progress on these topics is not happening fast enough to plausibly solve the problems in time.
I think there are both short-term and long-term issues. In the short term, society is going to have to govern a much wider range of behaviors than it does today. In the next few years we are likely to have powerful AIs taking actions online and in the physical world, both autonomously and at the behest of humans. The most immediate issue is that today’s infrastructure is likely to be very vulnerable to AI-powered attacks. The basic mechanisms of society (voting, communication, banking, etc.) are similarly exposed. We are already seeing AI-driven hacking, impersonation, fake news, and spam, and next year’s more powerful AI agents will likely take these to the next level.
Social media sites are already getting overwhelmed by spam, fake images, fake videos, blackmail attempts, phishing, etc. The only way to counteract the speed and volume of massive AI-driven attacks is with AI-powered defenses. These defenses need rules. If those rules aren’t formal and provably robust, then they will likely be hacked and exploited by adversarial AIs. So at the most basic level, we need infrastructure rules which are provably robust against well-defined classes of attacks. Identifying those attack classes, and the properties the rules must guarantee against them, is part of what I’m arguing we need to work on right now. It looks like the upheaval is coming whether we want it or not, and the better prepared we are for it, the better the outcome.
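To make “provably robust against a class of attacks” concrete, here is a toy sketch (the window size, the limit, and the rule itself are all invented for illustration). It pairs an attack class (arbitrary flooding schedules) with a rule (a sliding-window rate limit) and a property (no window of accepted messages ever exceeds the limit). Here the property is only brute-force checked over a finite class; the actual proposal is to prove such invariants over all inputs in a proof assistant:

```python
# Illustrative sketch: a "flooding" attack class and a rate-limit rule,
# with the robustness property checked by brute force over all attack
# schedules up to a small horizon. A real version would state and prove
# this invariant for unbounded schedules, not test a finite sample.
from itertools import product

WINDOW = 4          # sliding-window length (time steps)
LIMIT = 2           # max messages any sender may deliver per window

def deliver(history: list[int], send_now: int) -> int:
    """Rule: accept only enough messages to keep the window under LIMIT."""
    used = sum(history[-(WINDOW - 1):]) if WINDOW > 1 else 0
    allowed = max(0, LIMIT - used)
    return min(send_now, allowed)

def violates_bound(schedule: tuple[int, ...]) -> bool:
    """Attack class: any schedule of send attempts. Property: every
    window of accepted messages sums to at most LIMIT."""
    accepted: list[int] = []
    for send_now in schedule:
        accepted.append(deliver(accepted, send_now))
    return any(sum(accepted[i:i + WINDOW]) > LIMIT
               for i in range(len(accepted)))

# Exhaustively check every attack schedule of length 6 with 0-3 sends/step.
assert not any(violates_bound(s) for s in product(range(4), repeat=6))
print("Invariant holds for every attack schedule in this finite class.")
```

The deliverable I have in mind is the pair (attack class, guaranteed property), not any particular implementation.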
Over the longer term (which may not be that long if major societal transformation comes around 2030), I totally agree that there are many important philosophical issues we will need to address. And I agree that we shouldn’t try to solve all of those problems right now. But we should lay out a path along which those choices can be made, one that is implementable with the technology that appears imminent. Today, democratic societies use processes like voting to choose governance rules, and the blockchain world has explored various precise variants of those procedures. I think a good path forward might involve precisely formalizing effective mechanisms like prediction markets, quadratic voting, etc. so that we have confidence that future social infrastructure actually implements them.
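As a hedged illustration of what such a formalization might start from, here is a toy executable specification of quadratic voting (the issue names, budget, and rejection rule are invented for the example). Casting v votes costs v² credits, so statements like “no accepted ballot exceeds its credit budget” become precise claims one could machine-check for all inputs:

```python
# Illustrative executable specification of quadratic voting: casting v
# votes (for or against) on an issue costs v**2 credits. A formalized
# version would prove properties such as "no accepted ballot exceeds
# its budget" for all inputs, not just the ones we happen to run.

def ballot_cost(ballot: dict[str, int]) -> int:
    """Quadratic cost: v votes on an issue cost v**2 credits."""
    return sum(v * v for v in ballot.values())

def tally(ballots: list[dict[str, int]], budget: int) -> dict[str, int]:
    """Accept only ballots within budget; sum signed votes per issue."""
    totals: dict[str, int] = {}
    for ballot in ballots:
        if ballot_cost(ballot) > budget:
            continue  # a real spec would define rejection precisely
        for issue, votes in ballot.items():
            totals[issue] = totals.get(issue, 0) + votes
    return totals

ballots = [{"parkA": 3, "roadB": -1},   # cost 9 + 1 = 10
           {"parkA": -2, "roadB": 2},   # cost 4 + 4 = 8
           {"parkA": 4}]                # cost 16: rejected at budget 10
print(tally(ballots, budget=10))        # {'parkA': 1, 'roadB': 1}
```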
It appears that rather simple provable hardware (what Max and I called “provable contracts”) can enable solutions to many social dilemmas, producing better outcomes for all parties. Provable hardware can impose joint constraints on the choices of multiple parties and can include externalities in the execution of contracts. If implemented correctly, these could dramatically improve the efficiency, fairness, and effectiveness of human societies.
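Here is a minimal sketch of the underlying game-theoretic point, using a standard prisoner’s dilemma payoff matrix (the numbers and the `play_under_contract` function are illustrative stand-ins for what verified hardware would actually enforce):

```python
# Illustrative sketch of a "provable contract" resolving a prisoner's
# dilemma. The payoffs and the enforcement function are invented for
# illustration; the real proposal relies on hardware whose behavior is
# formally verified, not on a Python function.

PAYOFF = {  # (my move, their move) -> my payoff
    ("C", "C"): 3, ("C", "D"): 0,
    ("D", "C"): 5, ("D", "D"): 1,
}

def best_response(their_move: str) -> str:
    """Without a contract, defection dominates: it pays more whatever
    the other party does (5 > 3 and 1 > 0)."""
    return max("CD", key=lambda m: PAYOFF[(m, their_move)])

def play_under_contract(moves: tuple[str, str], signed: bool) -> tuple[int, int]:
    """A joint contract both parties signed constrains BOTH choices at
    once: any deviation is replaced by the agreed move."""
    a, b = ("C", "C") if signed else moves
    return PAYOFF[(a, b)], PAYOFF[(b, a)]

assert best_response("C") == "D" and best_response("D") == "D"
print(play_under_contract(("D", "D"), signed=False))  # (1, 1): the dilemma
print(play_under_contract(("D", "D"), signed=True))   # (3, 3): both better off
```

The key feature is that the contract constrains both parties’ choices jointly, which is exactly what neither party can credibly do unilaterally.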
> I think a good path forward might involve precisely formalizing effective mechanisms like prediction markets, quadratic voting, etc. so that we have confidence that future social infrastructure actually implements them.
In the Background section, you talk about “superhuman AI in 2028 or 2029”, so I interpreted you as trying to design AIs that are provably safe even as they scale to superhuman intelligence, or to design social mechanisms that can provably ensure that overall society will be safe even when used by superhuman AIs.
But here you only mention proving that prediction markets and quadratic voting are implemented correctly. That is a much lower level of ambition, which is good for feasibility, but it does not address many safety concerns, such as AI-created bioweapons, or the specific concern I gave in my grandparent comment. Given this lower level of ambition, I fail to see how this approach or agenda can be positioned as an alternative to pausing AI.
Yes, I think there are many, many more possibilities as these systems get more advanced. A number of us are working to flesh out possible stable endpoints. One class of world states consists of humans, AIs, and provable infrastructure (including manufacturing and datacenters). In such worlds, humans have total freedom and abundance as long as they don’t harm other humans or the infrastructure, and it appears possible to put absolute limits on total world compute, on AI agency, on Moloch-like economic competition, on harmful evolution, on uncontrolled self-improvement, on uncontrolled nanotech, and on many of today’s other risks.
But everything depends on how we get from the present state to that future state. I think the Aschenbrenner timelines are plausible for purely technological development. But the social reality may change them drastically. For example, if the social upheaval leads to extensive warfare, I would imagine that GPU datacenters and chip fabs would be likely targets. That could limit the available compute sufficiently to slow down AI development dramatically.
I’ve spoken about some of these future possibilities in various talks, but I think it’s easier for most people to think about the more immediate issues before considering the longer-term consequences.
> Given this lower level of ambition, I fail to see how this approach or agenda can be positioned as an alternative to pausing AI.
Oh, the low level of ambition is just for this initial stage, without which the more advanced solutions can’t be built. For example, without a provably unbreakable hardware layer supporting provably correct software, you can’t have social contracts which everyone can absolutely trust to be carried out as specified.
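To gesture at what that hardware layer buys you, here is a toy sketch of software attestation (the device key, the HMAC construction, and the code strings are all invented for illustration; real provable hardware would use asymmetric keys and formally verified firmware rather than a shared secret):

```python
# Minimal sketch of the trust chain the hardware layer is meant to
# provide: a device with a protected key attests to the exact software
# it is running, so a remote party can verify the contract code before
# trusting its execution. Everything here is a simplified stand-in.
import hashlib, hmac

DEVICE_KEY = b"secret-provisioned-at-manufacture"   # hypothetical key

def attest(software: bytes) -> bytes:
    """Device side: sign the hash of the software actually loaded."""
    digest = hashlib.sha256(software).digest()
    return hmac.new(DEVICE_KEY, digest, hashlib.sha256).digest()

def verify(expected_software: bytes, quote: bytes) -> bool:
    """Verifier side: accept only if the quote matches the audited
    contract code we expect the device to be running."""
    expected = hmac.new(DEVICE_KEY,
                        hashlib.sha256(expected_software).digest(),
                        hashlib.sha256).digest()
    return hmac.compare_digest(expected, quote)

contract_code = b"def settle(): ..."        # the audited contract
assert verify(contract_code, attest(contract_code))
assert not verify(contract_code, attest(b"tampered code"))
print("Quote verifies only for the exact software attested.")
```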
And I think pausing would be great if it were possible! I signed the Pause letter, and I think it generated good and important discussion, but I don’t think it slowed AI development. And the instability of pausing means that, at best, we get a short extension of the AI development timeline. For that extension to be of long-term value, humanity needs to use the extra time to do something! What should it do? It should flesh out all the pieces needed to effectively implement provable safety!
> Social media sites are already getting overwhelmed by spam, fake images, fake videos, blackmail attempts, phishing, etc. The only way to counteract the speed and volume of massive AI-driven attacks is with AI-powered defenses. These defenses need rules. If those rules aren’t formal and provably robust, then they will likely be hacked and exploited by adversarial AIs. So at the most basic level, we need infrastructure rules which are provably robust against well-defined classes of attacks. Identifying those attack classes, and the properties the rules must guarantee against them, is part of what I’m arguing we need to work on right now.
Maybe it would be more productive to focus on these nearer-term topics, which perhaps can be discussed more concretely. Have you talked to any experts in formal methods who think it would be feasible (in the near future) to define such AI-driven attack classes and desirable properties for defenses against them, and do they have any specific ideas for doing so? Again, from my own experience in cryptography: it took decades to formally define and refine seemingly much simpler concepts, so it’s hard for me to understand where your relative optimism comes from.
As an example of how vulnerable today’s infrastructure is, this came out yesterday, showing the sorry state of today’s voting machines: “The nation’s best hackers found vulnerabilities in voting machines — but no time to fix them” https://www.politico.com/news/2024/08/12/hackers-vulnerabilities-voting-machines-elections-00173668. And they’ve been working on making voting machines secure for decades!