Value is fragile. The goal is not to alleviate every ounce of discomfort; the goal is to make the future awesome. My guess is that that involves leaving people with real decisions that have real consequences, that it involves giving people the opportunity to screw up, that it involves allowing the universe to continue to be a place of obstacles that people must overcome.
This is AGI optimizing for human empowerment.
that’s probably part of it, agreed. it’s probably not even close to closing the key holes in the loss landscape, though. you have a track record of calling important shots and people would do well to take you seriously, but at the same time, they’d do well not to assume you have all the answers. upvote and agree. how does that solve program equilibria in osgt, though? how does it bound the worst mistake, how does it bound the worst powerseeking? how does it ensure defensibility?
I’m not sure what you mean by “solve program equilibria in osgt”—partly because I’m not sure what ‘osgt’ means.
Optimizing for human/external empowerment doesn’t bound the worst mistakes the agent can make. If by powerseeking you mean the AI seeking its own empowerment, the AI may need to do that in the near term, but in the long term that is an opposing and rather obviously unaligned utility function. Navigating that transition tradeoff is where much of the difficulty seems to lie—but I expect that to be true of any viable solution. Not sure what you mean by defensibility.
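For concreteness, “empowerment” in this literature is usually formalized as the channel capacity between an agent’s action sequences and the resulting future states; with deterministic dynamics the n-step version reduces to the log of the number of distinct states the agent can reach. A minimal toy sketch of that (the gridworld, names, and numbers are all illustrative, not from anything linked in this thread):

```python
import itertools
import math

# Toy deterministic gridworld: states are (x, y) cells, actions move the
# agent, and movement is clipped at the walls. Purely illustrative.
GRID = 5
ACTIONS = [(0, 1), (0, -1), (1, 0), (-1, 0), (0, 0)]

def step(state, action):
    """Deterministic transition: move, then clip to the grid."""
    x, y = state
    dx, dy = action
    return (min(max(x + dx, 0), GRID - 1), min(max(y + dy, 0), GRID - 1))

def n_step_empowerment(state, n):
    """For deterministic dynamics, n-step empowerment -- the channel capacity
    max_p(a) I(actions; resulting state) -- reduces to log2 of the number of
    distinct states reachable in n steps."""
    reachable = set()
    for plan in itertools.product(ACTIONS, repeat=n):
        s = state
        for a in plan:
            s = step(s, a)
        reachable.add(s)
    return math.log2(len(reachable))

# A corner is less empowered than the center: fewer futures stay open.
print(n_step_empowerment((0, 0), 2))  # log2(6)  ~ 2.58 bits
print(n_step_empowerment((2, 2), 2))  # log2(13) ~ 3.70 bits
```

An AGI optimizing human empowerment would be pushing the analogous quantity up for the humans’ action channels rather than for its own, which is where the near-term/long-term tension above comes from.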
program equilibria in open-source game theory: once a model is strong enough to make exact mathematical inferences about the implications of where the approximator’s learned behavior actually landed after training, the game theory of that reflection gets incredibly weird. this is where much of the decision-theory material comes up, and the reason we haven’t run into it already is that current models are large enough to be really hard to prove anything through. Related work, new and old (a toy program-equilibrium sketch follows the list):
https://arxiv.org/pdf/2208.07006.pdf—Cooperative and uncooperative institution designs: Surprises and problems in open-source game theory—near the top of my to-read list; by Andrew Critch, who has some other posts on the topic, especially the good ol’ “Open Source Game Theory is weird”, and several more recent ones I haven’t read properly at all
https://arxiv.org/pdf/2211.05057.pdf—A Note on the Compatibility of Different Robust Program Equilibria of the Prisoner’s Dilemma
https://arxiv.org/pdf/1401.5577.pdf—Robust Cooperation in the Prisoner’s Dilemma: Program Equilibrium via Provability Logic
https://www.semanticscholar.org/paper/Program-equilibrium-Tennenholtz/e1a060cda74e0e3493d0d81901a5a796158c8410?sort=pub-date—the paper that introduced OSGT, with papers citing it sorted by recency
also interesting https://www.semanticscholar.org/paper/Open-Problems-in-Cooperative-AI-Dafoe-Hughes/2a1573cfa29a426c695e2caf6de0167a12b788ef and https://www.semanticscholar.org/paper/Foundations-of-Cooperative-AI-Conitzer-Oesterheld/5ccda8ca1f04594f3dadd621fbf364c8ec1b8474
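To make “program equilibrium” concrete: in Tennenholtz’s setup each player submits a program, every program receives the other players’ source code as input, and the equilibrium analysis is over programs rather than actions. Here’s a minimal sketch of the classic toy result, a source-matching bot for which mutual cooperation in the one-shot prisoner’s dilemma is an equilibrium of the program game (names and payoffs are illustrative; the provability-logic bots in the LaVictoire/Critch papers above are the more interesting, harder-to-run variants):

```python
import inspect

# Toy program game in the spirit of Tennenholtz's program equilibrium:
# each player submits a program that receives the opponent's source code
# and returns "C" (cooperate) or "D" (defect). Run as a script so that
# inspect.getsource can read the function bodies.

def clique_bot(opponent_source: str) -> str:
    """Cooperate iff the opponent is running exactly this program."""
    return "C" if opponent_source == inspect.getsource(clique_bot) else "D"

def defect_bot(opponent_source: str) -> str:
    return "D"

# Standard prisoner's dilemma payoffs (row player, column player).
PAYOFFS = {("C", "C"): (3, 3), ("C", "D"): (0, 5),
           ("D", "C"): (5, 0), ("D", "D"): (1, 1)}

def play(p1, p2):
    a1 = p1(inspect.getsource(p2))
    a2 = p2(inspect.getsource(p1))
    return PAYOFFS[(a1, a2)]

print(play(clique_bot, clique_bot))  # (3, 3): mutual cooperation
print(play(clique_bot, defect_bot))  # (1, 1): deviating to defect_bot earns less,
                                     # so (clique_bot, clique_bot) is an equilibrium
```

The weirdness pointed at above starts when the “programs” are enormous learned approximators and the other player has to prove things about them rather than string-compare them.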
This also connects through to putting neural networks in formal verification systems. The summary right now is that it’s possible but doesn’t scale to current model sizes. I expect scalability to surprise us.
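For a sense of what putting a neural network into a verification system looks like at the very small end, here’s a minimal sketch of interval bound propagation, one of the standard sound-but-incomplete primitives: if the propagated bounds satisfy the property, it holds for every input in the box; if not, the result is inconclusive. The weights and the box are made up for illustration; complete solvers (the Reluplex/Marabou line behind the Katz-lab links below) decide such queries exactly, at much higher cost.

```python
import numpy as np

# Tiny two-layer ReLU network with made-up weights.
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(4, 2)), rng.normal(size=4)
W2, b2 = rng.normal(size=(1, 4)), rng.normal(size=1)

def ibp_bounds(x_lo, x_hi):
    """Propagate an axis-aligned input box through affine and ReLU layers,
    returning sound lower/upper bounds on the network output."""
    def affine(lo, hi, W, b):
        center, radius = (lo + hi) / 2, (hi - lo) / 2
        c = W @ center + b
        r = np.abs(W) @ radius
        return c - r, c + r
    lo, hi = affine(x_lo, x_hi, W1, b1)
    lo, hi = np.maximum(lo, 0), np.maximum(hi, 0)  # ReLU is monotone
    return affine(lo, hi, W2, b2)

lo, hi = ibp_bounds(np.array([-0.1, -0.1]), np.array([0.1, 0.1]))
print(f"for every input in the box, the output lies in [{lo[0]:.3f}, {hi[0]:.3f}]")
```

The scaling problem is that the boxes get looser with depth and width, which is part of why “possible but doesn’t scale yet” is the current summary.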
Bounding worst mistake: preventing adversarial examples and generalization failures. There’s plenty of work on this in general, but in particular I’m interested in certified bounds. (Though those usually turn out to rest on some sort of unhelpfully restrictive premise.)
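As a concrete example of both the certificate and the restrictive premise: a Lipschitz-margin bound, where the product of layer spectral norms upper-bounds the network’s global Lipschitz constant and the logit margin divided by twice that constant gives an l2 radius inside which the prediction provably cannot flip. Everything below is illustrative rather than taken from any of the linked papers, and on realistically sized networks the spectral-norm product is usually far too loose to certify a useful radius.

```python
import numpy as np

# Tiny two-layer ReLU classifier with made-up weights.
rng = np.random.default_rng(1)
W1 = rng.normal(size=(8, 4)) * 0.3
W2 = rng.normal(size=(3, 8)) * 0.3

def logits(x):
    return W2 @ np.maximum(W1 @ x, 0.0)

# ReLU is 1-Lipschitz, so the product of the layers' spectral norms
# upper-bounds the network's global l2 Lipschitz constant.
L = np.linalg.norm(W1, 2) * np.linalg.norm(W2, 2)

def certified_radius(x):
    """Each logit moves by at most L * ||delta||_2 under a perturbation delta,
    so the top-two gap cannot close while 2 * L * ||delta||_2 < margin."""
    z = np.sort(logits(x))
    margin = z[-1] - z[-2]
    return margin / (2 * L)

x = rng.normal(size=4)
print(f"class {np.argmax(logits(x))} is certified for any l2 perturbation "
      f"smaller than {certified_radius(x):.4f}")
```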
there are tons of papers I could link here that I haven’t evaluated deeply, but you can find a lot of them by following citations from https://www.katz-lab.com/research—in particular:
Verifying Generalization in Deep Learning
gRoMA: a Tool for Measuring Deep Neural Networks Global Robustness
here’s what’s on my to-evaluate list in my “ai formal verification and hard robustness” tag in semanticscholar: https://arxiv.org/pdf/2302.04025.pdf https://arxiv.org/pdf/2304.03671.pdf https://arxiv.org/pdf/2303.10513.pdf https://arxiv.org/pdf/2303.03339.pdf https://arxiv.org/pdf/2303.01076.pdf https://arxiv.org/pdf/2303.14564.pdf https://arxiv.org/pdf/2303.07917.pdf https://arxiv.org/pdf/2304.01218.pdf https://arxiv.org/pdf/2304.01826.pdf https://arxiv.org/pdf/2304.00813.pdf https://arxiv.org/pdf/2304.01874.pdf https://arxiv.org/pdf/2304.03496.pdf https://arxiv.org/pdf/2303.02251.pdf https://arxiv.org/pdf/2303.14961.pdf https://arxiv.org/pdf/2301.11374.pdf https://arxiv.org/pdf/2303.10024.pdf — most of these are probably not that amazing, but some of them seem quite interesting. would love to hear which stand out to anyone passing by!
(and I didn’t even mention reliable ontological grounding in the face of arbitrarily large ontological shifts due to fundamental representation corrections)
I’m replying as much for anyone else who’d ask the same question as for you in particular; I imagine you’ve seen some of this stuff in passing before. Hope the detail helps anyway! I’m replying in multiple comments to organize the responses better. I’ve unvoted all my own comments but this one so it shows up as the top reply to start with.
inappropriate powerseeking: seeking to achieve empowerment of the AI over empowerment of others, in order to reach adversarial peaks of the reward model, etc.; i.e., “you asked for collaborative powerseeking and instead got deceptive alignment due to an adversarial hole in your model”. Some recent papers that try to formalize this in terms of RL (a toy sketch of the core idea follows the list):
https://arxiv.org/pdf/2304.06528.pdf—Power-seeking can be probable and predictive for trained agents—see also the lesswrong post
https://arxiv.org/pdf/2206.13477.pdf—Parametrically Retargetable Decision-Makers Tend To Seek Power—see also the lesswrong post
The Boundaries sequence is also relevant to this
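For the gist of the formalization in those papers: roughly, a state’s “power” is the expected optimal value attainable from it under a distribution over reward functions, so states that keep more options open score higher for most rewards, and policies selected for most rewards tend to steer into them. A toy sketch of that idea (the MDP, the uniform reward distribution, and the lack of normalization are all made up for illustration; the papers’ exact definitions differ):

```python
import numpy as np

# Tiny deterministic MDP: state 0 is a hub with three choices, state 1 is
# a dead end that self-loops, states 2 and 3 just self-loop as well.
successors = {0: [1, 2, 3], 1: [1], 2: [2], 3: [3]}
n_states = len(successors)
GAMMA = 0.9
rng = np.random.default_rng(0)

def optimal_value(reward):
    """Value iteration for a deterministic MDP with state-based rewards."""
    V = np.zeros(n_states)
    for _ in range(100):
        V = np.array([reward[s] + GAMMA * max(V[s2] for s2 in successors[s])
                      for s in range(n_states)])
    return V

def power_estimate(n_samples=1000):
    """Average optimal value per state over uniformly sampled reward functions."""
    return np.mean([optimal_value(rng.uniform(size=n_states))
                    for _ in range(n_samples)], axis=0)

print(power_estimate())
# The hub (state 0) comes out ahead of the dead end (state 1): keeping
# options open is instrumentally useful for most reward functions, which
# is the sense in which power-seeking is "probable and predictive".
```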
(also, having adversarial holes in behavior makes the OSGT branch of concern look like “smart model reads the weights of your vulnerable model and pwns it” rather than any sort of agentically intentional cooperation.)