Steven Byrnes answers Can we expect more value from AI alignment than from an ASI with the goal of running alternate trajectories of our universe?

Steven Byrnes 9 Aug 2020 12:38 UTC
3 points
I think I would make some modifications to your proposal to make it more realistic.
First, I don’t know if you intended this, but “simulating the universe” carries a connotation of a low-level physics simulation. This is computationally impossible. Let’s have it model the universe instead, using the same kind of high-level pattern recognition that people use to predict the future.
Second, if the AGI is simulating itself, the predictions are wildly underdetermined; it can predict that it will do X, and then fulfill its own prophecy by actually doing X, for any X. Let’s have it model a counterfactual world with no AGIs in it.
Third, you need some kind of interface. Maybe you type in “I’m interested in future scenarios in which somebody cures Alzheimer’s and writes a scientific article describing what they did. What is the text of that article?” and then it runs through a bunch of possible futures and prints out its best-guess article in the first 50 futures it finds in which the prompted premise comes true. (Maybe also print out a retrospective article from 20 years later about the long-term repercussions of the invention.) For a different type of interface, see microscope AI.
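To make the interface idea a bit more concrete, here is a minimal toy sketch of the "run through possible futures and keep the first 50 where the premise comes true" loop. Everything in it is an assumption for illustration: `sample_future` is a hypothetical stand-in for the high-level world model, the 10% success probability is arbitrary, and `first_k_matching` is just a made-up name for the filtering step. A real system might condition on the premise in some smarter way than sampling and discarding.

```python
import random


def sample_future(rng):
    """Hypothetical toy stand-in for the world model: returns one sampled future."""
    cured = rng.random() < 0.1  # arbitrary illustrative probability, not a real estimate
    future_id = rng.randrange(10**6)
    article = f"Article from future {future_id}" if cured else None
    return {"id": future_id, "alzheimers_cured": cured, "article": article}


def first_k_matching(sample_fn, premise, k, max_samples, rng):
    """Draw futures one at a time and keep the first k in which the premise holds."""
    matches = []
    for _ in range(max_samples):
        future = sample_fn(rng)
        if premise(future):
            matches.append(future)
            if len(matches) == k:
                break
    return matches


rng = random.Random(0)  # fixed seed so the toy run is repeatable
futures = first_k_matching(
    sample_future,
    premise=lambda f: f["alzheimers_cured"],
    k=50,
    max_samples=100_000,
    rng=rng,
)
articles = [f["article"] for f in futures]
```

The `max_samples` cap is there because a premise that almost never comes true would otherwise make the loop run indefinitely; in that case the function simply returns fewer than `k` futures.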
If this is no longer related to your question, I apologize! But if you’re still with me, we can ask two questions about this proposal:
First, do we know how to do this kind of thing safely? I think that’s an open problem. See, for example, my post “Self-supervised learning and manipulative predictions” for one thing that might (or might not!) go wrong in seemingly harmless versions of this type of system. Since writing that, I’ve been feeling even more pessimistic because, to make the system really work well, I think we might have to put in various kinds of self-awareness and curiosity and other motivations that make for much more obvious types of risks.
Second, if we do make such a system, does it do the things we want AGI to do? Well, I think it’s a bit complicated. If we use the system directly, it always needs a human in the loop, whereas there are a lot of things that people might want AGIs to do directly, like drive cars or do other boring and/or safety-critical jobs. On the other hand, we could bootstrap by having the prediction system help us design a safe and aligned agential AGI. I also wrote about this last year in my post In defense of Oracle (“Tool”) AI research; see also the comments there.
- Maxime Riché 9 Aug 2020 13:09 UTC
  1 point
  (About the first part of your comment) Thank you for pointing out three confusing points:
  First, I don’t know if you intended this, but “simulating the universe” carries a connotation of a low-level physics simulation. This is computationally impossible. Let’s have it model the universe instead, using the same kind of high-level pattern recognition that people use to predict the future.
  To be more precise, what I had in mind is that the ASI is an agent whose goal is:
  - to model the sentient part of the universe finely enough to produce sentience in an instance of its model (and it will also need to model the necessary non-sentient “dependencies”)
  - and to instantiate this model N times. For example, playing them from 1000 A.D. until the time when no sentience remains in a given instance of the modeled universe. (All of this done efficiently.)
  (To reduce complexity, I didn’t mention it, but we could think of heuristics to avoid playing too much of the “past” and “future” history filled with suffering.)
  Second, if the AGI is simulating itself, the predictions are wildly underdetermined; it can predict that it will do X, and then fulfill its own prophecy by actually doing X, for any X. Let’s have it model a counterfactual world with no AGIs in it.
  An instance of the modeled universe would not be our present universe. It would be “another seed”, starting before the ASI exists, and thus the ASI would not need to model itself, only possible (“new”) ASIs produced inside the instances.
  Third, you need some kind of interface. Maybe you type in “I’m interested in future scenarios in which somebody cures Alzheimer’s and writes a scientific article describing what they did. What is the text of that article?” and then it runs through a bunch of scenarios and prints out its best-guess article in the first 50 scenarios it can find. (Maybe also print out a retrospective article from 20 years later about the long-term repercussions of the invention.) For a different type of interface, see microscope AI.
  In the scenario I had in mind, the ASI would fill our universe with computing machines to produce as many instances as possible. (We would not use it, and thus we would not need an interface with the ASI.)