If we have an ASI whose goal is to efficiently model and run alternate trajectories of our universe (including the future), stochastically and a high number of times, then the value of our universe would equal its expected value, right?
Can we reasonably expect to achieve more than that from AI alignment?
If not, should we simply aim for this? (Programming an ASI to model and run alternate universes may take advantage of the fact that the laws of physics are constant.)
What are the problems in this reasoning?
(Edited after comment by steve2152 and comment by naimenz)
Some details added for clarification:
To be more precise, what I have in mind is that the ASI is an agent whose goal is:
to model the sentient part of the universe finely enough to produce sentience in an instance of its model (it will also need to model the necessary non-sentient “dependencies”),
and to instantiate this model N times, for example running each instance from 1000 A.D. until no sentience remains in it. (All of this efficiently.)
An instance of the modeled universe would not be our present universe. It would be “another seed”, starting before the ASI exists, so the ASI would not need to model itself but only the possible (“new”) ASIs produced inside the instances.
In the scenario I had in mind, the ASI would fill our universe with computing machines to produce as many instances as possible. (We would not use it, and thus we would not need an interface with the ASI.)
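The rollout described above could be sketched roughly as follows. This is only a toy illustration of the loop structure, not a real design: `WorldModel`, its `step` method, and the `sentience` flag are hypothetical stand-ins for capabilities nobody knows how to build, and the end of sentience is modeled as an arbitrary coin flip.

```python
import random

class WorldModel:
    """Hypothetical coarse model of the sentient part of one universe
    instance. Everything here is a placeholder, not a real mechanism."""
    def __init__(self, seed):
        self.rng = random.Random(seed)  # each seed is "another seed" universe
        self.year = 1000                # each instance starts at 1000 A.D.
        self.sentience = True

    def step(self):
        """Advance the instance one year; sentience eventually ends
        (modeled here as a small arbitrary per-step probability)."""
        self.year += 1
        if self.rng.random() < 0.001:
            self.sentience = False

def run_instances(n):
    """Instantiate the model N times and run each instance until no
    sentience remains in it; return the year each instance ended."""
    end_years = []
    for seed in range(n):
        world = WorldModel(seed)
        while world.sentience:
            world.step()
        end_years.append(world.year)
    return end_years
```

The point of the sketch is just that the goal decomposes into two parts: building a model fine enough to contain sentience, and a simple outer loop that seeds and runs it N times.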
Possible problems:
This ASI may produce a lot of alternate universe instances in which an ASI (produced inside the instance) also starts to run instances of modeled universes. This would probably be pretty bad, since in our universe the history before ASI is net-negative.
This doesn’t solve the problem of aligning the ASI with the goal described above; it only replaces “aligning AGI with human values” with “aligning AGI to run instances of our universe”. Yet this seems to ease the problem by giving a simpler objective: “predict the next step of the sentient part of the universe, in a loop”. (Finally, I don’t know how, but we may be able to use the fact that the laws of physics are constant and, to my knowledge, unchangeable.)
I think I would make some modifications to your proposal to make it more realistic.
First, I don’t know if you intended this, but “simulating the universe” carries a connotation of a low-level physics simulation. This is computationally impossible. Let’s have it model the universe instead, using the same kind of high-level pattern recognition that people use to predict the future.
Second, if the AGI is simulating itself, the predictions are wildly underdetermined; it can predict that it will do X, and then fulfill its own prophecy by actually doing X, for any X. Let’s have it model a counterfactual world with no AGIs in it.
Third, you need some kind of interface. Maybe you type in “I’m interested in future scenarios in which somebody cures Alzheimer’s and writes a scientific article describing what they did. What is the text of that article?” and then it runs through a bunch of possible futures and prints out its best-guess article in the first 50 futures it finds in which the prompted premise comes true. (Maybe also print out a retrospective article from 20 years later about the long-term repercussions of the invention.) For a different type of interface, see microscope AI.
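That query loop might look something like this, as a toy sketch only: `sample_future` is a hypothetical stand-in for the predictive model, and the premise probability and article text are made-up placeholders.

```python
import random

def sample_future(rng):
    """Hypothetical stand-in for sampling one possible future from the
    predictor; returns (premise_came_true, best_guess_article)."""
    came_true = rng.random() < 0.1   # placeholder: premise holds in ~10% of futures
    article = "placeholder article text" if came_true else ""
    return came_true, article

def query(n_hits=50, seed=0):
    """Run through possible futures and collect the articles from the
    first n_hits futures in which the prompted premise comes true."""
    rng = random.Random(seed)
    hits = []
    while len(hits) < n_hits:
        came_true, article = sample_future(rng)
        if came_true:
            hits.append(article)
    return hits
```

The design choice worth noting is that the interface conditions on the premise by rejection: it keeps sampling futures and discards those where the premise fails, rather than steering the model toward the premise.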
If this is no longer related to your question, I apologize! But if you’re still with me, we can ask two questions about this proposal:
First, do we know how to do this kind of thing safely? I think that’s an open problem. See, for example, my post on self-supervised learning and manipulative predictions for one thing that might (or might not!) go wrong in seemingly harmless versions of this type of system. Since writing that, I’ve been feeling even more pessimistic, because to make the system really work well I think we might have to put in various kinds of self-awareness and curiosity and other motivations that make for much more obvious types of risks.
Second, if we do make such a system, does it do the things we want AGI to do? Well, I think it’s a bit complicated. If we use the system directly, it always needs a human in the loop, whereas there are a lot of things that people might want AGIs to do directly, like drive cars or do other boring and/or safety-critical jobs. On the other hand, we could bootstrap by having the prediction system help us design a safe and aligned agential AGI. I also wrote about this last year in In defense of Oracle (“Tool”) AI research; see also the comments there.
(About the first part of your comment.) Thank you for pointing out three confused points:
To be more precise, what I had in mind is that the ASI is an agent whose goal is:
to model the sentient part of the universe finely enough to produce sentience in an instance of its model (it will also need to model the necessary non-sentient “dependencies”),
and to instantiate this model N times, for example running each instance from 1000 A.D. until no sentience remains in it. (All of this efficiently.)
(To reduce complexity I didn’t mention it, but we could think of heuristics to avoid replaying too much of the “past” and “future” history filled with suffering.)
An instance of the modeled universe would not be our present universe. It would be “another seed”, starting before the ASI exists, so the ASI would not need to model itself but only the possible (“new”) ASIs produced inside the instances.
In the scenario I had in mind, the ASI would fill our universe with computing machines to produce as many instances as possible. (We would not use it, and thus we would not need an interface with the ASI.)
Is your suggestion to run this system as a source of value, simulating lives for their own sake rather than to improve the quality of life of sentient beings in our universe? Our history (and present) aren’t exactly utopian, and I don’t see any real reason to believe that slight variations on it would lead to anything happier.
I think we can expect to achieve a lot more than that from a properly aligned AGI. There is so much suffering that could be alleviated right now with proper coordination, as a lower bound on how much better it could be than just effectively running copies of our timeline but at lower resolution.
I am wondering whether we should reasonably expect to produce better results by trying to align an AGI with our values than by simulating a lot of alternate universes. I am not saying that either is net-negative or net-positive; it seems to me that the expected value of the two cases may be identical.
Also, by “history” I also meant the future, not only the past and present. (I edited the question to replace “histories” with “trajectories”.)