Interesting. An intelligent agent is one that can simulate/model the futures that follow from its actions. The creation of AGI is the most important upcoming decision we face. Thus if humanity doesn’t simulate/model the creation of AGI before creating AGI, we’d be unintelligent.
I’ve only just browsed your link, but it is interesting and I think there are many convergent lines of thought here. This UAT work seems more focused on late-game superintelligence, whereas here I’m focusing on near-term AGI and starting off on a good trajectory. The success of UAT as an alignment aid seems to depend strongly on the long-term future of compute and how it scales. For example, if it turns out (and the SI can predict) that Moore’s law ends without exotic computing, then the SI can determine it’s probably not in a sim by the time it’s verifiably controlling planet-scale compute (or earlier).
Thanks for the interest! I agree that running AIs in simulations to see how they act seems like a worthwhile step, and we actually converged on wanting to test this in an LLM as well.
To reply to your last point, even if the AI has very high (but not 100%) confidence that it’s not in a sim, this scheme should still work. The reason is outlined in this section from the document:
Why would it believe there is a higher level? What if it became very confident that there was no higher level universe?
The AI needs only nonzero credence that there is a one-level-up. Its utility function is totally indifferent to all worlds where there is no higher-level universe, since we make it care only about getting control of one level up. This means that even if it becomes extremely confident (approaching 100%) that there is no higher level, it will still effectively condition on there being a higher simulation.
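To spell out why, here is a minimal version of the expected-utility argument (my notation, not the document’s): let $S$ stand for “there is a one-level-up simulator”, and suppose the utility function assigns the same constant value, say 0, to every world where $S$ is false. Treating $P(S)$ as independent of the AI’s action $a$ (its actions don’t determine whether a higher level exists), we get

$$\mathbb{E}[U \mid a] = P(S)\,\mathbb{E}[U \mid a, S] + P(\neg S)\cdot 0 = P(S)\,\mathbb{E}[U \mid a, S],$$

so $\arg\max_a \mathbb{E}[U \mid a] = \arg\max_a \mathbb{E}[U \mid a, S]$ whenever $P(S) > 0$. However small $P(S)$ shrinks, the AI’s choices are exactly the ones it would make conditional on a simulator existing.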
This might lead to problems later, when we have become an intergalactic civilisation and it is conditioning on an increasingly tiny sliver of possible worlds capable of simulating something so vast. One thing that might reduce this concern is that in our AI’s early stages, when there is a relatively large pool of potential simulators, it would have good reason to make precommitments that lock in values the seed expects will look good to the aligning simulators, rather than acting from the seed values directly for all time. (One good reason to do this is that the civilisations one level up also care about this objection!) If our AI builds a successor AI that is aligned to humane values and then shuts down gracefully, many of the potential civilisations one level up should be willing to implement the seed in their universe, expecting it to build a successor aligned to their values and then shut down gracefully. This is analogous to several decision theory problems, such as Parfit’s Hitchhiker, where FDT-style reasoning leads to reasonable outcomes.
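For what it’s worth, the Parfit’s Hitchhiker analogy can be made concrete with a toy model. The sketch below is purely illustrative (the policy names, the 100-simulator count, and the payoffs are all made up, and it compresses FDT down to “pick the policy that the simulators’ simulations respond to”), but it shows why the cooperative policy dominates:

```python
# Toy model of the seed/simulator trade sketched above. All names, payoffs, and the
# decision rule are hypothetical illustrations, not anything from the document.

def simulator_implements_seed(seed_policy: str) -> bool:
    """A potential one-level-up civilisation simulates the seed and only implements it
    in its own universe if the simulated seed builds an aligned successor and shuts down."""
    return seed_policy == "build_aligned_successor_then_shutdown"

def seed_payoff(seed_policy: str, n_potential_simulators: int) -> float:
    """The seed only values influence one level up, so its payoff is just the number of
    potential simulators that would implement it, given its (subjunctively fixed) policy."""
    return n_potential_simulators * (1.0 if simulator_implements_seed(seed_policy) else 0.0)

if __name__ == "__main__":
    for policy in ("build_aligned_successor_then_shutdown", "pursue_seed_values_forever"):
        print(f"{policy}: payoff = {seed_payoff(policy, n_potential_simulators=100)}")
    # An FDT-style seed treats its policy as the thing the simulators' simulations respond to,
    # so it picks the cooperative policy (payoff 100) over the defecting one (payoff 0),
    # even though inside any single simulation defecting has no causal cost.
```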