Sounds like a good idea.
Possible first step: we should start carefully and durably recording the current state of AI development, related plans, and the associated power dynamics. That way, the eventual victors can generate a more precise estimate of the distribution over the values of possible counterfactual victors, and doing so also signals our own commitment to such a value-sharing scheme.
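To make the payoff of that kind of value-sharing scheme a bit more concrete, here is a minimal toy sketch in Python. The faction names, win probabilities, and the concave utility function are all illustrative assumptions, not anything from the discussion above; the point is just that if utility has diminishing returns in resources, a guaranteed proportional share beats a winner-take-all gamble, so every faction prefers committing to the handshake in advance, and better records of the current state translate into tighter estimates of the weights.

```python
# Toy expected-value sketch of a values handshake among counterfactual victors.
# The faction names, win probabilities, and utility function below are
# illustrative assumptions, not anything from the discussion above.

# Estimated probability that each faction ends up as the victor; in this
# hypothetical, the estimates come from the recorded state of AI development,
# plans, and power dynamics.
win_prob = {"faction_a": 0.5, "faction_b": 0.3, "faction_c": 0.2}

def utility(resource_share: float) -> float:
    """A faction's utility when this share of the future's resources is spent
    on its values. Concave (diminishing returns), so a guaranteed partial
    share is worth more than a gamble with the same expected share."""
    return resource_share ** 0.5

for faction, p in win_prob.items():
    # Winner-take-all: with probability p the faction gets everything,
    # otherwise it gets nothing.
    ev_winner_take_all = p * utility(1.0) + (1 - p) * utility(0.0)
    # Values handshake: whoever wins commits to spending a share of resources
    # on each faction's values in proportion to that faction's win probability.
    ev_handshake = utility(p)
    print(f"{faction}: winner-take-all EV = {ev_winner_take_all:.3f}, "
          f"handshake EV = {ev_handshake:.3f}")

# Since sqrt(p) > p for 0 < p < 1, every faction does better in expectation
# under the handshake, which is the incentive to commit in advance. More
# precise estimates of win_prob make the deal easier for the victor to honor.
```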
Also, I'm not sure we need simulations at all. Many-worlds QM seems like it should work just as well for this sort of values handshake. In fact, many-worlds would probably work even better, because:
- it's not dependent on how feasible it turns out to be to simulate realistic counterfactual timelines.
- the distribution over possible outcomes is wider: if we turn out to be on a doomed timeline, where humanity has essentially zero chance of emerging as the victor, there may be less-doomed timelines that split off from ours in the past.
- there's no risk of a "treacherous turn" if the AI decides it's not actually being simulated.