Here are prediction questions for the predictions that TurnTrout himself provided in the concluding post of the Reframing Impact sequence:

1. Attainable Utility theory describes how people feel impacted.
2. Agents trained by powerful RL algorithms on arbitrary reward signals generally try to take over the world.
3. The catastrophic convergence conjecture is true. That is, unaligned goals tend to have catastrophe-inducing optimal policies because of power-seeking incentives.
4. AUP_conceptual prevents catastrophe, assuming the catastrophic convergence conjecture.
5. Some version of Attainable Utility Preservation solves side effect problems for an extremely wide class of real-world tasks and for subhuman agents.
6. For the superhuman case, penalizing the agent for increasing its own Attainable Utility (AU) is better than penalizing the agent for increasing other AUs.
7. There exists a simple closed-form solution to catastrophe avoidance (in the outer alignment sense).
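For context on predictions 5 and 6: as a rough reference (my paraphrase of the Attainable Utility Preservation penalty from the Conservative Agency paper, not something stated in the sequence's concluding post, and glossing over its normalization details), the penalized reward has roughly the form

$$R_{\text{AUP}}(s,a) \;=\; R(s,a) \;-\; \frac{\lambda}{|\mathcal{R}_{\text{aux}}|} \sum_{R_i \in \mathcal{R}_{\text{aux}}} \big| Q^*_{R_i}(s,a) - Q^*_{R_i}(s,\varnothing) \big|,$$

where $\mathcal{R}_{\text{aux}}$ is a set of auxiliary reward functions, $\varnothing$ is a no-op action, and $\lambda$ scales the penalty; the agent is penalized for changing how much auxiliary utility it can attain. Prediction 6 contrasts this kind of penalty on the agent's own AU (its ability to achieve its own goals) with penalties computed over other AUs.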