I think (1b) doesn’t go through. The “starting data” we have from (1a) is that the AGI has some preferences over lotteries that it competently acts on. Acyclicality seems likely, but we don’t get completeness or transitivity for free, so we can’t assume its preferences will be representable as maximising some utility function. (We also have the constraint that its preferences look “locally” good to us given training.) But if this is all we have, it doesn’t follow that the agent will have some coherent goal it’d want optimisers optimising towards.
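To make the representability point concrete, here’s a toy sketch (all names and the example relation are hypothetical, not from the original comment): a strict preference relation over three outcomes that is acyclic but incomplete, which already rules out representation by a utility function, since any real-valued u makes every pair of outcomes comparable.

```python
from itertools import permutations

# Hypothetical toy example: outcomes and a set of strict-preference
# pairs (x, y), read as "x is strictly preferred to y".
outcomes = {"A", "B", "C"}
# Incomplete but acyclic: A > B, and C is incomparable with both.
prefers = {("A", "B")}

def is_complete(outcomes, prefers):
    """Completeness: every distinct pair is ranked one way or the other."""
    return all(
        (x, y) in prefers or (y, x) in prefers
        for x, y in permutations(outcomes, 2)
    )

def is_acyclic(outcomes, prefers):
    """Acyclicality: no chain x > ... > x in the preference graph."""
    def reachable(start, goal, seen=()):
        return any(
            y == goal or (y not in seen and reachable(y, goal, seen + (y,)))
            for x, y in prefers
            if x == start
        )
    return not any(reachable(x, x) for x in outcomes)

# This relation passes acyclicality but fails completeness, so no
# utility function u can represent it: u(A) and u(C) are real numbers
# and hence always comparable, forcing a ranking the agent lacks.
```

So an agent can act competently on such preferences (it never money-pumps itself around a cycle) while still having no utility function it is maximising.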
An AGI doesn’t have to be an EU-maximiser to be scary: it could have e.g. incomplete preferences but still prefer B to A where we really, really prefer A to B. But I think assuming an AI will look like an EU-maximiser does a lot of the heavy lifting in guaranteeing that the AGI will be lethal, since otherwise we can’t predict a priori that it’ll want to optimise particularly hard along any dimension.