What work is step #1 doing here? It seems like steps #2–#5 would still hold even if the AGI in question were using "bad" consequentialist reasoning (e.g. domain-limited, high-K, or exploitable reasoning).
In fact, is it necessary to assume that the AGI will be consequentialist at all? It seems highly probable that the first pivotal act will be taken by a system of humans+AI that is collectively behaving in a consequentialist fashion (in order to pick out a pivotal act from the set of all actions). If so, do arguments #2-#5 not apply equally well to this system as a whole, with “top-level” interpreted as something like “transparent to humans within the system”?