The first line seems to speculate that values-AGI is substantially more robust to differences in values.
The thing that I believe is that an intelligent, reflective, careful agent with a decisive strategic advantage (DSA) will tend to produce outcomes that are similar in value to those that would be produced by that agent’s CEV. In particular, I believe this because the agent is “trying” to do what its CEV would do and has the power to do so, and therefore will likely succeed.
I don’t know what you mean by “values-AGI is more robust to differences in values”. What values are different in this hypothetical?
I do think that values-AGI with a DSA is likely to produce outcomes similar to CEV-of-values-AGI.
It is unclear whether values-AGI with a DSA is going to produce outcomes similar to CEV-of-Rohin (because this depends on how you built values-AGI and whether you successfully aligned it).