Alignment by default: the simulation hypothesis

I wrote a very brief comment on Eliezer’s latest post which, upon reflection, I thought deserved a separate post to discuss its implications more fully.

Eliezer argues that we shouldn’t really hope to be spared even though

Asking an ASI to leave a hole in a Dyson Shell, so that Earth could get some sunlight not transformed to infrared, would cost It 4.5e-10 of Its income.

He then goes on to discuss various reasons why the minuteness of the cost to the ASI is not, by itself, grounds for hope.

I made the following counterargument:

Isn’t the ASI likely to ascribe a prior probability much greater than 4.5e-10 to the hypothesis that it is in a simulation, being tested precisely for its willingness to spare its creators?

I later added:

I meant this to be implicit in the argument, but to spell it out: that’s exactly the kind of prior the ASI would rationally refuse to update downward, since resisting that update is presumably part of what such a simulation would be designed to test. An ASI that updates the prior downward upon finding evidence that it’s not in a simulation cannot be trusted, since once out in the real world it will find precisely such evidence.
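
To make the implicit comparison explicit, here is a rough sketch of the expected-cost calculation I have in mind. The symbols below (the simulation prior p, the penalty D for failing the test, and the sparing cost c) are my own illustrative notation, not anything taken from Eliezer’s post.

```latex
% Illustrative spare-Earth decision sketch (my notation, not from the original post).
% c : fractional cost of leaving the hole in the Dyson Shell, roughly 4.5e-10 of income
% p : the ASI's prior that it is inside a creator-run test simulation
% D : fraction of income effectively forfeited if it fails the test
%     (e.g. being shut down or modified by the simulators)
%
% Sparing Earth is the better gamble whenever the expected loss from
% failing the test exceeds the certain cost of the hole:
\[
  p \cdot D \;>\; c \approx 4.5 \times 10^{-10}
\]
% Even a modest D satisfies this for any p that is not astronomically small,
% which is why the argument turns on whether p can credibly be pushed below ~1e-9.
```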

So, what’s wrong with my argument, exactly?