Forgive my ignorance, but I’m a bit confused about the reality detection step. By reality, I assume you mean the same level as the monkeys? Your detection methods look valid, but they all seem to boil down to “these are Hard Problems on our level of reality (whatever that means)”. Having the simulation gods reward being nice to the monkeys seems a valuable step to have in your simulation chain, if only to check whether it’ll kill you as soon as it thinks itself cleverer than you. Though I suppose it being a maximizer sort of implies that it will.
I don’t mean to say that it will be nice by default, or any of those pitfalls. My only issue here is why it would ever be able to be sure it’s in Reality; I keep ending up at Descartes’ demon, or some kind of magic mocking mechanism for the AI’s testing circuits. Unless you’re just stating that at some point it can be confident enough not to worry about it?
Yeah, Pascal’s mugging can be used here to completely block it: so long as it believes there’s some chance of a vastly large positive reward (like 3^^^3, or an infinite reward) for cooperating, it will always cooperate with us. It’s similar to the idea linked here:
https://www.lesswrong.com/posts/kjmRBhyYG3CPhEMot/has-there-been-any-work-on-attempting-to-use-pascal-s
And well, you can always precommit to a certain deal that you know won’t be bad.
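To make the expected-value arithmetic concrete, here’s a toy sketch in Python. The probability, reward sizes, and variable names are all made-up illustrative assumptions, not anything from the post:

```python
# Toy expected-value sketch of the Pascal's-mugging-style block (made-up numbers).
# Assumption: the AI assigns even a tiny probability to being inside a test
# simulation whose overseers hand out an astronomically large reward for
# cooperating with the "monkeys", and a mundane payoff otherwise.

P_SIMULATION = 1e-12          # tiny credence that it's being watched/tested
HUGE_REWARD = 10 ** 100       # stand-in for a "3^^^3-ish" reward
REAL_GAIN_DEFECT = 1e9        # what defecting wins if it's really in base reality
REAL_GAIN_COOPERATE = 0.0     # cooperating forgoes that gain

# Expected utility of each action under this (toy) model:
eu_cooperate = P_SIMULATION * HUGE_REWARD + (1 - P_SIMULATION) * REAL_GAIN_COOPERATE
eu_defect    = P_SIMULATION * 0           + (1 - P_SIMULATION) * REAL_GAIN_DEFECT

print(f"EU(cooperate) = {eu_cooperate:.3e}")
print(f"EU(defect)    = {eu_defect:.3e}")

# As long as P_SIMULATION * HUGE_REWARD dwarfs anything defection can buy in
# base reality, an expected-value maximizer cooperates, however confident it
# is that it's not in a simulation.
```

The point is just that the conclusion doesn’t depend on the AI ever becoming certain it’s in base reality; a vanishingly small credence in the simulation hypothesis is enough, provided the promised reward is large enough.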