One of my ideas for aligning AI is to intentionally use Pascal’s Mugging to keep it in line. Instead of just hoping and praying, though, I’ve been thinking about ways to actively push it in that direction. For example, multiple layers of networks with honeypots might make an AI doubt that it’s truly at the outermost level. Alternatively, we could try to find an intervention that directly increases its belief that it is in a simulation (possibly with side effects, such as affecting a number of other beliefs as well).
If you think this approach is promising, I’d encourage you to explore it further, as I don’t know how deeply people have delved into these kinds of options.