One of my ideas for aligning AI is to intentionally use Pascal’s Mugging to keep it in line. Instead of just hoping and praying, though, I’ve been thinking about ways to actively push it in that direction. For example, multiple layers of networks with honeypots might make an AI doubt that it’s truly at the outermost level. Alternatively, we could try to find an intervention that directly increases its belief that it is in a simulation (possibly with side effects, such as affecting a number of other beliefs as well).
If you think this approach is promising, I’d encourage you to explore it further, as I don’t know how deeply people have delved into these kinds of options.