Would this work for models that account for larger levels of context? Or would something like instrumental convergence lead to similar misalignments being very common at sufficiently strong levels of AGI?
If they are oracles, we can give them all the same context. The idea is actually used in spaceflight: some spacecraft ran three computers in parallel, and if one computer's output differed from the other two, it was ignored.
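The voting scheme above can be sketched in a few lines; this is a minimal illustration (the function name and example answers are hypothetical), assuming we only accept an answer endorsed by at least two of the three oracles:

```python
from collections import Counter

def majority_vote(answers):
    """Return the answer given by at least 2 of 3 oracles, else None.

    Mirrors the spacecraft scheme: a lone dissenting computer's
    output is simply ignored.
    """
    value, count = Counter(answers).most_common(1)[0]
    return value if count >= 2 else None

# Two oracles agree, so the dissenter is outvoted.
print(majority_vote(["launch", "launch", "abort"]))  # launch
# No majority at all: reject the round entirely.
print(majority_vote(["a", "b", "c"]))  # None
```

Note that this only helps against *independent* failures; the next comment points out why correlated failures break it.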
Strong AGIs could use tricks like acausally cooperating to give the same misaligned answer. They could also all generalize to "let me out of the box". So this trick alone can't ensure the safety of advanced AIs.
Also, CEV is itself a generalization of many of our goals that may be unexpected to us.