What if you had some computation that could be interpreted as either a simulation full of happy people, or a simulation full of depressed people?
i suspect that those two kinds of computation in fact have profoundly different shapes, such that you can’t have a single computation that converts into either one in a simple manner. if i’m wrong about this, then alignment is harder than i thought, and i don’t know what to think about encrypted computations in such a situation; i guess nuke them just to be safe?
Also, what is the difference between normal computation and encrypted computation?
we can figure out some parts of normal programs, and sometimes even large parts, whereas an encrypted computation should be guaranteed to be un-figureout-able without exponential compute. in the same way, i could figure out some of the meaning of a large text that’s been translated into dutch, but i’d likely be completely unable to figure out the meaning of a large text that’s been encrypted with, say, AES.
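to make "encrypted computation" a bit more concrete: one standard way to compute on data you can’t read is homomorphic encryption. the sketch below is a toy version of Paillier, an additively homomorphic scheme, and it is only an illustration i’m assuming here, not a construction from this post. the names `encrypt`, `decrypt` and `add_encrypted` are hypothetical helpers, and the hard-coded primes are far too small to be secure (anyone can factor `N` instantly). the point is that whoever runs `add_encrypted` only ever handles ciphertexts, yet the result decrypts to the sum of the hidden inputs.

```python
import math
import random

# Toy Paillier keypair. The primes are deliberately tiny: NOT secure, illustration only.
P, Q = 293, 433
N = P * Q
N_SQ = N * N
G = N + 1                      # common simple choice of generator
LAM = math.lcm(P - 1, Q - 1)   # Carmichael's function of N (needs Python 3.9+)


def _L(x: int) -> int:
    """Paillier's L function: (x - 1) / N, exact for the values used here."""
    return (x - 1) // N


MU = pow(_L(pow(G, LAM, N_SQ)), -1, N)  # modular inverse (Python 3.8+)


def encrypt(m: int) -> int:
    """Encrypt 0 <= m < N, using fresh randomness r coprime to N."""
    while True:
        r = random.randrange(1, N)
        if math.gcd(r, N) == 1:
            break
    return (pow(G, m, N_SQ) * pow(r, N, N_SQ)) % N_SQ


def decrypt(c: int) -> int:
    """Recover the plaintext; this requires the secret values LAM and MU."""
    return (_L(pow(c, LAM, N_SQ)) * MU) % N


def add_encrypted(c1: int, c2: int) -> int:
    """Multiply ciphertexts, which adds the hidden plaintexts (mod N), without decrypting."""
    return (c1 * c2) % N_SQ


if __name__ == "__main__":
    ca, cb = encrypt(1234), encrypt(5678)
    c_sum = add_encrypted(ca, cb)  # the evaluator only ever sees ca, cb and c_sum
    print(decrypt(c_sum))          # 6912
```

with realistically sized primes, recovering the inputs from `ca` and `cb` is believed to be hard, which is the sense in which the evaluator can run the computation without being able to figure out what it’s about.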