What if you had some computation that could be interpreted (e.g. decrypted with two different keys) as either a simulation full of happy people, or a simulation full of depressed people? I think an adequate theory of experience is able to look at the encrypted computation (or any computation) and decide directly if there is suffering happening there.
Also, what is the difference between normal computation and encrypted computation? I feel like looking at a process that you haven’t programmed yourself is not really that different from looking at an encrypted version of it. In either case, we don’t have a clue about what’s going on. And if we have a theory that lets us figure it out, it should work on both a normal and an encrypted version.
What if you had some computation that could be interpreted as either a simulation full of happy people, or a simulation full of depressed people?
i suspect that those two kinds of computation in fact have a profoundly different shape, such that you can’t have something that can convert into either in a simple manner. if i am wrong about this, then alignment is harder than i thought, and i don’t know what to think about encrypted computations in such a situation — i guess nuke them just to be safe?
Also, what is the difference between normal computation and encrypted computation?
we can figure out some parts of normal programs, and sometimes possibly even large parts, whereas an encrypted computation should be guaranteed to be un-figureout-able without exponential compute. in the same way, i could figure out some of the meaning of a large text that’s been translated into dutch, but i’d likely be completely unable to figure out the meaning of a large text that’s been encrypted with, say, AES.
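as a side illustration of the two-key scenario from the question, here is a minimal python sketch, assuming a one-time pad (plain XOR) rather than AES, and covering only static data rather than a running computation. the names `xor_bytes`, `happy`, `sad`, `key_happy` and `key_sad` are made up for the example. it shows that a single ciphertext can be decrypted to either message, but only because each key is as long as the message itself.

```python
import secrets


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings (one-time-pad encrypt/decrypt)."""
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    return bytes(x ^ y for x, y in zip(a, b))


# two hypothetical messages of equal length (assumption for the example)
happy = b"everyone here is happy"
sad = b"everyone here is upset"

# encrypt the first message under a random key as long as the message
key_happy = secrets.token_bytes(len(happy))
ciphertext = xor_bytes(happy, key_happy)

# construct a second key that makes the same ciphertext decrypt to `sad`
key_sad = xor_bytes(ciphertext, sad)

# the same ciphertext yields either message, depending on the key
assert xor_bytes(ciphertext, key_happy) == happy
assert xor_bytes(ciphertext, key_sad) == sad
```

here the second key is built from the second message, so the "interpretation" lives in the key rather than in the ciphertext. this says nothing about ciphers with short fixed-size keys, or about whether the same trick carries over to encrypted computations.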