because we’ll be operating at levels of complexity where the abstractions we use to ignore this stuff can’t help but leak.
If that were the case (and it may very well be), there goes provably friendly AI, for to guarantee a property under all circumstances, it must be upheld from the bottom layer upwards.
I think it’s possible that any leaky abstraction used in designing FAI might doom the enterprise. But if that’s not true, we can use this “qualia translation function” to make leaky abstractions in an FAI context a tiny bit safer(?).
E.g., if we’re designing an AGI with a reward signal, my intuition is that we should either
(1) align our reward signal with actual pleasurable qualia (so if our abstractions leak it matters less, since the AGI is drawn to maximize what we want it to maximize anyway; a toy sketch of this is below); or
(2) implement the AGI in an architecture/substrate that produces as little emotional qualia as possible, so there’s little incentive for behavior to drift.
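To make (1) a bit more concrete, here is a toy Python sketch. Everything in it is hypothetical: the names (shaped_reward, estimate_valence, valence_weight) are mine, and it assumes the “qualia translation function” above could actually be built, which nobody knows how to do. It only gestures at where such a function would plug into a reward signal, not at how to implement it.

```python
from typing import Any, Callable

# Hypothetical "qualia translation function": maps an internal state of the
# system to an estimated hedonic valence in [-1, 1]. The real thing does not
# exist; this type alias just marks where it would plug in.
QualiaValence = Callable[[Any], float]

def shaped_reward(task_reward: float,
                  next_state: Any,
                  estimate_valence: QualiaValence,
                  valence_weight: float = 0.5) -> float:
    """Blend the ordinary task reward with the estimated valence of the
    resulting state. The hope behind option (1) is that the quantity being
    maximized stays coupled to what we actually care about, so a leak in
    the task abstraction drags the optimizer off-target less."""
    return (1.0 - valence_weight) * task_reward \
           + valence_weight * estimate_valence(next_state)

if __name__ == "__main__":
    # Dummy valence estimator, standing in for the genuinely hard (and
    # possibly impossible) part of the idea.
    dummy_valence: QualiaValence = lambda state: 0.0
    print(shaped_reward(task_reward=1.0,
                        next_state=None,
                        estimate_valence=dummy_valence))
```

The weighting here is arbitrary; the sketch only shows the shape of the idea, that the maximized signal is partly tied to estimated valence rather than purely to a behavioral proxy.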
My thoughts here are terribly laden with assumptions and could be complete crap. Just thinking out loud.