something like that. maybe it’d be worth adding that the LW corpus/HPMOR sort of primes you for this kind of mistake by attempting to align reason and passion as closely as possible, thus making ‘reasoning passionately’ an exploitable backdoor.
something like that. maybe it’d be worth adding that the LW corpus/HPMOR sort of primes you for this kind of mistake by attempting to align reason and passion as closely as possible, thus making ‘reasoning passionately’ an exploitable backdoor.