I’m not very familiar with the AI safety canon.
I’ve been pondering a view of alignment in the frame of intelligence ratios: humans with capability N0 can produce aligned agents with capability N1 = k*N0 for some k[1], and alignment techniques might increase k.
Has this already been discussed somewhere, and would it be worth spending time to think this out and write it down?
[1] Or maybe some other function of N0 is more useful?
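For concreteness, here is a minimal sketch of one thing the ratio framing implies, under the added assumption (not part of the framing above) that each generation of aligned agents can itself align successors at the same ratio k. Capability after n rounds is then N_n = k^n * N0, so whether the process compounds or fizzles turns on whether alignment techniques can push k above 1. The `capability_after` helper is purely illustrative.

```python
# Sketch only: assumes each generation aligns successors at the same fixed ratio k,
# so capability after n rounds is N_n = k**n * N0.

def capability_after(n0: float, k: float, generations: int) -> float:
    """Capability of the aligned agent after `generations` rounds: k**generations * n0."""
    return n0 * k ** generations

# k > 1: aligned capability compounds upward; k < 1: it decays toward zero.
for k in (0.9, 1.0, 1.1):
    print(k, [round(capability_after(100.0, k, n), 1) for n in range(4)])
```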
It hasn’t been discussed to my knowledge, and I think that unless you’re doing something much more important (or you’re easily discouraged by people telling you that you’ve more to learn), it’s pretty much always worth spending time thinking things out and writing them down.