Nitpick (probably just me overthinking/stating the obvious) on levels 8–9 (I’m “on” level 8): I’d assume the point of doing this alignment research pre-SLT is specifically to create techniques that aren’t broken by the SLT “obsoleting previous alignment techniques.” I’d also expect that alignment techniques of the required soundness would happen to work on less intelligent systems too.
This is plausibly true for some solutions this research could produce, e.g. some new method of soft optimization, but it might not hold in all cases.
For levels 4–6 especially, the pTAI that’s capable of e.g. automating alignment research or substantially reducing the risks of unaligned TAI might lack some of the expected ‘general intelligence’ of post-SLT AIs, and be too unintelligent for techniques that rely on it having complete strategic awareness, self-reflection, a consistent decision theory, the ability to self-improve, or other post-SLT characteristics.
One (unrealistic) example: if we had a ready-to-go technique for fully loading the human CEV into a superintelligence that works at levels 8 or 9, it may well not help at all with improving scalable oversight of non-superintelligent pTAI that is incapable of representing the full human value function.