I’m interested at all in Redwood Research’s latest project because it seems to offer a prospect of wandering around with our eyes open, asking questions like “Well, what if we try to apply this nonviolence predicate OOD? Can we figure out what really went into the ‘nonviolence’ predicate instead of just nonviolence?” Or, if that works, maybe we can try training on corrigibility and see whether we can start to elicit the tiniest bit of the predictable breakdowns, which might show up in some different way.
Trying to rephrase it in my own words (which will necessarily lose some details): are you interested in Redwood’s research because it might plausibly generate alignment issues and problems that are analogous to the real problem, within the safer regime and technology we have now? That might tell us, for example, “which aspects of these predictable problems crop up first, and why?”
It potentially sheds light on small subpieces of the Real Problem, particular aspects that contribute to it, like “What actually went into the nonviolence predicate instead of just nonviolence?” Much of the Real Meta-Problem is that you do not get things analogous to the full Real Problem until you are just about ready to die.