MATS’ framing is that we are supporting a “diverse portfolio” of research agendas that might “pay off” in different worlds (i.e., your “hedging bets” analogy is accurate). We also think the listed research agendas have some synergy you might have missed. For example, interpretability research might build into better AI-assisted white-box auditing, white/gray-box steering (e.g., via ELK), or safe architecture design (e.g., “retargeting the search”).
The distinction between “evaluator” and “generator” seems fuzzier to me than you portray. For instance, two “generator” AIs might be able to red-team each other for the purposes of evaluating an alignment strategy.