A very promising non-mainstream AI alignment agenda.
The Learning-Theoretic Agenda (LTA) attempts to combine empirical data with theory, which is a step in the right direction because it avoids many of the “we don’t understand how this thing works, so no amount of empirical data can make it safe” concerns.
I’d like to see more work integrating LTA with other alignment agendas, such as scalable oversight or Redwood’s AI control.