I feel like this connects with what Max Tegmark was talking about in his recent 80,000 Hours podcast interview: the idea that we need hierarchical alignment across groups of humans (companies, governments, sets of humans plus the programs they've written plus their ML models?) as well as within individual AI systems. I think if you carefully designed an experiment whose results would generalize from humans to AGI systems, you could potentially learn some valuable lessons.