Evidential Cooperation in Large Worlds, Immanuel Kant and the Decision Theory App Store, lots of decision theory stuff about the Twin Prisoner's Dilemma, etc. OK, I guess these don’t really help with alignment narrowly construed as following human values or obeying human intent. But they help make the AI more rational in ways that reduce the probability of certain terrible outcomes.