Daniel Kokotajlo answers Are there specific books that it might slightly help alignment to have on the internet?

Daniel Kokotajlo 30 Mar 2023 4:01 UTC
10 points
"Evidential Cooperation in Large Worlds," "Immanuel Kant and the Decision Theory App Store," and lots of decision theory material about the Twin Prisoner's Dilemma, etc. OK, I guess these don’t really help with alignment narrowly construed as having human values or obeying human intent. But they do help make the AI more rational in ways that reduce the probability of certain terrible outcomes.