In this short sequence of posts, I aim to trace a causal pathway from
a key missing idea in the utility-theoretic foundations of game theory, to
some problems I think I see in effective altruism discourse, onward to
gaps in some approaches to AI alignment, and finally to
implications for existential risk.
By default, I’m writing one post for each of the above points, since they have different epistemic statuses and can be debated separately. Posts 1 and 3 will be somewhat technical and research-oriented, and cross-posted to the Alignment Forum, whereas 2 and 4 will be non-technical and community-oriented, and cross-posted to the EA Forum. After that there might be more posts in the sequence, depending on the ensuing conversation. In any case, I’ll try to keep this index post updated with the full sequence.
Here goes!