What are some substantial critiques of the agent foundations research agenda?
By ‘agent foundations’ I mean the area of research referred to by Critch in this post, which I understand as developing concepts and theoretical solutions for idealized problems related to AI safety, such as logical induction.
Some that come to mind (note: I work at MIRI):
2016: Open Philanthropy Project, Anonymized Reviews of Three Recent Papers from MIRI’s Agent Foundations Research Agenda (separate reply from Nate Soares, and comments by Eliezer Yudkowsky)
2017: Daniel Dewey, My current thoughts on MIRI’s “highly reliable agent design” work (replies from Nate Soares in the comments)
2018: Richard Ngo, Realism about rationality
2018: Wolfgang Schwarz, On Functional Decision Theory
2019: Will MacAskill, A Critique of Functional Decision Theory (replies from Abram Demski in the comments)
I’d also include arguments of the form ‘we don’t need to solve agent foundations problems, because we can achieve good outcomes from AI via some alternative method X, and it’s easier to just do X’. E.g., Paul Christiano’s Abstract Approval-Direction (2015).
Also, some overviews that aren’t trying to argue against agent foundations may still provide useful maps of where people disagree (though I don’t think, e.g., Nate would fully endorse any of these), such as:
2016: Jessica Taylor, My current take on the Paul-MIRI disagreement on alignability of messy AI
2017: Jessica Taylor, On motivations for MIRI’s highly reliable agent design research
2020: Issa Rice, Plausible cases for HRAD work, and locating the crux in the “realism about rationality” debate
Nostalgebraist (2019) sees the agenda as equivalent to solving large parts of philosophy: a noble but quixotic quest. (He also argues against short timelines, but that’s tangential here.)
Stretching the definition of ‘substantial’ further:
Beth Zero was an ML researcher and Sneerclubber with some things to say. Her blog is unfortunately down, but here’s her collection of critical people, and here’s a flavour of her thoughtful Bulverism. Her post on the uselessness of Solomonoff induction, and the dishonesty of pushing it as an answer outside of philosophy, was pretty good.
Sadly, most of her writing is against foom, short timelines, and longtermism, rather than anything specific to the Garrabrant, Demski, or Kosoy programmes.