Daniel Filan’s bottle cap example was featured prominently in “Risks from Learned Optimization” for good reason. I think it is a really clear and useful example of why you might want to care about the internals of an optimization algorithm and not just its behavior, and it helped motivate that framing in the paper.
Note that Abram Demski deserves a large part of the credit for that specific example (somewhere between ‘half’ and ‘all’), as noted in the final sentence of the post.
A reminder, since this looks like it has a few upvotes from AF users: posts need 2 nominations to proceed to the review round.