I’m not interested in the strongest argument from your perspective (i.e. the steelman), but I am interested in how well you think you can pass the ITT for Eliezer’s perspective on the alignment problem: what shape the problem is, why it’s hard, and how to make progress. Can you give a sense of the parts of his ITT you think you’ve got?
I think I could do pretty well (it’s plausible to me that I’m the favorite in any head-to-head match with someone who isn’t a current MIRI employee? Probably not, but I’m at least close). There are definitely some places where I still get surprised and don’t expect to do that well, e.g. I was recently surprised by one of Eliezer’s positions regarding the relative difficulty of some kinds of reasoning tasks for near-future language models (and I expect there are similar surprises in domains that are less close to near-term predictions). I don’t really know how to split it into parts for the purpose of saying what I’ve got or not.