Yeah, I’m particularly interested in scalable oversight over long-horizon tasks and chain-of-thought faithfulness. I’d probably be pretty open to a wide range of safety-relevant topics though.
In general, what gets me most excited about AI research is trying to come up with the perfect training scheme to incentivize the AI to learn what you want it to—things like HCH, Debate, and the ELK contest were really cool to me. So I’m a bit less interested in areas like mechanistic interpretability or very theoretical math.