Frankly, I’d love to see some bright young aspiring alignment researcher take this topic on as a research project, from either a mathematical or a more logical/rhetorical/experimental viewpoint, and I would be delighted to consult on such a project. Unfortunately, I have a day job (currently in AI, but not in AI alignment/safety/interpretability research; I’m working on that) and don’t have access to resources like a university debating society that I could talk into helping me with this. But if anyone does and is interested, I’d love to discuss it and help out however I can.