Is this research into ‘actual reasoning’ that you’re deliberately being light on details about something that is out in the public (e.g. on arxiv), or is this something you’ve witnessed privately and anticipate will become public in the near future?
Most of it is the latter, but to be clear, I do not have inside information about what any large organization is doing privately, nor have I seen an “oh no we’re doomed” proof of concept. Just some very obvious “yup that’ll work” stuff. I expect adjacent things to be published at some point soonishly just because the ideas are so simple and easily found/implemented independently. Someone might have already and I’m just not aware of it. I just don’t want to be the one to oops and push on the wrong side of the capability-safety balance.
Here is a paper from January 2022 on arXiv that details the sort of generalization-hop we’re seeing models do.