As someone with 8-year median timelines to AGI, I don’t see this as obvious progress, but it frames possible applications that would be. The immediate application might be to get rid of some hallucinated, factually incorrect claims by calling fact-checking tools with small self-contained queries, and possibly self-distilling the corrected text. This could bring LLM-based web search (for which this week’s announcements from Microsoft and Google are starting a race) closer to production quality. But this doesn’t seem directly AGI-relevant.
A looser but AGI-relevant application is to use this for teaching reliable use of specific reasoning/agency skills, automatically formulating “teachable moments” in the middle of any generated text. An even vaguer inspiration from the paper is replacing bureaucracies (the hypothetical approach to teaching reasoning/agency skills when they are not already reliably available) with trees of (chained) tool calls. That would obviate the need to explicitly set up chatrooms where multiple specialized chatbots discuss a problem with each other at length in order to make more progress than is possible immediately/directly.
Is an 8-year median considered long, short, or about average? I’m specifically asking in relation to the opinions of people who pay attention to AGI capabilities and are aware of the alignment problem. I’m just hoping you can give me an idea of what is considered “normal” among AGI/alignment people with regard to AGI timelines.