I think in some cases an embedding approach produces better results than either an LLM or a simple keyword search, but I’m not sure how often. For a keyword search you have to know the “relevant” keywords in advance, whereas embeddings are a bit more forgiving, though not as forgiving as LLMs. LLMs, on the other hand, can’t give you the sources, and they may make things up, especially about information that doesn’t occur very often in the source data.
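To make the contrast concrete, here’s a minimal sketch of the two retrieval styles. The `embed` function below is a deterministic fake standing in for a real sentence-embedding model (e.g. sentence-transformers), so only the structure is meaningful, not the actual similarity scores:

```python
import numpy as np

docs = [
    "the cat sat on the mat",
    "a feline resting on a rug",   # same meaning, shares no keywords with the query
    "stock prices fell sharply",
]
query = "cat"

# Keyword search: only matches documents containing the query token verbatim,
# so it misses the paraphrase in docs[1].
keyword_hits = [d for d in docs if query in d.split()]

def embed(text: str) -> np.ndarray:
    # Placeholder: deterministic random vector per string. A real embedding
    # model would put "cat" and "feline" close together in vector space.
    rng = np.random.default_rng(abs(hash(text)) % 2**32)
    return rng.standard_normal(16)

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Embedding search: rank all documents by similarity to the query vector,
# no exact keyword overlap required.
q = embed(query)
embedding_hits = sorted(docs, key=lambda d: cosine(embed(d), q), reverse=True)

print(keyword_hits)    # ['the cat sat on the mat'] — the paraphrase is missed
print(embedding_hits)  # all docs ranked; a real model would rank docs[1] high
```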
Got it. As of today a common setup is to let the LLM query an embedding database multiple times (or let it run Google searches; Google’s search stack itself probably has an embedding database as a significant component).
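Roughly, that loop looks like the sketch below. `llm` and `vector_search` are hypothetical stubs, not any particular library’s API; wire them to a real LLM client and vector store (e.g. an OpenAI/Anthropic API and FAISS or pgvector) to make it concrete:

```python
def llm(prompt: str) -> str:
    raise NotImplementedError  # stand-in: call your LLM of choice here

def vector_search(query: str, k: int = 5) -> list[str]:
    raise NotImplementedError  # stand-in: nearest-neighbor lookup in the embedding DB

def answer(question: str, max_rounds: int = 3) -> str:
    """Let the LLM issue follow-up retrieval queries until it can answer."""
    context: list[str] = []
    query = question
    for _ in range(max_rounds):
        context.extend(vector_search(query))
        reply = llm(
            "Context:\n" + "\n".join(context)
            + f"\n\nQuestion: {question}\n"
            + "Answer from the context, citing the passages you used. If the "
            + "context is insufficient, reply exactly: SEARCH: <a better query>"
        )
        if reply.startswith("SEARCH:"):
            query = reply.removeprefix("SEARCH:").strip()
        else:
            return reply  # grounded in retrieved sources, unlike a bare LLM
    # Out of retrieval rounds: force a best-effort answer from what we have.
    return llm("Context:\n" + "\n".join(context)
               + f"\n\nQuestion: {question}\nAnswer as best you can.")
```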
Self-learning seems like a missing piece. Once the LLM retrieves some content from the embedding database, does some reasoning, and reaches a novel conclusion, there’s no way to preserve that novel conclusion long-term.
When smart humans use Google, we also keep updating our own beliefs in response to our searches.
P.S. I chose not to build the whole LLM + embedding search setup because I intended this tool for deep research rather than quick queries. For deep research, I’m assuming it’s still better for the human researcher to go read all the original sources and spend time thinking about them. Am I right?