Hello. I noticed that your proposal for achieving truth in LLMs involves using debate as a method. My concern with this approach is that an AI consists of many small components that aggregate to produce text or outputs, and these components simply operate based on what they've learned. Therefore, the idea of clarifying or "deconfusing" these components in the service of truth through debate does not seem possible to me. But if I have misunderstood the concept, please let me know, thanks!
Thanks for the interest! I'm not really sure what you mean, though. By components, do you mean circuits, or shards, or...? I'm also not sure what you mean by clarifying or deconfusing components; that sounds like interpretability, but there's not much interpretability going on in the linked project. Feel free to elaborate, though, and I'll try to respond again.
Hello there! By components in my comment I meant things like the attention mechanism itself. For reference, here are the mean weights of two models I'm studying.
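To make concrete what I mean by "mean weights", here's a minimal sketch of how one might compute a per-model mean over the attention projection matrices, assuming Hugging Face transformers and BERT-style parameter naming; the two model names are placeholders for whichever models are being compared:

```python
import torch
from transformers import AutoModel

def mean_attention_weights(model_name: str) -> dict:
    """Load a pretrained model and return the mean value of each
    attention projection matrix (query/key/value/output weights)."""
    model = AutoModel.from_pretrained(model_name)
    means = {}
    for name, param in model.named_parameters():
        # Keep only attention-related weight matrices; this naming
        # convention holds for BERT-style models but may differ elsewhere.
        if "attention" in name and name.endswith("weight"):
            means[name] = param.detach().mean().item()
    return means

# Hypothetical pair of models to compare; swap in the two under study.
for name in ("bert-base-uncased", "distilbert-base-uncased"):
    means = mean_attention_weights(name)
    overall = sum(means.values()) / len(means)
    print(f"{name}: mean attention weight = {overall:.6f}")
```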