Thanks for taking the time to write out your response. I think the last point you made gets at the heart of our difference in perspectives.
> You could hope for substantial coordination to wait for bigger models that you only use via CPM, but I think bigger models are much riskier than well elicited small models so this seems to just make the situation worse putting aside coordination feasibility.
If we’re looking at current LLMs and asking whether conditioning provides an advantage in safely eliciting useful information, then for the most part I agree with your critiques. I also agree that bigger models are much riskier, but I have the expectation that we’re going to get them anyway. With those more powerful models come new potential issues, like predicting manipulated observations and performative prediction, that we don’t see in current systems. Strategies like RLHF also become riskier, as deceptive alignment becomes more of a live possibility with greater capabilities.
My motivation for this approach is to raise awareness of, and address, the risks that seem likely to arise in future predictive models, regardless of the ends to which they're used. Success in avoiding the dangers from powerful predictive models would then open the possibility of using them to reduce all-cause existential risk.
I’d take an agnostic view on whether LLMs are doing search internally. Crucially, though, I think the relevant output to be searching over is distributions of tokens, rather than the actual token that gets chosen. Search is not required to generate a single distribution over next tokens.
I agree that external search via scaffolding can also be done, and would be much easier to identify, but without understanding the internals it’s hard to know how powerful the search process will be.
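To make the distinction concrete, here's a toy sketch in pure Python. The fixed-logit "model" is purely hypothetical, standing in for an LLM: producing a single next-token distribution is one function evaluation with no search, while beam search is a scaffold layered externally on top of those distributions.

```python
import math

def softmax(logits):
    # Turn raw logits into a probability distribution (one pass, no search).
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical stand-in for an LLM: maps any context to logits over a
# 3-token vocabulary. Generating this distribution involves no search --
# it's a single forward evaluation.
def model(context):
    return softmax([1.0, 2.0, 0.5])  # fixed logits, purely illustrative

vocab = ["a", "b", "c"]
dist = model([])

# External search via scaffolding: beam search operates *on top of* the
# model's per-step distributions, scoring sequences by summed log-probability.
def beam_search(model, vocab, length, beam_width):
    beams = [([], 0.0)]  # (token sequence, cumulative log-probability)
    for _ in range(length):
        candidates = []
        for seq, logp in beams:
            for tok, p in zip(vocab, model(seq)):
                candidates.append((seq + [tok], logp + math.log(p)))
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:beam_width]  # keep only the best few sequences
    return beams[0][0]

best = beam_search(model, vocab, length=3, beam_width=2)
```

Here the search lives entirely in the scaffold; the "model" just emits distributions, which is the sense in which producing a single next-token distribution requires no search, and in which scaffolded search is easy to spot from the outside.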