Thanks for taking the time to write out your response. I think the last point you made gets at the heart of our difference in perspectives.
You could hope for substantial coordination to wait for bigger models that you only use via CPM, but I think bigger models are much riskier than well-elicited small models, so this seems to just make the situation worse, putting aside coordination feasibility.
If we’re looking at current LLMs and asking whether conditioning provides an advantage in safely eliciting useful information, then for the most part I agree with your critiques. I also agree that bigger models are much riskier, but I have the expectation that we’re going to get them anyway. With those more powerful models come new potential issues, like predicting manipulated observations and performative prediction, that we don’t see in current systems. Strategies like RLHF also become riskier, as deceptive alignment becomes more of a live possibility with greater capabilities.
My motivation for this approach lies in raising awareness of, and addressing, the risks that seem likely to arise in future predictive models, regardless of the ends to which they're used. Then, success in avoiding the dangers from powerful predictive models would open the possibility of using them to reduce all-cause existential risk.
I also agree that bigger models are much riskier, but I have the expectation that we’re going to get them anyway
I think I was a bit unclear. Suppose that, by default, GPT-6, if maximally elicited, would be transformatively useful (e.g. capable of speeding up AI safety R&D by 10x). Then I'm saying CPM would require coordinating to not use these models and instead wait for GPT-8 to hit this same level of transformative usefulness. But GPT-8 is actually much riskier by virtue of being much smarter.
(I also edited my comment to improve clarity.)