What if the CoT was hidden by default, but ‘power users’ could get access to it? That might get you some protection from busybodies complaining about factually-accurate-but-rude content in the CoT, while still giving the benefits of having thoughtful critics examining the CoT for systematic flaws.
This might actually be a useful idea, thanks.
My version of this would be something like: Users can pay $$ to view CoT (but no more than e.g. 10 per day), and everyone gets one free view-the-CoT coupon per day.
Also, ‘approved users’ such as external evals organizations/auditors should of course be allowed to see all the CoTs.
The main problem a thing like this needs to solve is the threat of bad actors scraping a bunch of CoT data and then using it to train their own powerful models. The per-user daily cap in my proposal would make that difficult.
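To make the proposal concrete, here is a rough Python sketch of the quota logic: one free view per day, up to 10 paid views per day, and no cap for approved users. The class name, the `approved` flag, and the `will_pay` parameter are all made up for illustration, and the 10-per-day figure is just the "e.g." number from above, not a recommendation. Payment, account identity, and anti-sybil checks are out of scope.

```python
from dataclasses import dataclass, field
from datetime import date

FREE_VIEWS_PER_DAY = 1   # "one free view-the-CoT coupon per day"
PAID_VIEWS_PER_DAY = 10  # "no more than e.g. 10 per day" (illustrative value)


@dataclass
class CotViewQuota:
    """Tracks one user's CoT views for a single calendar day (hypothetical)."""
    approved: bool = False  # assumed flag for auditors / external evals orgs
    day: date = field(default_factory=date.today)
    free_used: int = 0
    paid_used: int = 0

    def request_view(self, today: date, will_pay: bool = False) -> str:
        """Return 'approved', 'free', 'paid', or 'denied' and update counters."""
        if self.approved:
            return "approved"  # approved users see all CoTs, no cap
        if today != self.day:  # new day: reset both counters
            self.day, self.free_used, self.paid_used = today, 0, 0
        if self.free_used < FREE_VIEWS_PER_DAY:
            self.free_used += 1
            return "free"
        if will_pay and self.paid_used < PAID_VIEWS_PER_DAY:
            self.paid_used += 1
            return "paid"
        return "denied"
```

Under these numbers a single non-approved account gets at most 11 CoTs per day, so a scraper would need many accounts (and many payments) to collect a training-sized dataset.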
This might work. Let’s remember the financial incentives. Exposing a non-aligned CoT to all users is pretty likely to generate lots of articles about how your AI is super creepy, which will create a public perception that your AI in particular is not trustworthy relative to your competition.
I agree that it would be better to expose from an alignment perspective, I’m just noting the incentives on AI companies.
Hah, true. I wasn’t thinking about the commercial incentives! Yeah, there’s a lot of temptation to make a corpo-clean safety-washed fence-sitting sycophant. As much as Elon annoys me these days, I have to give the Grok team credit for avoiding the worst of the mealy-mouthed corporate trend.