Nathan Helm-Burger comments on Why Don’t We Just… Shoggoth+Face+Paraphraser?

Nathan Helm-Burger 20 Nov 2024 0:27 UTC
4 points
2
What if the CoT was hidden by default, but ‘power users’ could get access to it? That might get you some protection from busybodies complaining about factually-accurate-but-rude content in the CoT, while still giving the benefits of having thoughtful critics examining the CoT for systematic flaws.
- Noosphere89 20 Nov 2024 1:13 UTC
  4 points
  1
  This might actually be a useful idea, thanks.
  - Daniel Kokotajlo 20 Nov 2024 1:51 UTC
    11 points
    0
    My version of this would be something like: Users can pay $$ to view CoT (but no more than e.g. 10 per day), and everyone gets one free view-the-CoT coupon per day.
    
    Also, ‘approved users’ such as external evals organizations/auditors should of course be allowed to see all the CoTs.
    
    The main problem a scheme like this needs to solve is the threat of bad actors scraping a bunch of CoT data and then using it to train their own powerful models. The per-user daily caps in my proposal would make that difficult.
- Seth Herd 20 Nov 2024 1:23 UTC
  3 points
  0
  This might work. Let’s remember the financial incentives. Exposing a non-aligned CoT to all users is pretty likely to generate lots of articles about how your AI is super creepy, which will create a public perception that your AI in particular is not trustworthy relative to your competition.
  
  I agree that exposing the CoT would be better from an alignment perspective; I’m just noting the incentives on AI companies.
  - Nathan Helm-Burger 20 Nov 2024 4:24 UTC
    5 points
    0
    Hah, true. I wasn’t thinking about the commercial incentives! Yeah, there’s a lot of temptation to make a corpo-clean safety-washed fence-sitting sycophant. As much as Elon annoys me these days, I have to give the Grok team credit for avoiding the worst of the mealy-mouthed corporate trend.