Noosphere89 comments on Why Don’t We Just… Shoggoth+Face+Paraphraser?

Noosphere89 20 Nov 2024 0:08 UTC
6 points
3
That’s a pretty good argument, and I now basically agree that hiding CoT from users is a bad choice from an alignment perspective.
- Nathan Helm-Burger 20 Nov 2024 0:27 UTC
  4 points
  2
  What if the CoT was hidden by default, but ‘power users’ could get access to it? That might get you some protection from busybodies complaining about factually-accurate-but-rude content in the CoT, while still giving the benefits of having thoughtful critics examining the CoT for systematic flaws.
  - Noosphere89 20 Nov 2024 1:13 UTC
    4 points
    1
    This might actually be a useful idea, thanks for suggesting it.
    - Daniel Kokotajlo 20 Nov 2024 1:51 UTC
      11 points
      0
      My version of this would be something like: Users can pay $$ to view CoT (but no more than e.g. 10 per day), and everyone gets one free view-the-CoT coupon per day.
      
      Also, ‘approved users’ such as external evals organizations/auditors should of course be allowed to see all the CoTs.
      
      The main problem a thing like this needs to solve is the threat of bad actors scraping a bunch of CoT data and then using it to train their own powerful models. So, my proposal here would make that difficult.
  - Seth Herd 20 Nov 2024 1:23 UTC
    3 points
    0
    This might work. Let’s remember the financial incentives. Exposing a non-aligned CoT to all users is pretty likely to generate lots of articles about how your AI is super creepy, which will create a public perception that your AI in particular is not trustworthy relative to your competition.
    
    I agree that it would be better to expose the CoT from an alignment perspective; I’m just noting the incentives on AI companies.
    - Nathan Helm-Burger 20 Nov 2024 4:24 UTC
      5 points
      0
      Hah, true. I wasn’t thinking about the commercial incentives! Yeah, there’s a lot of temptation to make a corpo-clean safety-washed fence-sitting sycophant. As much as Elon annoys me these days, I have to give the Grok team credit for avoiding the worst of the mealy-mouthed corporate trend.