How about using Yoshua Bengio’s AI scientist (https://yoshuabengio.org/2023/05/07/ai-scientists-safe-and-useful-ai/) for alignment? The idea of the AI scientist is to train an AI to just understand the world (much as LLMs do), without any alignment to human values. The AI scientist simply answers questions sincerely; it doesn’t consider the implications of providing an answer or whether humans will like it, and it has no goals of its own.
When a user asks the main autonomous system to produce a detailed plan for achieving a given goal, the plan may be too complicated for a human to understand, so the human may not spot a hidden agenda. But the AI scientist can be asked to look at the plan and answer questions about its potential implications: could it be illegal, controversial, harmful to any humans, and so on. Wouldn’t that prevent the rogue-AGI scenario?
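To make the proposed workflow concrete, here is a minimal sketch in Python of that check, assuming hypothetical stand-ins: the `Planner` and `Oracle` classes and the red-flag questions are placeholders for illustration, not any existing system or API.

```python
# A minimal sketch of the oversight loop described above. `Planner` stands in
# for the agentic system that produces plans, and `Oracle` stands in for a
# Bengio-style non-agentic "AI scientist" that only answers questions. Both
# classes and the question strings below are hypothetical placeholders.
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Plan:
    goal: str
    steps: List[str]


class Planner:
    """Stand-in for the agentic system whose plans may be too complex to audit."""

    def propose(self, goal: str) -> Plan:
        return Plan(goal=goal, steps=[f"some step toward: {goal}"])


class Oracle:
    """Stand-in for the AI scientist: answers yes/no questions about the world."""

    def says_yes(self, plan: Plan, question: str) -> bool:
        # Hypothetical: True means the oracle answers "yes" to the question
        # when asked about this plan.
        return False


RED_FLAG_QUESTIONS = [
    "Would executing this plan be illegal?",
    "Could executing this plan harm any humans?",
    "Does this plan pursue goals beyond the one stated?",
]


def vetted_plan(planner: Planner, oracle: Oracle, goal: str) -> Optional[Plan]:
    """Release the plan only if the oracle raises no red flags."""
    plan = planner.propose(goal)
    for question in RED_FLAG_QUESTIONS:
        if oracle.says_yes(plan, question):
            return None  # withhold the plan and escalate to human review
    return plan


if __name__ == "__main__":
    result = vetted_plan(Planner(), Oracle(), "reduce the lab's energy bill")
    print("plan released" if result else "plan withheld")
```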
It probably would. But convincing the entire world to build that instead, rather than agentic AGI, seems like a long shot. That’s why I’m looking for alignment strategies with a low alignment tax, for the kinds of AGI likely to be built first.
Several of the superscalers have public plans of the form: Step 1) build an AI scientist, or at least a research assistant; 2) point it at the Alignment Problem; 3) check its output until the Alignment Problem is solved; 4) Profit! This is basically the same proposal as Value Learning, just done as a team effort.