I don’t see anything in here that forbids using weaker AI systems to help with this plan. But how do you ever know when you’ve succeeded, if not by proofs? Proving things about AIs is not out of the question; it’s already done for some kinds of safety-critical deep learning systems.
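A minimal sketch of what that kind of proof can mean, assuming a verifier based on interval bound propagation (one common technique behind such certificates; the network weights and the certified property below are hypothetical, chosen for illustration):

```python
# Interval bound propagation (IBP) over a tiny ReLU network: propagate an
# input box through the layers to get sound output bounds, which certifies
# a property for ALL inputs in the box, not just sampled ones.
# The weights and the property are made up for illustration.
import numpy as np

def interval_affine(lo, hi, W, b):
    """Exact bounds for x -> W @ x + b when x lies in the box [lo, hi]."""
    W_pos, W_neg = np.maximum(W, 0.0), np.minimum(W, 0.0)
    return W_pos @ lo + W_neg @ hi + b, W_pos @ hi + W_neg @ lo + b

def interval_relu(lo, hi):
    """ReLU is monotone, so it maps interval bounds elementwise."""
    return np.maximum(lo, 0.0), np.maximum(hi, 0.0)

# Toy 2-layer network with hypothetical weights.
W1, b1 = np.array([[1.0, -0.5], [0.3, 0.8]]), np.array([0.1, -0.2])
W2, b2 = np.array([[0.7, -1.2]]), np.array([0.05])

# Certify: for every input in the box [-0.1, 0.1]^2, the output stays <= 1.0.
lo, hi = np.array([-0.1, -0.1]), np.array([0.1, 0.1])
lo, hi = interval_relu(*interval_affine(lo, hi, W1, b1))
lo, hi = interval_affine(lo, hi, W2, b2)

print(f"certified output range: [{lo[0]:.3f}, {hi[0]:.3f}]")
assert hi[0] <= 1.0, "property not certified"
```

Real verifiers scale this idea up considerably, but the shape of the guarantee is the same: a bound over an entire input set rather than a test over samples.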
I think the first step will be using AGIs to come up with better plans.
My general concern with “using AGI to come up with better plans” is not that they will successfully manipulate us into misalignment, but that they will Goodhart us into reinforcing stereotypes of “how a good plan should look,” or something along that dimension, purely because of how RLHF-style steering works.
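A toy numerical illustration of this failure mode (my own construction, not something from the thread): model the RLHF-style proxy score of each plan as its true quality plus independent rater noise, then select harder and harder on the proxy. The proxy keeps climbing while true quality lags behind, and that widening gap is exactly what gets Goodharted:

```python
# Regressional Goodhart demo: optimizing a noisy proxy of plan quality
# increasingly selects for proxy error as selection pressure grows.
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
true_quality = rng.normal(size=n)            # how good each plan really is
proxy = true_quality + rng.normal(size=n)    # proxy: truth + rater/model noise

for top_frac in (0.10, 0.0001):              # mild vs. extreme selection
    chosen = proxy > np.quantile(proxy, 1.0 - top_frac)
    print(f"top {top_frac:>7.2%}: mean proxy {proxy[chosen].mean():5.2f}, "
          f"mean true quality {true_quality[chosen].mean():5.2f}")
# With equal noise and signal variance, true quality rises at roughly half
# the rate of the proxy; the gap widens as selection gets more extreme.
```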
Humans already do this, except we have made it politically incorrect to talk about the ways in which human-generated Goodharting makes the world worse (race, gender, politics, etc.).
Your examples are at least clearly visible. If a wrong alignment paradigm gets reinforced because of your attachment to a specific model of causality known to only ten people in the entire world, you risk noticing it too late.
You’re thinking about this the wrong way. AGI governance will not operate like human governance.
Can you elaborate? I don’t understand where we disagree.