“If you have an agent that’s aligned and smarter than you, you can trust it to work on further alignment schemes. It’s wiser to spot-check it, but the humans’ job becomes making sure the existing AGI is truly aligned, and letting it do the work to align its successor, or keep itself aligned as it learns.”
Ah, that’s the link I was missing. Now it makes sense: you can use an AGI as a reviewer for other AGIs once it is better than humans at reviewing them. Thanks a lot for clarifying!
My pleasure. Evan Hubinger made this point to me when I’d misunderstood his scalable oversight proposal.
Thanks again for engaging with my work!