A reason I mood affiliate with shard theory so much is that...
I'll have some contention with the orthodox ontology for technical AI safety that I'm struggling to adequately communicate, and then I'll later listen to a post/podcast/talk by Quintin Pope, Alex Turner, or someone else trying to distill shard theory, and see the exact same contention expressed more eloquently and with more justification.
One example: I had independently concluded that "finding an objective function that is existentially safe when optimised by an arbitrarily powerful optimisation process is probably the wrong way to think about a solution to the alignment problem".
And then today I discovered that Alex Turner advances a similar contention in “Inner and outer alignment decompose one hard problem into two extremely hard problems”.
Shard theory also seems to nicely encapsulate my intuition that we shouldn't think about powerful AI systems as optimisation processes with a system-wide objective that they consistently pursue.
It also fits the more general intuition that our theories of intelligent systems should adequately describe the generally intelligent systems we actually have access to, and that theories which don't even aspire to do that are ill-motivated.
I don't think I could adequately communicate shard theory to a disbeliever, though, so on reflection I have some scepticism that I properly understand it.
That said, the vibes are right.
My main critique of shard theory is that the most likely outcome seems to be that one shard ends up dominating the others.
Even though that doesn’t happen in biological intelligences?