A reason I mood affiliate with shard theory so much is that...
I'll have some contention with the orthodox ontology for technical AI safety that I'm struggling to adequately communicate, and then I'll later listen to a post/podcast/talk by Quintin Pope, Alex Turner, or someone else trying to distill shard theory, and see the exact same contention expressed more eloquently and with more justification.
One example: I had independently concluded that "finding an objective function that is existentially safe when optimised by an arbitrarily powerful optimisation process is probably the wrong way to think about a solution to the alignment problem".
And then today I discovered that Alex Turner advances a similar contention in “Inner and outer alignment decompose one hard problem into two extremely hard problems”.
Shard theory also seems to nicely encapsulate my intuition that we shouldn't think about powerful AI systems as optimisation processes with a system-wide objective that they consistently pursue.
It also fits the more general intuition that our theories of intelligent systems should adequately describe the generally intelligent systems we actually have access to, and that theories which don't even aspire to do that are ill-motivated.
I don't think I could adequately communicate shard theory to a disbeliever, though, so on reflection I have some scepticism that I properly understand it.
That said, the vibes are right.
My main critique of shard theory is that the most likely outcome seems to be that one shard ends up dominating the others.
Even though that doesn’t happen in biological intelligences?