I have one bit of insight for you: how do humans make machines safe right now? Can you name a safety mechanism where high-complexity, esoteric math is at the core of the safety case, versus just a simple idea that can’t fail?
Like, do we model thermal plasma dynamics, or do we just encase everything in concrete and metal to reduce fire risk?
What is the safer way to prevent a nuke from spontaneously detonating: a software check for a code, or a physical “plug” that you remove, creating hundreds of air gaps across the detonation circuits?
My conclusion is that AI safety, like any safety, has to be engineered by repeating simple ideas that can’t possibly fail, rather than complex ideas that we may need human intelligence augmentation to even develop. (And then we need post-failure resiliency; see “core catchers” for nuclear reactors, or firebreaks. Assume ASIs will escape, and limit where they can infect.)
That’s what makes this particular proposal a non-starter: it adds additional complexity, essentially a model of many “membranes” at different levels of scope (including the solar system!). Instead of adding a simple element to your ASI design, you add an element more complex than the ASI itself.
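A back-of-the-envelope illustration of the “repeat simple elements” point (a minimal sketch in Python; the failure rates are made-up placeholders, and it assumes the interlocks fail independently, which real correlated failures would undercut):

```python
# Toy comparison: one complex safeguard vs. N stacked simple interlocks.
# The probabilities below are illustrative placeholders, not real data.

def combined_failure_probability(p_single: float, n: int) -> float:
    """All n independent interlocks must fail for the barrier to fail."""
    return p_single ** n

complex_check = 1e-3      # one sophisticated check that is hard to review
simple_interlock = 1e-2   # one dumb, inspectable air gap

for n in (1, 3, 10):
    stacked = combined_failure_probability(simple_interlock, n)
    print(f"{n:>2} simple interlocks: {stacked:.0e}  vs  one complex check: {complex_check:.0e}")
```

The only point of the toy model is that stacking independent simple barriers drives the joint failure probability down geometrically, which is the same logic as the removable plug and its hundreds of air gaps.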
Thanks for considering my karma.
rockets
anyway, this math is probably going to be a bunch of complicated math that outputs something simple; the complex math is just a way to check the simple thing. just like, you know, how we do actually model thermal plasma dynamics, in fact. it seems like you’re arguing against someone who isn’t here right now; I’ve always been much more on your side about most of this than you seem to expect, and I’m basically a Cannellian capabilities-wise. I just think you’re missing how interesting this boundaries research idea could be if it were fixed up so it was useful for formally checking those safety margins you’re talking about.
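For what that could look like in practice, here is a minimal sketch of “heavy machinery certifying a simple property,” using the Z3 SMT solver on a toy controller; the variable names, the controller model, and the rated limit are all hypothetical stand-ins, not anything taken from the boundaries proposal itself:

```python
# Minimal sketch: complex machinery (an SMT solver) used only to certify a
# simple boundary -- "the actuator command never exceeds the rated limit for
# any reachable sensor reading". All names and numbers are hypothetical.
from z3 import Real, Solver, And, Implies, unsat

sensor = Real("sensor_reading")
command = Real("actuator_command")

# Assumed controller model: a clipped linear response to the sensor.
controller = And(
    Implies(sensor <= 10, command == 0.5 * sensor),
    Implies(sensor > 10, command == 5),
)
operating_range = And(sensor >= 0, sensor <= 100)
rated_limit = 5  # the simple thing we actually care about

s = Solver()
s.add(operating_range, controller, command > rated_limit)  # search for a violation
if s.check() == unsat:
    print("No reachable input pushes the command past the rated limit.")
else:
    print("Counterexample:", s.model())
```

All the complexity lives inside the solver; what comes out is the simple verdict you can hand to the “encase it in concrete” style of review.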
Can you describe the scope of the AI system that would use some form of boundary model to choose what to do?