Oh ok, answering this question in the “membrane” language: the simple answer is that with isolated AI systems, we don’t actually care about the safety of most actions in the sense the OP is thinking of.
Like an autonomous car exists to maximize revenue for its owners. Running someone over incurs negative revenue from settlements and reputation loss.
So the right way to do the math is to have, for a given next action, an estimate of the cost of all the liabilities plus the gain in future revenue. Then the model simply maximizes expected net revenue.
As long as its scope is just a single car at a time and you rigidly limit scope in the sim, this is safe. (As in: suppose, hypothetically, the agent controlling the car could buy coffee shops, and the simulator modeled the revenue gain from this action. Then, since the goal is to max revenue, something something paperclips.)
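The action-scoring math above can be sketched in a few lines. This is a minimal toy illustration, not a real system: the action names, liability and revenue numbers, and the scope filter are all hypothetical placeholders.

```python
def action_value(action, estimate_liability, estimate_revenue):
    """Expected gain of one candidate action: future revenue minus
    expected liabilities (settlements, reputation loss, ...)."""
    return estimate_revenue(action) - estimate_liability(action)

def choose_action(candidates, in_scope, estimate_liability, estimate_revenue):
    """Maximize net revenue, but only over the rigidly limited scope:
    out-of-scope actions are filtered out before they are ever scored."""
    allowed = [a for a in candidates if in_scope(a)]
    return max(allowed, key=lambda a: action_value(a, estimate_liability, estimate_revenue))

# Toy numbers: "proceed" (running someone over) carries a huge liability,
# and the lucrative out-of-scope action never reaches the optimizer.
candidates = ["brake", "proceed", "buy_coffee_shop"]
in_scope = lambda a: a in ("brake", "proceed")
liability = {"brake": 0.0, "proceed": 50.0, "buy_coffee_shop": 0.0}
revenue = {"brake": 1.0, "proceed": 4.0, "buy_coffee_shop": 100.0}

best = choose_action(candidates, in_scope, liability.get, revenue.get)
print(best)  # -> brake
```

Note that the scope limit lives outside the objective: the coffee shop is never scored at all, which is the whole point of the "rigidly limit scope in the sim" caveat.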
This generalizes to basically any task you can think of, from an ai tutor to a machine in a warehouse, etc. The point is you don’t need a complex definition of morality in the first place, only a legal one, because you only task your AIs with narrow-scope tasks. Note that narrow scope can still be enormous, such as “design an IC from scratch”, but you only care about the performance and power consumption of the resulting IC.
The ai model doesn’t need to care about the leaked reagents poisoning residents near the chip fab, or all the emissions from the power plants powering billions of copies of this IC design. This is out of scope. These are concerns for humans to worry about as a government, which may in turn task models with proposing and modeling possible solutions.
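A narrow-scope objective like the IC example might look like the following sketch. The field names, weights, and designs are invented for illustration; the point is what is deliberately *absent* from the score.

```python
def ic_design_score(design, perf_weight=10.0, power_weight=1.0):
    """Score an IC design on in-scope metrics only (performance, power).
    Out-of-scope externalities -- fab reagent leaks, grid emissions --
    simply do not appear in this objective; they are handled by humans
    and regulation, not by the model."""
    return perf_weight * design["perf_ghz"] - power_weight * design["power_watts"]

designs = [
    {"name": "hot_rod", "perf_ghz": 3.0, "power_watts": 40.0},
    {"name": "efficient", "perf_ghz": 2.5, "power_watts": 5.0},
]
best = max(designs, key=ic_design_score)
print(best["name"])  # -> efficient
```

The weights are arbitrary; what matters is that the objective's input schema is the scope boundary.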
The point is you don’t need a complex definition of morality in the first place
okay, then I will stop wasting my time on talking to you, since you explicitly are not interested in developing the math this thread exists to develop. later. I’m strong-upvoting your recent comment so I can strong-downvote the original one, thereby hiding it without penalizing your karma too much. however, I am not impressed with how off-topic you got.
I have one bit of insight for you: how do humans make machines safe right now? Can you name a safety mechanism where high complexity/esoteric math is at the core of safety vs just a simple idea that can’t fail?
Like do we model thermal plasma dynamics or just encase everything in concrete and metal to reduce fire risk?
What is a safer way to prevent a nuke from spontaneously detonating: a software check for an arming code, or a physical “plug” that you remove, creating hundreds of air gaps across the detonation circuits?
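The arithmetic behind the plug example is worth making explicit: with independent interlocks, spurious failure requires every one of them to fail at once. The per-gap probability below is an assumed illustrative number, not a real engineering figure.

```python
def spurious_failure_prob(p_per_gap, n_gaps):
    """Probability that all n independent air gaps fail (are bridged)
    simultaneously, allowing the circuit to complete."""
    return p_per_gap ** n_gaps

p = 1e-3  # assumed chance a single gap is bridged (illustrative only)
for n in (1, 3, 100):
    print(n, spurious_failure_prob(p, n))
```

Each added gap multiplies the failure probability down, which is why stacking a simple idea beats one clever check: the math needed to trust the design is just exponentiation.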
My conclusion is that ai safety, like any safety engineering, has to be achieved by repeating simple ideas that can’t possibly fail, instead of complex ideas that we may need human intelligence augmentation to even develop. (And then we need post-failure resiliency; see “core catchers” for nuclear reactors, or firebreaks. Assume ASIs will escape, and limit where they can infect.)
That’s what makes this particular proposal a non-starter: it adds additional complexity, essentially a model of many “membranes” at different levels of scope (including the solar system!). Instead of adding a simple element to your ASI design, you add an element more complex than the ASI itself.
Thanks for considering my karma.

rockets

anyway, this math is probably going to be a bunch of complicated math that outputs something simple. it’s just that the complex math is a way to check the simple thing, just like, you know, how we do actually model thermal plasma dynamics, in fact. seems like you’re arguing against someone who isn’t here right now; I’ve always been much more on your side about most of this than you seem to expect, I’m basically a cannellian capabilities-wise. I just think you’re missing how interesting this boundaries research idea could be if it were fixed up so it was useful to formally check those safety margins you’re talking about.

I just think you’re missing how interesting this boundaries research idea could be if it were fixed up so it was useful to formally check those safety margins you’re talking about.

Can you describe the scope of the ai system that would use some form of boundary model to choose what to do?