Alignment crazy idea. Only run optimization power through channels that have been optimized to convey it.
Like water that flows through pipes, but doesn’t escape from leaks.
Suppose the AI is connected to a robot body. The AI can optimize along the wires, and through the motors. Optimization power can flow along these channels because humans deliberately optimized them to be good at conveying optimization power. But the AI can’t use Rowhammer. Humans didn’t deliberately optimize memory modules to be susceptible to it. They just happen to be because of physics. Thus the electric interference between memory locations is a channel that optimization power can flow through, but it was not itself optimized to be good at transmitting optimization power. Thus the AI isn’t allowed to use it.
How to measure / detect “is optimized to convey optimization”? What’s “a channel”?
Humans are far from the only optimizers. This would seem to allow e.g. engineering a modification to a virus that kills all humans, because viruses are optimized. Pointing at “is optimized by humans to convey optimization” seems much harder than just “is optimized to convey optimization”.
What’s “running optimization power through a channel”? If I set off a nuclear bomb, am I “running optimization through the air” by passing a shockwave through it? If no, then there’s a lot of stuff we’re ruling out, and if yes, then how can this thing still be pivotal?
I am trying to write something that would make sense if I had as solid and mathy an idea of “optimization here” as I do of “information here”.
Viruses are optimizing their own spread, not killing all humans. This seems to be further optimizing an already highly optimized artifact, not flowing optimization through an optimized channel.
I am not sure; I think it depends on why the AI wants the shockwave. Again, all I have is a fuzzy intuition that says yes in some cases, no in others, and shrugs in a lot of cases. I am trying to figure out if I can get this into formal maths. And if I succeed, I will (probably, unless infohazard or something) describe the formal maths.
> Viruses are optimizing their own spread, not killing all humans. This seems to be further optimizing an already highly optimized artifact, not flowing optimization through an optimized channel.
Well I’m saying that the virus’s ability to penetrate the organism, penetrate cells and nuclei, and hijack the DNA transcription machinery, is a channel. It already exists and was optimized to transmit optimization power: selection on the viral genome is optimization, and it passes through this channel, in that this channel allows the viral genome (when outside of another organism) to modify the behavior of an organism’s cells.
(For the record I didn’t downvote your original post and don’t know why anyone would.)
Yeah, probably. However, note that it can only use this channel if a human has deliberately made an optimization channel that connects into this process. I.e., the AI isn’t allowed to invent DNA printers itself.
I think a bigger flaw is where one human decides to make a channel from A to B, another human makes a channel from B to C … until in total there is a channel from A to Z that no human wants and no human knows exists, built entirely out of parts that humans built.
I.e., person 1 decides the AI should be able to access the internet. Person 2 decides that anyone on the internet should be able to run arbitrary code on their programming website. The AI puts those two channels together, even though no human ever did. Is that a failure of this design? Not sure. Can’t get a clear picture until I have actual maths.
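To make the A-to-Z worry concrete, here is a toy sketch in Python. It is not the formal maths, just a graph picture under loud assumptions: every channel is a directed edge that some human deliberately built, and `approved_channels`, `intended_paths`, and the node names are all made up for illustration. It shows that checking each hop for human approval says nothing about whether anyone approved the end-to-end path.

```python
from collections import deque

# Hypothetical channels, each deliberately built by some human.
# (source, destination) -> who built it. All names are illustrative assumptions.
approved_channels = {
    ("AI", "internet"): "person 1",
    ("internet", "code_runner"): "person 2",
}

# Start-to-end connections that some human actually intended as a whole.
intended_paths = {("AI", "internet"), ("internet", "code_runner")}


def reachable(start, channels):
    """Breadth-first search: map each reachable node to the path used to reach it."""
    paths = {start: [start]}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for (src, dst) in channels:
            if src == node and dst not in paths:
                paths[dst] = paths[node] + [dst]
                queue.append(dst)
    return paths


def unintended_endpoints(start, channels, intended):
    """Endpoints reachable via approved hops that nobody approved end to end."""
    return {
        dst: path
        for dst, path in reachable(start, channels).items()
        if dst != start and (start, dst) not in intended
    }


result = unintended_endpoints("AI", approved_channels, intended_paths)
print(result)
```

In this toy model the AI reaches the code runner through two individually approved hops, even though no entry in `intended_paths` connects the two directly. Whether the real proposal should count that as a violation is exactly the open question above.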