I can write a simple program that modifies its own source code and then modifies it back to its original state, in a trivial loop. That’s acting on its own substrate while provably staying within extremely tight constraints. Does that qualify as a disproof of your hypothesis?
I wouldn’t say it does, any more than a program that can identify whether a very specific class of programs will halt disproves the Halting Theorem. I’m just gesturing in what I think might be the general direction of where a proof may lay; usually recursivity is where such traps hide. Obviously a rigorous proof would need rigorous definitions and all.
“A program that can identify whether a very specific class of programs will halt” does disprove the stronger analog of the Halting Theorem that (I argued above) you’d need in order for it to make alignment impossible.
I can write a simple program that modifies its own source code and then modifies it back to its original state, in a trivial loop. That’s acting on its own substrate while provably staying within extremely tight constraints. Does that qualify as a disproof of your hypothesis?
I wouldn’t say it does, any more than a program that can identify whether a very specific class of programs will halt disproves the Halting Theorem. I’m just gesturing in what I think might be the general direction of where a proof may lay; usually recursivity is where such traps hide. Obviously a rigorous proof would need rigorous definitions and all.
“A program that can identify whether a very specific class of programs will halt” does disprove the stronger analog of the Halting Theorem that (I argued above) you’d need in order for it to make alignment impossible.