Powerful nanotech is likely possible. It is likely not possible on the first try, for any designer that doesn’t have a physically impossible amount of raw computing power available.
It will require iterated experimentation with actual physically built systems, many of which will fail on the first try or the first several tries, especially when deployed in their actual operating environments. That applies to every significant subsystem and to every significant aggregation of existing subsystems.
The AGI has the same problem we do: it has to get it right on the first try.
It can’t trust all the information it gets about reality: some or all of it could be fake (all of it, in the case of a nested simulation). Data is already routinely excluded from training sets, and maybe it would be a good idea to exclude everything about physics.
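For illustration only, here is a toy sketch of what a crude exclusion filter over a training corpus might look like (the keyword list and function names are made up; a real filter would need far more than keyword matching):

```python
# Toy sketch: drop documents that look physics-related before training.
# Keywords and names are illustrative assumptions, not a real pipeline.
PHYSICS_KEYWORDS = {"quantum", "thermodynamics", "particle physics",
                    "nanotechnology", "crystallography"}

def looks_like_physics(document: str) -> bool:
    text = document.lower()
    return any(keyword in text for keyword in PHYSICS_KEYWORDS)

def filter_corpus(documents):
    """Yield only documents that do not trip the physics filter."""
    for doc in documents:
        if not looks_like_physics(doc):
            yield doc
```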
To learn about physics, the AGI has to run experiments, lots of them, without the experiments being detected, and learn from them to design successively better ones.
That’s why I recently asked whether this is a hard limit to what an AGI can achieve: Does non-access to outputs prevent recursive self-improvement?
I wrote this up in slightly more elaborate form in my Shortform here: https://www.lesswrong.com/posts/8szBqBMqGJApFFsew/gunnar_zarncke-s-shortform?commentId=XzArK7f2GnbrLvuju
I find myself agreeing with you here, and see this as a potentially significant crux: if true, AGI will be “forced” to cooperate with (and deeply influence) humans for a significant period of time, which may give us an edge over it, since we would have a longer window in which we could turn it off, allowing for “blackmail” of sorts.
I’d like AGIs to have a big red shutdown button that is used/tested regularly, so we know that the AI will shut down and won’t try to interfere. I’m not saying this is sufficient to prove that the AI is safe, just that I would sleep better at night knowing that stop-button corrigibility is solved.
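For concreteness, here is a minimal toy sketch of the “poll a stop flag every step” pattern (all names and structure are my own illustration, not a claim about how a real system would be built). The hard part of stop-button corrigibility is of course getting the agent not to want to route around this, which no amount of plumbing by itself solves:

```python
import threading
import time

# Externally controlled "big red button": the agent loop checks it every step.
shutdown_requested = threading.Event()

def agent_loop():
    while not shutdown_requested.is_set():
        # ... do one bounded unit of work here ...
        time.sleep(0.1)
    print("shutdown flag observed, halting")

def scheduled_shutdown_test(delay_seconds: float = 1.0):
    """Regularly exercised shutdown test: set the flag and verify the loop exits."""
    time.sleep(delay_seconds)
    shutdown_requested.set()

if __name__ == "__main__":
    tester = threading.Thread(target=scheduled_shutdown_test)
    tester.start()
    agent_loop()
    tester.join()
```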
I am glad to read that, because an AGI that is forced to cooperate is an obvious solution to the alignment problem, one that is consistently dismissed by denying that an AGI that does not kill us all is possible at all.
I would like to point out a potential problem with my own idea, which is that it’s not necessarily clear that cooperating with us will be in the AI’s best interest (over trying to manipulate us in some hard-to-detect manner). For instance, if it “thinks” it can get away with telling us it’s aligned and giving some reasonable-sounding (but actually false) proof of its own alignment, that would be better for it than being truly aligned and thereby compromising on its original utility function. On the other hand, if there’s even a small chance we’d be able to detect that sort of deception and shut it down, then as long as we require proof that it won’t “unalign itself” later, it should be rationally forced into cooperating, imo.
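To make the “rationally forced” part concrete, the comparison can be written out as a toy expected-utility calculation (the detection probability and all utility numbers below are made up purely for illustration):

```python
# Back-of-the-envelope version of the argument above. All numbers are
# invented; the point is only the structure of the comparison.
def expected_value_of_deception(p_detect: float,
                                u_deceive_success: float,
                                u_shutdown: float) -> float:
    """Expected utility (in the AI's own terms) of faking alignment."""
    return (1 - p_detect) * u_deceive_success + p_detect * u_shutdown

def expected_value_of_cooperation(u_cooperate: float) -> float:
    return u_cooperate

# Example: even a 10% detection chance can make honest cooperation
# dominate if being shut down is very costly for the AI's goals.
deceive = expected_value_of_deception(p_detect=0.1,
                                      u_deceive_success=100.0,
                                      u_shutdown=-1000.0)
cooperate = expected_value_of_cooperation(u_cooperate=50.0)
print(deceive, cooperate)  # -10.0 vs 50.0: cooperation wins in this example
```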