The strict definition of an action-determined problem is something like this:
agent comes into existence, out of nowhere, in a way that is completely uncaused within the universe and could not have been predicted by its contents
agent is presented with list of options
agent chooses one option
agent disappears
I think the last part may not be strictly necessary, but I’m unsure.
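For concreteness, here is a minimal sketch of that setup in Python (the function and variable names are mine and purely illustrative, not from the discussion): the defining feature is that the payoff is computed from the chosen option alone, so nothing about the agent’s history, source code, or predictability can enter into the outcome.

```python
from typing import Callable, Dict

def run_action_determined_problem(
    options: Dict[str, float],                  # option -> payoff
    choose: Callable[[Dict[str, float]], str],  # the agent, reduced to a choice rule
) -> float:
    # 1. The agent "comes into existence": it is nothing but the choice rule,
    #    with no state carried in from the rest of the universe.
    # 2. It is presented with the list of options.
    chosen = choose(options)
    # 3. It chooses one option; the payoff depends only on that choice.
    payoff = options[chosen]
    # 4. The agent "disappears": nothing it did persists except the payoff.
    return payoff

# Example: a trivial maximizing agent.
print(run_action_determined_problem(
    {"A": 1.0, "B": 2.0},
    choose=lambda opts: max(opts, key=opts.get),
))
```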
We seem to agree that it is possible to define mathematical situations in which self-modification has a well-defined meaning, and that it doesn’t have a well-defined meaning for an AI that exists in the real world and is planning actions in the real world. We don’t know how to generalize those mathematical situations so that they are more relevant to the real world.
We differ in that I don’t want to generalize those mathematical situations to work with the real world. I’d rather discard them. You’d rather try to find a use for them.
I suppose clarifying all that is a useful outcome for the conversation.
Outside of ‘electron’, ‘quark’, and ‘neutrino’, almost none of the words we use are well-defined in the real world. All non-fundamental concepts break if you push them hard enough.
I think they are useful in that I have a pretty good idea of what I mean by ‘self-modification’ in the real world. For a simpler example, if I want to build a paperclipping AI, the sort of thing I’m looking to avoid is my paperclipping AI, for some reason, starting to make something pointless and stupid, like staples. I wish to study self-modification because I want to stop it from modifying itself into a staple-maker. I may not know exactly what counts as self-modification, but the correct response is not to ignore it and say ‘oh, I’m sure it will all work out fine either way’.
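To make the worry concrete, here is a toy sketch (the class and the names in it are purely illustrative): an agent that vets a proposed rewrite of itself with a crude goal-preservation check. The difficulty under discussion is precisely that, for a real AI, we have no rigorous version of this check.

```python
from typing import Callable

class Agent:
    def __init__(self, produce: Callable[[], str]):
        self.produce = produce  # what the agent currently makes

    def consider_rewrite(self, new_produce: Callable[[], str]) -> None:
        # Naive goal-preservation test: accept the rewrite only if the new
        # version still makes paperclips on a sample run. In the real problem
        # this check is exactly the hard part; one sample run proves nothing.
        if new_produce() == "paperclip":
            self.produce = new_produce

agent = Agent(produce=lambda: "paperclip")
agent.consider_rewrite(lambda: "staple")     # rejected: would become a staple-maker
agent.consider_rewrite(lambda: "paperclip")  # accepted: goal apparently preserved
print(agent.produce())                       # still "paperclip"
```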
Yes, making it rigorous will be difficult. Yudkowsky himself has said he thinks that 95% of the work will be in figuring out which theorem to prove. The correct response to a difficult problem is not to run away.
I’m not suggesting running away. I’m suggesting that the rigorous statement of the theorem will not include the notions of self-modification (my definition) or self-modification (your definition), since we don’t have rigorous definitions of those terms that apply outside of a counterfactual mathematical formalism.