Namely, it requires us to describe our utility function at the base level of reality, but that’s difficult because we don’t know how paperclips are represented at the base level of reality! We only know how we perceive paperclips.
In principle you could have a paperclip perception module that counts paperclips, define utility in terms of its output, and include huge penalties for world states where the module has been functionally altered (or, more precisely, for world states where you can’t prove that it hasn’t been functionally altered).
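A minimal sketch of that proposal, purely as an illustration: the names `WorldState`, `utility`, and `TAMPER_PENALTY` are hypothetical, the penalty size is arbitrary (the comment only says "huge"), and the hard part, actually proving that the module is unaltered, is reduced here to a boolean flag.

```python
from dataclasses import dataclass

# Hypothetical penalty size; the proposal only requires it to be "huge"
# relative to any achievable paperclip count.
TAMPER_PENALTY = 1e12


@dataclass(frozen=True)
class WorldState:
    # Toy stand-in for a world state (an assumption of this sketch):
    # what the perception module reports, and whether we hold a proof
    # that the module has not been functionally altered.
    perceived_paperclips: int
    module_integrity_proven: bool


def utility(state: WorldState) -> float:
    """Perceived paperclip count, minus a huge penalty whenever the
    module's integrity cannot be proven."""
    u = float(state.perceived_paperclips)
    if not state.module_integrity_proven:
        u -= TAMPER_PENALTY
    return u
```

With this shape, a state where the module reports a million paperclips but its integrity cannot be proven scores below a state with zero paperclips and a proven-intact module.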
Note that a utility function in UDT is supposed to be a mathematical expression in closed form, with no free variables pointing to “perception”. So applying your idea to UDT would require a mathematical model of how agents get their perceptions, e.g. “my perceptions are generated by the universal distribution” like in UDASSA. Such a model would have to address all the usual anthropic questions, like what happens to subjective probabilities if the perception module gets copied conditionally on winning the lottery, etc. And even if we found the right model, I wouldn’t build an AI based on that idea, because it might try to hijack the inputs of the perception module instead of doing useful work.
I’d be really interested in a UDT-like agent with a utility function over perceptions instead of a closed-form mathematical expression, though. Nesov called that hypothetical thing “UDT-AIXI” and we spent some time trying to find a good definition, but unsuccessfully. Do you know how to define such a thing?
My model of naturalized induction allows it: http://lesswrong.com/lw/jq9/intelligence_metrics_with_naturalized_induction/