Yes, this seems like a dangerous idea… if it can’t trust the input it’s getting from humans, the only way for it to be sure it is ACTUALLY having a reduced impact might be to get rid of the humans… even if that has a high impact in the short term, it can just kill all the humans and then do nothing, ensuring it will never be tricked into having a high impact again.
http://lesswrong.com/r/discussion/lw/lyh/utility_vs_probability_idea_synthesis/ and http://lesswrong.com/lw/ltf/false_thermodynamic_miracles/