Yeah, I totally agree. My motivation for writing the first section was that people use the word ‘deception’ to refer to both things, and then make what seem like incorrect inferences. For example, current ML systems do the ‘Goodhart deception’ thing, but then I’ve heard people use this to imply that they might be doing ‘consequentialist deception’.
These two things seem close to unrelated, except for the fact that ‘Goodhart deception’ shows us that AI systems are capable of ‘tricking’ humans.
Okay I see, yep that makes sense to me (-: