(nods) For my own part, it’s frequently worse than random… when I don’t attend to what I’m doing, I frequently berate or otherwise punish myself for attempts to achieve a target that fall short of that target, and I’m more likely to do that the more I value achieving the target. Which is a great way to extinguish the behaviors I value.
I suspect it’s very difficult to design the right reinforcement strategy. It’s easy to reward something that seems related to the goal, but can gradually become a replacement for the goal.
For example, rewarding success and punishing failure reinforces choosing only trivial tasks, which prevents learning new things. Rewarding starting new things reinforces starting new tasks without finishing them, and also choosing tasks for being new rather than for being useful. Etc.
Rational thinking about consequences, and changing the strategy when necessary, cannot be avoided. So perhaps this should be reinforced. But how do we distinguish between genuine rationality and signalling? Yeah, rationalists should win, but by rewarding success and punishing failure… see the previous paragraph.
Anyway, many people do worse than random, so some reinforcement can be used to improve the situation.
EDIT: Another problem: I suspect that any reinforcement inevitably goes meta. When I get a reward for doing X, I will do X more, but I will also like the reinforcement mechanism more. When I get punished for doing Y, I will do Y less, but I will also hate the reinforcement mechanism and rationalize why I must get rid of it.
I suspect that people prefer wireheading, except in cases when it becomes too obvious that it is wireheading. If I am allowed to choose my reinforcement mechanisms, I will probably, without knowing it, slowly optimize them towards wireheading. If someone else chooses my reinforcement mechanisms, I suspect they will choose them to optimize their utility function instead of mine.
Yoiks! This may well be why my procrastination at work has increased and increased over the decades. I almost always (habitually?) feel like my efforts are not good enough and will be criticized.
(nods) That’s a pretty common result of relying on punishment to shape behavior.