mruwnik comments on All AGI Safety questions welcome (especially basic ones) [July 2023]

mruwnik 4 Aug 2023 12:40 UTC
2 points
Think of reward not as “here’s an ice-cream for being a good boy” but as “you passed my test, so I will now do neurosurgery on you to make you more likely to behave the same way in the future”. In both cases, the result of applying the “reward” is that you’re more likely to act as desired next time. In humans, that’s because you expect to get something nice out of being good; in computers, it’s because they’ve been directly modified to behave that way. It’s hard to directly change how humans think and behave, so you have to do it indirectly, via ice-cream and beatings, whereas with computers you can just modify their memory (in practice, the model’s parameters).
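To make the “neurosurgery” picture concrete, here is a minimal sketch of my own, not anything from a real training setup. It assumes a toy agent with two possible actions and a REINFORCE-style policy-gradient update; the names `softmax`, `train`, `logits` and `lr` are all made up for illustration. The thing to notice is that the agent never “receives” the reward in any sense. The reward is just a number the training loop uses to decide how hard to nudge the agent’s parameters.

```python
import math
import random


def softmax(logits):
    """Turn raw preferences into action probabilities."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def train(steps=2000, lr=0.1, seed=0):
    """Toy two-action agent trained with a REINFORCE-style update.

    Assumption for illustration: action 1 is the "desired" behaviour
    and is the only one that passes the test (reward 1.0).
    """
    rng = random.Random(seed)
    logits = [0.0, 0.0]  # the agent's "memory": one preference per action

    for _ in range(steps):
        probs = softmax(logits)
        action = 0 if rng.random() < probs[0] else 1

        # The "test": did the agent do what we wanted?
        reward = 1.0 if action == 1 else 0.0

        # The "neurosurgery": edit the parameters directly so the rewarded
        # action becomes more likely. For a softmax policy, the gradient of
        # log pi(action) with respect to logits[a] is onehot(action)[a] - probs[a].
        for a in range(2):
            grad = (1.0 if a == action else 0.0) - probs[a]
            logits[a] += lr * reward * grad

    return softmax(logits)


if __name__ == "__main__":
    # Probabilities of action 0 and action 1 after training.
    print(train())
```

With a reward of zero the update does nothing at all, and with a positive reward the parameters are shifted towards whatever the agent just did. The agent doesn’t need to want or expect anything for that to work.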