I’ve been thinking about limitations and problems with CIRL. Thanks for this post!
I haven’t done the math, but I’d like to explore a scenario where the AI learns from kids and might infer that eating sweets and playing video games is better than eating a proper meal and doing your homework (or whatever). This could of course be mitigated by weighting the parents’ demonstrations more heavily, so that they have a stronger influence on the preferences the AI picks up. But there is a strong parallel to how humanity treats this planet of ours: wouldn’t an AI infer that we actually want to raise the global temperature, drive a lot of species extinct, and generally be fairly short-sighted?
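To make the “weight the parents more” idea concrete, here’s a toy sketch of my own (not from the post or the CIRL paper): a Boltzmann-rational observation model where each demonstrator gets a rationality parameter beta, so the AI discounts the kids’ choices relative to the parents’. All the hypothesis names, betas, and counts below are made up purely for illustration.

```python
import math

# Two competing reward hypotheses over a toy two-option world:
# option 0 = "sweets + video games", option 1 = "proper meal + homework".
# Each hypothesis assigns a reward to each option.
REWARD_HYPOTHESES = {
    "short_sighted": [1.0, 0.0],  # sweets really are better
    "long_term":     [0.0, 1.0],  # proper meals really are better
}

def choice_likelihood(choice, rewards, beta):
    """Boltzmann-rational demonstrator: P(choice) is proportional to
    exp(beta * reward). Beta encodes how much we trust the demonstrator
    to act on their true preferences (higher = more reliable)."""
    exps = [math.exp(beta * r) for r in rewards]
    return exps[choice] / sum(exps)

def update_posterior(prior, observations):
    """Bayesian update over reward hypotheses from (choice, beta) pairs."""
    posterior = dict(prior)
    for choice, beta in observations:
        for hyp, rewards in REWARD_HYPOTHESES.items():
            posterior[hyp] *= choice_likelihood(choice, rewards, beta)
    total = sum(posterior.values())
    return {hyp: p / total for hyp, p in posterior.items()}

# Kids mostly pick sweets but get a low rationality weight;
# parents pick proper meals and get a high one.
KID_BETA, PARENT_BETA = 0.5, 3.0
observations = [(0, KID_BETA)] * 5 + [(1, PARENT_BETA)] * 2

prior = {hyp: 0.5 for hyp in REWARD_HYPOTHESES}
print(update_posterior(prior, observations))  # long_term ends up ~0.97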
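```

Even with only two parent demonstrations against five from the kids, the higher beta lets the parents dominate the posterior. But that’s exactly the knob that seems to be missing in the climate parallel: when the demonstrator is humanity as a whole, it’s not obvious who plays the role of the high-beta parent.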