I think a more sinister problem with ML, and with alignment especially, is linguistic abstraction. This post is a good example: the author is treating reinforcement learning the way we would understand the words “reinforcement learning” in layman’s English, as if it were about (1) reinforcement (rewards) and (2) machine learning. You are taking the name of an ML algorithm too literally. Let me show you:
However, if at test-time you move the coin so it is now on the left-hand side of the level, the agent will not navigate to the coin, but instead continue navigating to the right-hand side of the level.
This is just over-fitting.
if two policies get equally good reward, but one is “more risky” in that a slightly less competent version of the policy gets extremely poor reward, then that one’s less likely to be selected for.
This is just over-fitting too.
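Here is a minimal sketch of what I mean, not the CoinRun setup itself: the “policy” is just a supervised classifier mapping an observation to an action, and every name in it (make_episodes, the agent_x/coin_x features) is made up for illustration. During training the coin always sits near the right edge, so “go toward the coin” and “go right” are indistinguishable in the data, and the model is free to latch onto the shortcut. Move the coin at test time and the shortcut fails. That is ordinary over-fitting to a spurious regularity in the training distribution, no story about the agent’s “motivations” required.

```python
# Toy sketch, assuming a supervised stand-in for the RL setup described in the post.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def make_episodes(n, coin_x):
    """Observation: [agent_x, coin_x]; correct action: 1 = move right, 0 = move left."""
    agent_x = rng.uniform(0.0, 1.0, size=n)
    coin = np.full(n, coin_x)
    X = np.column_stack([agent_x, coin])
    y = (coin > agent_x).astype(int)  # the "right" behaviour: move toward the coin
    return X, y

# Training data: the coin is (almost) always at the right edge of the level.
X_train, y_train = make_episodes(1000, coin_x=0.9)
clf = LogisticRegression().fit(X_train, y_train)

# Test data: the coin has moved to the left-hand side.
X_test, y_test = make_episodes(1000, coin_x=0.1)
print("train accuracy:", clf.score(X_train, y_train))  # close to 1: looks like it "wants the coin"
print("test accuracy: ", clf.score(X_test, y_test))    # collapses: it learned "go right" instead
```

The coin position is effectively constant in training, so the classifier can ignore it entirely and still score perfectly; once the coin moves, the learned rule keeps steering right. That is the whole phenomenon, described without any reinforcement-learning vocabulary.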
The same thing happens when we relate neural networks to actual neuroscience. It started out with neuroscience inspiring ML, but now, because ML with neural networks is so successful, ML is inspiring neuroscience in return. It seems like we are mentally stuck inside established models; even LeCun’s recent paper on AGI is framed around human cognition. We are so obsessed with the word “intelligence” these days that it acts more as a constraint than as an inspiring perspective on what AI and ML can be generalized as: statistical computation. I think the alignment problem mostly has to do with how we use ML systems (i.e., what domains we deploy them in), rather than with the systems themselves. Whether a system is inspired by the human brain or by something else, at the end of the day it is just doing statistical computation. It is what you do with the computed results that has the further implications alignment is mostly concerned about.
A model without a prior is the uniform distribution: the least over-fitted model you can possibly have. Then you go through the learning process, over-fitting and under-fitting multiple times, to arrive at a more accurate model. It will never be perfect, because the data will never be perfect. If your training data consists of papers from before 2010, you might well be over-fitting if you use the same model to test on papers from after 2010.
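To make the pre-2010 / post-2010 point concrete, here is a toy sketch with entirely made-up data: the simulated “trend” and its change after 2010 are my own assumptions, not real bibliometric numbers. A flexible model fitted only on the pre-2010 slice looks accurate in-sample, but its error blows up once you evaluate it on later years, which is exactly the sense in which the model is over-fitted to the slice of the world it was trained on.

```python
# Toy sketch, assuming a hypothetical quantity whose trend changes after 2010.
import numpy as np

rng = np.random.default_rng(0)

def simulate(years):
    """Made-up noisy trend whose slope steepens after 2010."""
    base = (years - 2000.0) + 2.0 * np.maximum(years - 2010.0, 0.0)
    return base + rng.normal(0.0, 1.0, size=years.shape)

years_pre = np.arange(2000.0, 2010.0, 0.1)   # "training papers"
years_post = np.arange(2010.0, 2020.0, 0.1)  # "test papers"
y_pre, y_post = simulate(years_pre), simulate(years_post)

# Rescale time to [0, 1) for a numerically stable polynomial fit.
t_pre = (years_pre - 2000.0) / 10.0
t_post = (years_post - 2000.0) / 10.0

# Fit a flexible (degree-6) polynomial only on the pre-2010 data.
coeffs = np.polyfit(t_pre, y_pre, deg=6)
pred_pre = np.polyval(coeffs, t_pre)
pred_post = np.polyval(coeffs, t_post)

print("pre-2010 RMSE: ", np.sqrt(np.mean((pred_pre - y_pre) ** 2)))   # about the noise level
print("post-2010 RMSE:", np.sqrt(np.mean((pred_post - y_post) ** 2))) # orders of magnitude worse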