I think part of what’s going on is that when I say “reinforcement learning” it connects to a lot of gears in my head (for example, Q-learning) and I think I’ve been typical-minding on how much other people know about those gears.
Indeed; e.g., I didn’t know what Q-learning is until now—in fact I still don’t know what it is (for precisely the reason noted in the banner at the top of the Wikipedia page).
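Since Q-learning is the example gear named above, a minimal sketch may help readers who, like me, lacked it. This is tabular Q-learning on a toy "chain" environment; the environment, parameter values, and variable names are all illustrative inventions for this sketch, not anything from the thread or from any particular library.

```python
import random

# Minimal tabular Q-learning sketch. The toy "chain" environment and all
# parameter values are illustrative. States 0..3; action 1 moves right,
# action 0 moves left; reaching state 3 gives reward 1 and ends the episode.

N_STATES, ACTIONS = 4, (0, 1)
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.1  # learning rate, discount, exploration

Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

def step(state, action):
    nxt = max(0, min(N_STATES - 1, state + (1 if action else -1)))
    reward = 1.0 if nxt == N_STATES - 1 else 0.0
    return nxt, reward, nxt == N_STATES - 1  # next state, reward, done

random.seed(0)
for _ in range(200):  # episodes
    state, done = 0, False
    while not done:
        # epsilon-greedy action selection: usually exploit, sometimes explore
        if random.random() < EPSILON:
            action = random.choice(ACTIONS)
        else:
            action = max(ACTIONS, key=lambda a: Q[(state, a)])
        nxt, reward, done = step(state, action)
        # the Q-learning update: nudge Q(s, a) toward
        # reward + discounted value of the best action from the next state
        target = reward + (0.0 if done else GAMMA * max(Q[(nxt, a)] for a in ACTIONS))
        Q[(state, action)] += ALPHA * (target - Q[(state, action)])
        state = nxt

# The greedy policy learned for the non-terminal states: "always move right"
policy = [max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(N_STATES - 1)]
print(policy)  # expected: [1, 1, 1]
```

The one-line update in the loop is the entire "gear": behaviors that lead (even indirectly) to reward get their estimated values bumped up, and action selection then favors those values. The claim upthread is that something loosely like this is going on in human motivation.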
It’s entirely understandable that you are reluctant to take on a project to explain the entirety of what seems to be a large and somewhat abstruse scientific subfield. That said, the takeaway is that most readers here have no good reason to share those of your views that have, as a prerequisite, understanding of said field. (Making our way toward having a good overview of reinforcement learning would seem to be a good community goal for Less Wrong.)
Yes, all of those things could be an example of reinforcement learning, depending on contextual factors … I’m aware that you’ll find this an unsatisfying answer, but since humans do in fact exhibit a large and complicated array of behaviors (I hope we can agree on this at least), a theory into which they all comfortably fit needs to be flexible enough to produce such an array.
Indeed, but the point here is that the same could be said of atomic theory, which, while perfectly true, tells us very little about what behaviors we should engage in, what strategies and approaches to life’s challenges are likely to be successful, etc. If “reinforcement learning” and “reinforcement learner” are that broad and general of categories, then just pointing out that humans are reinforcement learners, in support of a specific claim or specific advice or a specific technique, etc., is not particularly convincing.
Maybe this is Bay Area bias, but the models that Qiaochu is relying on strike me as very natural and point to a lot of meaningful gears in my head, and my model is that at least a large chunk of the people on this site have a similar experience.
I feel that reinforcement-learning-based models have been covered by a bunch of highly upvoted content on this site; let me quickly take 5 minutes to find the references I remember:
https://www.lesserwrong.com/posts/zThWT5Zvifo5qYaca/the-neuroscience-of-pleasure
https://www.lesserwrong.com/posts/EMJ3egz48BtZS8Pws/basics-of-animal-reinforcement
https://www.lesserwrong.com/posts/hN2aRnu798yas5b2k/a-crash-course-in-the-neuroscience-of-human-motivation
And a bunch more. This perspective of humans as reinforcement learners has been a core topic of a lot of LessWrong writing, and it seems reasonable for people to write things that build on top of that.
Thank you very much for the links!
As for the models—to me they seem oddly esoteric and specific to support such general claims (and, of course, I don’t have these “gears in my head” to point to).
However, perhaps I’ll change my mind after reading the posts you linked—which I will do at my earliest convenience!