Take home message: RL has become an irrelevance in explanations of human cognition.
I believe you are overselling your point here.
You are right that behaviorists were traditionally guilty of fake reductionism; they used “reinforcement” as their applause light and bottom line for everything. The early ones were probably “tabula rasa” partial evolution denialists; the later ones gradually accepted that genes might have some minor influence on the mind, usually limited to making an organism more sensitive to certain types of reinforcement learning at a certain age. The early ones were also greedy reductionists; for them the causal chain “reinforcement → thoughts → behavior” was already too long and unscientific. It had to be “reinforcement → behavior” directly, because thoughts were held to be unreal and unmeasurable, so scientists were not allowed to talk about them. The later ones gradually fixed this by introducing “black boxes” into their flowcharts, which was a politically correct way for Vulcans to talk about thoughts and emotions and other disgusting human stuff.
But I believe that, just as the behaviorists went to a silly extreme while trying to avoid the silliness of Freudianism, you are now trying to reverse the stupidity of the behaviorists by replacing “human cognition is RL” with “RL is irrelevant to human cognition”.
I would buy your argument if you were talking about cognition in general. I find it credible that there may be some kind of intelligence that doesn’t use RL at all. (Well, this is kind of an argument from my own ignorance, but from the inside it seems valid.) I just don’t believe that human intelligence happens to be that one.
Following my “read the Sequences” mantra, consider this quote from “Avoiding Your Belief’s Real Weak Points”:

“More than anything, the grip of religion is sustained by people just-not-thinking-about the real weak points of their religion. I don’t think this is a matter of training, but a matter of instinct. People don’t think about the real weak points of their beliefs for the same reason they don’t touch an oven’s red-hot burners; it’s painful.”
Also, a quote from “Positive Bias: Look Into the Dark”:

“One may be lectured on positive bias for days, and yet overlook it in-the-moment. Positive bias is not something we do as a matter of logic, or even as a matter of emotional attachment. (...) the mistake is sub-verbal, on the level of imagery, of instinctive reactions. (...) Which example automatically pops into your head? You have to learn, wordlessly, to zag instead of zig. You have to learn to flinch toward the zero, instead of away from it. (...) So much of a rationalist’s skill is below the level of words.”
The human brain is in some sense composed of layers: the “human layer”, upon the “mammalian layer”, upon the “lizard layer”. The intentional strategic thinking happens on the human layer, while the behaviorists are trying to explain everything at the lower levels. This is why behaviorists fail to understand (or refuse to believe in) the specifically-human stuff.
However, the layers are not clearly separated. The “lizard and mammalian layers” can, and regularly do, override the “human layer” functionality. You may be trying to devise a smart strategy for achieving your goals, but the lower layers will at random moments start interrupting with “hey, that looks scary!” or “that feels low-status!”, and whole branches of the decision tree get abandoned, often without you even noticing that this happened; it just feels as if the whole branch was never there. It’s hard to implement rationality on broken hardware.
Okay; this example only proves that RL can have a harmful impact on human cognition. But that already makes it relevant. If nothing else, strategically using RL to override the existing harmful RL could benefit human cognition.
(I also believe that RL has a positive role in human cognition, if merely by pruning the potentially infinite decision trees, but I am leaving this to experts.)
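To make the “pruning” idea concrete, here is a minimal toy sketch (my own illustration, with made-up names and values, not anything from the article): learned value estimates act as the signal that cuts off unpromising branches, keeping an otherwise exploding decision tree manageable.

    # Toy sketch (hypothetical values): learned value estimates pruning a decision tree.
    # The "learned_value" table stands in for whatever reinforcement history the agent has.
    learned_value = {"exercise": 0.7, "doomscroll": -0.4, "study": 0.5, "nap": 0.1}

    def expand(plan, actions, depth, threshold=0.0):
        # Enumerate action sequences up to `depth`, dropping branches whose
        # running value estimate falls below `threshold`.
        if depth == 0:
            return [plan]
        plans = []
        for action in actions:
            value = sum(learned_value[a] for a in plan + [action])
            if value < threshold:   # branch "feels not worth it" -> pruned, never explored
                continue
            plans.extend(expand(plan + [action], actions, depth - 1, threshold))
        return plans

    surviving = expand([], list(learned_value), depth=3)
    print(len(surviving), "of", len(learned_value) ** 3, "possible 3-step plans survive pruning")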
EDIT:
For applied rationality purposes, even if the role of RL in making good strategic decisions were negligible, good strategies for your life usually include making RL work in your favor. For example, the decision to exercise regularly is not one you make by RL (you don’t randomly try different schedules and notice that when you exercise you feel better). But once you have made the decision to exercise, having some reinforcements in the process increases the probability that you will follow through on the plan. But this is already irrelevant to the topic of the article.
Viliam,
Thanks for your response.
(Preamble: let’s keep the “Sequences” out of this. I am not religious, and citing the “Sequences” is an appeal to religious authority—a religious authority that I do not accept.)
I am trying to read beyond the letter of your reply, to get to the spirit of it, and I think I see an important disconnect between the point I was making, and the one you are making in return. What you said is valid, but it doesn’t bear on my point.
You see, nobody (least of all me) is denying the existence of reinforcements/rewards/feedback which are noticed by the intelligent system and used for guiding behavior. In general, there will be huge amounts of that. But all that means is that the system is “somewhat sensitive to rewards”.
But that has no bearing on the issue I was addressing, because when people talk about an AI being controlled by RL, they are claiming something far more specific than that the system is “somewhat sensitive to rewards”. They are claiming the existence of an actual RL mechanism, very close to the crude sort that I discussed in the essay. If they were NOT making this strong claim, they would be unable to make the claims they do about the behavior of those AGI systems.
For example, when Holden Karnofsky talks about RL systems going off the rails, his scenario is meaningless unless what he is talking about is a crude RL system. No strong arguments about AGI danger can be inferred, if all he means is that these systems will be “somewhat sensitive to rewards”, and no work could be done if the researchers trying to reassure Karnofsky go out there and write down mathematical or logical analyses of such systems.
It is only when “RL” is taken from the simple applications used today (applications that follow the mechanism described in the AI textbooks), then extrapolated WHOLESALE to the global control level of an AI, that any statements can be made. And, if you look at all the places where “RL” is mentioned in reference to AI safety, it is always clear from the context that this is what they mean. They do not mean that the AI just has a few places in it where it takes some account of rewards.
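For concreteness, here is a minimal sketch of one mechanism of the textbook sort, assuming tabular Q-learning as a representative example (the specific algorithm and parameter values here are just illustrative): a single scalar reward and a single global update rule drive all of the system’s learning.

    import random
    from collections import defaultdict

    # Minimal tabular Q-learning sketch: one scalar reward signal and one global
    # update rule controlling all of the agent's learning.
    alpha, gamma, epsilon = 0.1, 0.9, 0.1   # learning rate, discount factor, exploration rate
    Q = defaultdict(float)                  # Q[(state, action)] -> estimated long-run reward

    def choose_action(state, actions):
        if random.random() < epsilon:                      # occasionally explore at random
            return random.choice(actions)
        return max(actions, key=lambda a: Q[(state, a)])   # otherwise pick the best-looking action

    def learn(state, action, reward, next_state, actions):
        best_next = max(Q[(next_state, a)] for a in actions)
        # The whole of the agent's "learning" is this single update:
        Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])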
I agree with you that trying to build the whole AI on a behaviorist-style RL would most likely not work. But see below: Kaj Sotala says that “RL” means something else in the machine learning context. As I am not familiar with machine learning, I can’t say anything meaningful about this, other than that I’d like to see your answer to Kaj’s comment.
See my reply to Kaj. His point was not valid because I already covered that in detail in the original essay.
The site is basically founded on the sequences. If you reject them, then why bother with LW (which is your choice anyway, but referring to them should be expected), and if you don’t reject them, then why complain about them being brought up?
I don’t think this is a good way of looking at things.
The Sequences are an important part of LW history. I would guess that most LW regulars mostly agree with most of the prominent ideas in them. (As do plenty of people who aren’t LWers at all.) They say many sensible and useful things. But they aren’t the sort of thing it makes sense to “accept” or “reject” wholesale. That really is, as Richard_Loosemore says, the way religions tend to think about their traditions; it leads their adherents into bad thinking, and doing the same here would do the same to us.
Now, in this particular case, I don’t think there was anything very religion-y about Viliam’s quotations from the Sequences. (Despite his use of the word “mantra” :-).) He found something that made the point he wanted and quoted it, much as one might quote from a textbook or a familiar work of literature. So I don’t think Richard’s “let’s keep the Sequences out of it: I’m not religious” response was warranted—but I think that response is better understood as an expression of the longstanding Loosemore-Yudkowsky hostility than as a serious assessment of the merits of the Sequences or any specific idea found in them.
Be that as it may, the appropriate reaction is more like “fair enough” or “Viliam wasn’t actually using the Sequences the way you suggest, but never mind” than “if you Reject The Sequences then you should keep away from here”.
(Actually, I would say more or less the same even if LW were a religious-style community where membership is predicated on Accepting The Sequences. A community should be open to criticism from the outside.)
You walked right into that. Why not just say that those particular postings explain relevant points?
I would no more “point out specific things that aren’t accurate about the sequence posts” than I would waste my time analyzing the seedier parts of the Advice section of the local bookstore.
If this site is “founded on the Sequences” then it is a religious group or personality cult, because the defining feature of the latter sort of entity is that it centers around a sacred text.
Members of LessWrong vehemently deny that it is a religious group or personality cult.
Am I to take it that you think it really is?
Or to ask a more direct question: are you declaring that membership of this community should be allowed only for people who swear allegiance to The Sequences? That others should be ejected or vilified?
(I also need to point out that I am pretty sure the Sequences decry Appeals To Authority. What are the constant references to the Sequences, except Appeals To Authority? I have always been a little unclear on that point.)
The main reason those happen is to establish a shared language and background concepts. If I want to defuse an argument about whether vanilla or chocolate is the best flavor for ice cream, then I can link to 2-Place and 1-Place Words, not in order to swing the day for my chosen side, but in order to either get people to drop the argument as mistaken or argue about the real issue (perhaps whether we should stock the fridge with one or the other). This is way cleaner than trying to recreate from scratch the argument for seeing adjectives as describing observer-observed relationships, rather than being attributes of the observed.
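As a toy illustration of that 2-place framing (my own example, not taken from the linked post): once the observer is an explicit second argument, the question “which flavor is best?” visibly becomes underspecified rather than contested.

    # Toy sketch: "best flavor" as a 2-place relation (flavor, person) rather than
    # a 1-place property of the flavor alone. The names and data are hypothetical.
    preferences = {"alice": "vanilla", "bob": "chocolate"}

    def best_flavor_for(person):
        # The 2-place question has a definite answer; the 1-place question
        # "what is the best flavor?" is simply missing an argument.
        return preferences[person]

    print(best_flavor_for("alice"))  # vanilla
    print(best_flavor_for("bob"))    # chocolate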
The function of Viliam’s quotes seems to have been to provide examples of human thinking behaving in an RL-like fashion, drawn from the sequences rather than other places mostly because of availability.
Throwing around the “religion” label seems to be committing the noncentral fallacy...
The answer to your question depends on what exactly it is that you’re asking. Do I believe most of the sequence posts are correct? Yes. Do I believe it is useful to treat them as standards? Yeah. Do I think you aren’t allowed to criticize them? No, by all means, if you have issues with their content, we can discuss that (I have criticized them once). But I think you should point out specific things that aren’t accurate about the sequence posts, rather than rejecting them for the sake of it.
For me, “reading the Sequences” is like “reading the FAQ”, except that instead of “frequently asked questions” it is more like “questions that should have been asked, but most people believe they already have an answer, which usually involves some kind of confusing the map with the territory”.
Or, to use an educational metaphor, it’s like asking people to study Algebra 101 before they start attending Algebra 201 classes… because if they ignore that, you already know people are going to end up confused and then most of the time will be spent repeating the Algebra 101 knowledge and not getting further.
The idea was that after learning some elementary stuff, we (the LessWrong community) would be able to move forward and discuss more advanced topics. Well, it was a nice idea...