I rather like this concept, and probably put higher credence on it than you. However, I don’t think we are actually modeling that many layers deep. As far as I can tell, it’s actually rare to model even one layer deep. I think your hypothesis is close, but not quite there. We are definitely doing something, but I don’t think it can properly be described as modeling, at least in such fast-paced circumstances. It’s more like what a machine learning algorithm does, and less like a computer simulation.
Models have moving parts, and diverge rapidly at points of uncertainty, like how others might react. Building a model is a conscious process, and requires deliberate thought. The model takes world states as inputs, and simulates the effects these have on the components of the model. Then, after a bunch of time-consuming computation, the model spits out a play-by-play of what we think will happen. If there are any points of uncertainty, the model will spit out multiple possibilities stemming from each, and build up multiple possible branches. This is extremely time-consuming and resource intensive.
But there’s a fast, System 1 friendly way to route around needing a time-consuming model: just use a lookup table.^[1] Maybe run the time-consuming model a bunch of times for different inputs, and then mentally jot down the outputs for quick access later, on the fly. Build a big 2xn lookup table, with model inputs in one column and results in the other. Do the same for every model you find useful. Maybe have one table for a friend’s preferences: inputting tunafish outputs gratitude (for remembering her preferences). Inputting tickling outputs violence.
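To make that concrete, here’s a minimal Python sketch of the precompute-then-lookup move. The “model”, the inputs, and the friend’s reactions are all invented for illustration; the point is just that the slow simulation runs ahead of time, and the fast path is a plain dictionary lookup.

```python
# Hypothetical stand-in for an expensive, deliberate simulation.
def slow_social_model(action: str) -> str:
    simulated_reactions = {
        "remember tunafish preference": "gratitude",
        "tickling": "violence",
    }
    return simulated_reactions.get(action, "unknown reaction")

# Precompute the 2xn table: model inputs in one column, results in the other.
friend_reactions = {
    action: slow_social_model(action)
    for action in ["remember tunafish preference", "tickling"]
}

# Later, the fast "System 1" path is just a dictionary lookup.
print(friend_reactions["tickling"])  # -> violence
```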
Perhaps this is why we obsess over stressful situations, going over all the interpretations and everything we could have done differently. We’re building models of worrying situations, running them, and then storing the results for quick recall later. Maybe some of this is actually going on in dreams and nightmares, too.
But there’s another way to build a lookup table: directly from data, without running any simulation. I think we just naturally keep tabs on all sorts of things without even thinking about it. Arguably, most of our actions are being directed by these mental associations, and not by anything containing conscious models.
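As a rough sketch of this second route (observations and outcomes invented for illustration), no simulation runs at all: we just tally what actually followed each action, and the “lookup” returns the most commonly observed outcome.

```python
# Build the lookup table straight from experience: count observed outcomes,
# then answer queries with the most frequent one. No simulation involved.
from collections import Counter, defaultdict

observations = [
    ("tickling", "violence"),
    ("tickling", "laughter"),
    ("tickling", "violence"),
    ("remember tunafish preference", "gratitude"),
]

tallies = defaultdict(Counter)
for action, reaction in observations:
    tallies[action][reaction] += 1

def expected_reaction(action: str) -> str:
    # The "lookup" is just the most frequently observed outcome.
    return tallies[action].most_common(1)[0][0]

print(expected_reaction("tickling"))  # -> violence
```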
Here’s an example of what I think is going on, mentally:
Someone said something that pattern matches as rash? Quick, scan through all the lookup tables within arm’s reach for callous-inputs. One output says joking. Another says accident. A third says he’s being passive aggressive. Joking seems to pattern match the situation the best.
But oh look, you also ran it through some of the lookup tables for social simulations, and one came up with a flashing red light saying Gary’s mom doesn’t realize it was a joke.
That’s awkward. You don’t have any TAPs (Trigger Action Plans) installed for what to do in situations that pattern match to an authority figure misunderstanding a rude joke as serious. Your mind spirals out to less and less applicable TAP lookup tables, and the closest match is a trigger called “friend being an ass”. You know he’s actually joking, but this is the closest match, so you look at the action column; it says to reprimand him, and you do.
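Something like this toy sketch, where the triggers, actions, and matching method are all illustrative stand-ins: no installed trigger fits the situation exactly, so the least-dissimilar trigger fires its cached action anyway.

```python
# Toy sketch of the TAP fallback: scan the installed triggers, take whichever
# one is closest to the situation (however poor the fit), and fire its action.
import difflib

taps = {
    "friend being an ass": "reprimand him",
    "friend tells a joke": "laugh",
    "stranger is rude": "ignore it",
}

situation = "authority figure misunderstanding a rude joke as serious"

# cutoff=0.0 forces a match even when nothing fits well.
closest = difflib.get_close_matches(situation, taps.keys(), n=1, cutoff=0.0)[0]
print(f"trigger: {closest!r} -> action: {taps[closest]!r}")
```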
Note that no actual modeling has occurred, and that all the lookup tables used could have been generated purely experimentally, without ever consciously simulating anyone. This would explain why it’s so hard to explain the parts of our model when asked: we have no model, just heuristics and a fuzzy gut feeling about the situation. Running the model again would fill in some of the details we’ve forgotten, but it takes a while to run, and slows down the conversation. That level of introspection is fine in an intimate, slow-moving conversation, but if things are moving faster, the conversation will have changed topics by the time you’ve clarified your thoughts into a coherent model.
Most of the time though, I don’t think we even explicitly think about the moving parts that would be necessary to build a model. Take lying, for example:
We rarely think “A wants B to think X about C, because A models B as modeling C in a way that A doesn’t like, and A realizes that X is false but would cause B to act in a way that would benefit A if B believed it.” (I’m not even sure that will parse correctly for anyone who reads it. That’s kind of my point though.)
Instead, we just think “A told lie X to B about C”. Or even just “A lied”, leaving out all the specific details unless they become necessary. All the complexity of precisely what a lie is gets tucked away neatly inside the handle “lie”, so we don’t have to think about it or consciously model it. We just have to pattern match something to it, and then we can apply the label.
If pressed, we’ll look up what “lied” means, and say that “A said X was true, but X is actually false”. If someone questions whether A might actually believe X, we’ll improve our model of lying further, to include the requirement that A not actually believe X. We’ll enact a TAP to search for evidence that A thinks X, and come up with memories Y and Z, which we will recount verbally. If someone suspects that we are biased against A, or just exhibiting confirmation bias, they may say so. This just trips a defensive TAP, which triggers a “find evidence of innocence” action. So, our brain kicks into high gear and automatically searches all our lookup tables for things which pattern match as evidence in our favor.
We appear to be able to package extremely complex models up into a single function, so it seems unlikely that we are doing anything different with simpler models of things like lying. The concept of god doesn’t feel any more complex than the concept of a single atom, even though one has far more moving parts under the hood of the model. We’re not using any of the moving parts of the model, just spitting out cached thoughts from a lookup table, so we don’t notice the difference.
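As a loose analogy rather than a claim about how brains actually implement this, memoization in code does something similar: an arbitrarily complicated model can hide behind a single handle, and repeated queries just replay the cached answer. The predicate below is a deliberately crude stand-in for the real concept of lying.

```python
# Loose analogy for cached thoughts: the moving parts sit behind one handle,
# and repeat queries are served from the cache without re-running them.
from functools import lru_cache

@lru_cache(maxsize=None)
def did_they_lie(actually_true: bool, speaker_believed_it: bool) -> bool:
    # However elaborate the definition of a lie gets, callers only ever see
    # this single function; only the verdict is reused.
    return (not actually_true) and (not speaker_believed_it)

print(did_they_lie(actually_true=False, speaker_believed_it=False))  # -> True
print(did_they_lie(actually_true=False, speaker_believed_it=False))  # -> True, cache hit
```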
If true, this has a bunch of other interesting implications:
This is likely also why people usually act first and pick a reason for that choice second: we don’t have a coherent model of the results until afterward anyway, so it’s impossible to act like an agent in real time. We can only do what we are already in the habit of doing, by following cached TAPs. This is the reason behind akrasia, and the “elephant and rider” (System 1 and System 2) relationship.
Also note that this scales much better: you don’t need to know any causal mechanisms to build a lookup table, so you can think generally about how arbitrarily large groups will act based only on past experience, without needing to build it up from simulating huge numbers of individuals.
It implies that we are just Chinese Rooms most of the time, since conscious modeling is rarely involved. Another way of thinking of it is that we store the answers to the sorts of common computations we expect to do in (working?) memory, so that the more computationally intense consciousness can concentrate on the novel or difficult parts. Perhaps we could even expand our consciousness digitally so that responses are recomputed from scratch every time.
[1] For the record, I don’t think our minds have neat, orderly lookup tables. I think they use messy, associative reasoning, like the Rubes and Bleggs in How An Algorithm Feels From The Inside. This is what I’m referring to when I mention pattern matching, and each time I talk about looking something up in an empirically derived lookup table, a simulation input/results lookup table, or a TAP lookup table.
I think these sorts of central nodes with properties attached make up a vast, web-like network, built like network 2 in the link. All the properties are themselves somewhat fuzzy, just like the central “rube”/“blegg” node. We could deconstruct “cube” into constituent components the same way: 6 sides, all flat, sharp corners, sharp edges, sides roughly 90 degrees apart, etc. You run into the same mental problems with things like rhombohedrons, and are forced to improve your sloppy default mental conception of cubes somehow if you want to avoid ambiguity.
Each node is defined only by its relations to adjacent nodes, just like the central rube/blegg node. There are no labels attached to the nodes, just node clusters for words and sounds and letters attached to the thing they are meant to represent. It would be a graph theory monster if we tried to map it all out, but in principle you could do it by asking someone how strongly they associate various words and concepts.
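Here’s a toy version of that web, with concepts, properties, and association weights all invented for illustration: each node is nothing but a bag of weighted links to neighboring property nodes, and “recognition” is scoring how well an observed bundle of properties matches each node.

```python
# Fuzzy association web: no node has a crisp definition, just graded links
# to its neighbors; matching is a weighted sum over observed properties.
concepts = {
    "blegg": {"blue": 0.9, "egg-shaped": 0.9, "furred": 0.8, "contains vanadium": 0.6},
    "rube":  {"red": 0.9, "cube-shaped": 0.9, "smooth": 0.8, "contains palladium": 0.6},
    "cube":  {"6 sides": 0.9, "flat sides": 0.9, "sharp corners": 0.8, "90 degree angles": 0.9},
}

def best_match(observed: set) -> str:
    # Sum the association weights of whichever properties were observed.
    scores = {
        name: sum(weight for prop, weight in props.items() if prop in observed)
        for name, props in concepts.items()
    }
    return max(scores, key=scores.get)

print(best_match({"blue", "furred", "smooth"}))  # -> blegg (1.7 beats rube's 0.8)
```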
I enthusiastically agree with you. I actually do machine learning as my day job, and its ability to store “lookup table” style mappings with generalization was exactly what I was thinking of when referring to “modeling”. I’m pleased I pointed to the right concept, and somewhat disappointed that my writing wasn’t clear enough to convey this from the beginning. What you mention about obsessing seems extremely true to me, and seems related to Satvik’s internalization of it as “rapid fire simulations”.
in general I think of s1 as “fast lookup-table style reasoning” and s2 as “cpu-and-program style reasoning”. my goal here was to say:
humans have a hell of a lot of modeling power in the fast lookup style of reasoning
that style of reasoning can embed recursive modeling
a huge part of social interaction is a complicated thing that gets baked into lookup style reasoning
I’m not sure you actually disagree with the OP. I think you are probably right about the mechanism by which people identify and react to social situations.
I think the main claims of the OP hold whether you’re making hyper-fast calculations or lookup checks. The lookup checks still correspond roughly to what the hyper-fast calculations would be, and I read the OP mainly as a cautionary tale for people who attempt to use System 2 reasoning to analyze social situations (and, especially, for people attempting to change social norms).
Aspiring rationalists are often the sort of people who look for inefficiencies in social norms and try to change them. But this often results in missing important nuances that System 1 was handling.