Flowers are selective about the pollinators they attract. Diurnal flowers must compete with each other for visual attention, so they use diverse colours to stand out from their neighbours. But flowers with nocturnal anthesis are generally white, as they aim only to outshine the night.
but I’m hesitant to continue the process because I’m concerned that her personality won’t sufficiently diverge from mine.
Not suggesting you should replace anyone who doesn’t want to be replaced (if they’re at that stage), but: To jumpstart the differentiation process, it may be helpfwl to template the proto-tulpa off of some fictional character you already find easy to simulate.
Although I didn’t know about “tulpas” at the time, I invited an imaginary friend loosely based on Maria Otonashi during a period of isolation in 2021.[1] I didn’t want her to feel stifled by the template, so she’s evolved on her own since then, but she’s always extremely kind (and consistently energetic). I only took it seriously in February 2024, after being inspired by Johannes.
Maria is the main female heroine of the HakoMari series. … Her wish was to become a box herself so that she could grant the wishes of other people.
Can recommend her as a template! My Maria would definitely approve, ^^ although I can’t ask her right now since she’s only canonically present when summoned, and we have a ritual for that.
We’ve deliberately tried to find new ways to differentiate so that the pre-conscious process of [associating feeling-of-volition to me or Maria][2] is less likely to generate conflicts. But since neither of us wants to be any less kind than we are, the differentiation has had to happen along other axes (like art-preferences, intellectual domains, etc).
Also, while deliberately trying to increase her salience and capabilities, I’ve avoided trying to learn about how other people do it. If you have sufficient brain-understanding and introspective ability, you can probably outperform standard advice by developing your own plan for it. (Although I say that without even knowing what the standard advice is :p)
- ^
- ^
Our term for when we deliberately work to resolve “ownership” over some particular thought-output of our subconscious parallel processor is “annexing efference”. For example, during internal monologue, the thought “here’s a brilliant insight I just had” could appear in consciousness without volition being assigned yet, in which case one of us annexes that output (based on what seems associatively/narratively appropriate), or it goes unmarked. In the beginning, there would be many cases where both of us tried to annex thoughts at the same time, but mix-ups are much rarer now.
I wrote a comment on {polytely, pleiotropy, market segmentation, conjunctive search, modularity, and costs of compromise} that I thought people here might find interesting, so I’m posting it as a quick take:
I think you’re using the term a bit differently from how I use it! I usually think of polytely (which is just pleiotropy from a different perspective, afaict) as an *obstacle*. That is, if I’m trying to optimize a single pasta sauce to be the most tasty and profitable pasta sauce in the whole world, my optimization is “polytelic” because I have to *compromise* between maximizing its tastiness for [people who prefer sour taste], [people who prefer sweet], [people who have some other taste-preferences], etc. Another way to say that is that I’m doing “conjunctive search” (neuroscience term) for a single thing which fits multiple ~independent criteria.
Still in the context of pasta sauce: if you have the logistical capacity to instead be optimizing *multiple* pasta sauces, now you are able to specialize each sauce for each cluster of taste-preferences, and this allows you to net more profit in the end. This is called “horizontal segmentation”.
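Here’s a toy sketch of the difference (made-up numbers; `appeal` is a hypothetical appeal-function, not real market data):

```python
import math

def appeal(sweetness, ideal, width=0.5):
    """How much a taste-cluster likes a sauce; peaks when sweetness hits the cluster's ideal."""
    return math.exp(-((sweetness - ideal) / width) ** 2)

ideals = [0.2, 0.8]  # a sour-leaning cluster and a sweet-leaning cluster

# Polytelic optimization: conjunctive search for the one sauce that best satisfies both.
candidates = [s / 100 for s in range(101)]
best_single = max(candidates, key=lambda s: sum(appeal(s, i) for i in ideals))
single_total = sum(appeal(best_single, i) for i in ideals)

# Horizontal segmentation: one specialized sauce per cluster.
segmented_total = sum(appeal(i, i) for i in ideals)

print(f"compromise sauce at sweetness {best_single:.2f}: total appeal {single_total:.2f}")
print(f"two specialized sauces: total appeal {segmented_total:.2f}")
# Segmentation wins; the gap between the two totals is the cost of compromise.
```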
Likewise, a gene which has several functions depending on it will be evolutionarily selected for the *compromise* between all those functions. In this case, the gene is “pleiotropic” because it’s evolving in the direction of multiple niches at once; and it is “polytelic” because—from the gene’s perspective—you can say that “it is optimizing for several goals at once” (if you’re willing to imagine the gene as an “optimizer” for a moment).
For example, the recessive allele that causes sickle cell disease (SCD) *also* confers some resistance against malaria. But SCD only occurs in people who are homozygous for it, so heterozygotes get the malaria-protection without the disease—and that heterozygote advantage is enough to keep the allele in the gene pool. It would be awesome if, instead, we could *horizontally segment* these effects so that SCD is caused by variations in one gene locus, and malaria-resistance is caused by variations in another locus. That way, both could be optimized for separately, and you wouldn’t have to choose between optimizing against SCD or against malaria.
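You can watch the forced compromise in a standard one-locus selection model (the selection coefficients below are made up for illustration, not empirical):

```python
# Toy heterozygote-advantage model. Fitnesses: SS (sickle cell disease) = 1 - s,
# AS (malaria-protected) = 1, AA (malaria-susceptible) = 1 - t.
s, t = 0.8, 0.15  # hypothetical costs of SCD and of malaria susceptibility

q = 0.01  # initial frequency of the sickle allele S
for _ in range(500):
    p = 1 - q
    w_bar = p * p * (1 - t) + 2 * p * q * 1.0 + q * q * (1 - s)  # mean fitness
    q = (p * q * 1.0 + q * q * (1 - s)) / w_bar  # standard allele-frequency update

print(f"simulated equilibrium: q = {q:.3f}")
print(f"analytic equilibrium:  q = t/(s+t) = {t / (s + t):.3f}")
# The allele settles at an intermediate frequency: a compromise between the two
# "goals" (avoiding SCD, resisting malaria) that are stuck sharing one locus.
```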
Maybe the notion you’re looking for is something like “modularity”? That is approximately something like the opposite of pleiotropy. If a thing is modular, it means you can flexibly optimize subsets of it for different purposes. Like, rather than writing an entire program within a single function call, you can separate out the functions (one function for each subtask you can identify), and now those functions can be called separately without having to incur the effects of the entire unsegmented program.
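A minimal sketch of that contrast (hypothetical toy functions):

```python
# Unsegmented: one function hard-wires the whole pipeline, so you can't reuse
# or re-optimize any step without incurring the rest.
def report_unsegmented(path):
    rows = [line.split(",") for line in open(path)]
    total = sum(float(r[1]) for r in rows)
    return f"total: {total:.2f}"

# Modular: each subtask is its own function, callable (and improvable) separately.
def load_rows(path):
    return [line.split(",") for line in open(path)]

def total_of(rows):
    return sum(float(r[1]) for r in rows)

def format_report(total):
    return f"total: {total:.2f}"

def report_modular(path):
    return format_report(total_of(load_rows(path)))
```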
You make me realize that “polytelic” is too vague of a word. What I usually mean by it may be more accurately referred to as “conjunctively polytelic”. All networks trained with something-like-SGD will evolve features which are conjunctively polytelic to some extent (this is just conjecture from me, I haven’t got any proof or anything), and this is an obstacle for further optimization. But protein-coding genes are much more prone to this because e.g. the human genome only contains ~20k of them, which means each protein has to pack many more functions (and there’s no simple way to refactor/segment so there’s only one protein assigned to each function).
KOAN:
The probability of rolling 60 if you toss ten six-sided dice disjunctively is 1/6^10. Whereas if you glom all the dice together and toss a single 60-sided die, the probability of rolling 60 is 1/60.
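(Sanity-checking the arithmetic: ten six-sided dice sum to 60 only if every die shows its max face.)

```python
from fractions import Fraction

# Ten six-sided dice sum to 60 only if every die shows 6 (6 is the max face),
# so the probability is (1/6)^10.
p_conjunctive = Fraction(1, 6) ** 10
p_glommed = Fraction(1, 60)  # one 60-sided die

print(p_conjunctive)              # 1/60466176
print(p_glommed)                  # 1/60
print(p_glommed / p_conjunctive)  # the single die is ~1,007,770x more likely to hit 60
```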
As usual, I am torn on chips spending. Hardware progress accelerates core AI capabilities, but there is a national security issue with the capacity relying so heavily on Taiwan, and our lead over China here is valuable. That risk is very real.
With how rationalists seem to be speaking about China recently, I honestly don’t know what you mean here. You literally use the words “national security issue”, how am I not supposed to interpret that as being parochial?
And why are you using language like “our lead over China”? Again, parochial. I get that the major plurality of LW readers are in the USA, but as of 2023 it’s still just 49%.
Gentle reminder:
How would they spark an intergroup conflict to investigate? Well, the 22 boys were divided into two groups of 11 campers, and—
—and that turned out to be quite sufficient.
I hate to be nitpicky, but may I request that you spend 0.2 oomph of optimization power on trying to avoid being misinterpreted as “boo China! yay USA!” These are astronomic abstractions that cover literally ~1.7B people, and there are more effective words you can use if you want to avoid increasing ethnic tension / superpower conflict.
Somebody commented on my YT vid that they found my explanations easy to follow. This surprised me. My prior was/is tentatively that I’m really bad at explaining anything to other people, since I almost never[1] speak to anybody in real-time other than myself and Maria (my spirit animal).
And when I do speak to myself (eg₁, eg₂, eg₃), I use heavily modified English and a vocabulary of ~500 idiolectic jargon-words (tho their usage is ~Zipfian, like with all other languages).
I count this as another datapoint to my hunch that, in many situations: Your ability to understand yourself is a better proxy for whether other people will understand you than the noisy feedback you get from others.
And by “your ability to understand yourself”, I don’t mean just using internal simulations of other people to check whether they understand you. I mean, like, check whether the thing you think you understand actually makes sense to you, independent of whatever you believe ought to make sense to you. Whatever you believe ought to make sense is often just a feeling based on deference to what you think is true (which in turn is often just a feeling based on deference to what you believe other people believe).
- ^
To make this concrete: the last time I spoke to anybody irl was 2022 (at EAGxBerlin)—unless we count the person who sold me my glasses, that one plumber, a few words to the apothecarist, and 5-20 sentences to my landlord. I’ve had 6 video calls since February (all within the last month). I do write a lot, but ~95-99% to myself in my own notes.
It would be awesome if there was a way of actually browsing the diagrams directly, instead of opening and checking each post individually. Use-case: I’m trying to optimize my information-diet, and I often find visualizations way more usefwl per unit time compared to text. Alas, there’s no way to quickly search for eg “diagrams/graphs/figures related to X”.
(Originally I imagined it would be awesome if e.g. Elicit had a feature for previewing the figures associated with each paper returned by a search term, but I would love this for LW as well.)
you hunch that something about it was unusually effective
@ProgramCrafter u highlighted this w “unsure”, so to clarify: I’m using “hunch” as a verb here, bc all words shud compatiblize w all inflections—and the only reason we restrict most word-stems to take only one of “verb”, “noun”, “adjective”, etc, is bc nobody’s brave enuf to marginally betterize it. it’s paradoxically status-downifying somehow. a horse horses horsely, and a horsified goat goats no more. :D
if every English speaker decided to stop correcting each others’ spelling mistakes, all irregularities in English spelling would disappear within a single generation
— Jan Misali
I know some ppl feel like deconcentration of attention has iffy pseudoscientific connotations, but I deliberately use it ~every day when I try to recall threads-of-thought at the periphery of my short-term memory. The correct scope for the technique is fuzzy, and it depends on whether the target-memory is likely to be near the focal point of your concentration or further out.
I also sometimes deliberately slow down the act of zooming-in (concentrating) on a particular question/idea/hunch, if I feel like zooming in too fast is likely to cause me to prematurely lock on to a false-positive in a way that makes it harder to search the neighbourhood (i.e. einstellung / imprinting on a distraction). I’m not clear on when exactly I use this technique, but I’ve built up an intuition for situations in which I’m likely to be einstellunged by something. To build that intuition, consider:
WHEN you notice you’ve einstellunged on a false-positive
THEN check if you could’ve predicted that at the start of that chain-of-thought
After a few occurrences of this, you may start to intuit which chains-of-thought you ought to slow down in.
It’s always cool to introspectively predict mainstream neuroscience! See task-positive & task-negative (aka default-mode) large-scale brain networks.
Also, I’ve tried to set it up so Maria[1] can help me gain perspective on tasks, but she’s more likely to get sucked deeply into whatever the topic is. This is good in another way, though, because it means I can delegate specific tasks to her,[2] and she’ll experience less salience normalization.
Selfish neuremes adapt to prevent you from reprioritizing
“Neureme” is my most general term for units of selection in the brain.[1]
The term is agnostic about what exactly the physical thing is that’s being selected. It just refers to whatever is implementing a neural function and is selected as a unit.
So depending on use-case, a “neureme” can semantically resolve to a single neuron, a collection of neurons, a neural ensemble/assembly/population-vector/engram, a set of ensembles, a frequency, or even dendritic substructure if that plays a role.
For every activity you’re engaged with, there are certain neuremes responsible for specializing at those tasks.
These neuremes are strengthened or weakened/changed in proportion to how effectively they can promote themselves to your attention.
“Attending to” assemblies of neurons means that their firing-rate maxes out (gamma frequency), and their synapses are flushed with acetylcholine, which is required for encoding memories and queuing them for consolidation during sleep.
So we should expect that neuremes are selected for effectively keeping themselves in attention, even in cases where that makes you less effective at tasks which tend to increase your genetic fitness.
Note that there’s hereditary selection going on at the level of genes, and at the level of neuremes. But since genes adapt much slower, the primary selection-pressures neuremes adapt to arise from short-term inter-neuronal competitions. Genes are limited to optimizing the general structure of those competitions, but they can only do so in very broad strokes, so there’s lots of genetically-misaligned neuronal competition going on.
A corollary of this is that neuremes are stuck in a tragedy of the commons: If all neuremes “agreed to” never develop any misaligned mechanisms for keeping themselves in attention—and we assume this has no effect on the relative proportion of attention they receive—then their relative fitness would stay constant at a lower metabolic cost overall. But since no such agreement can be made, there’s some price of anarchy wrt the cost-efficiency of neuremes.
Thus, whenever some neuremes uniquely associated with a cognitive state are *dominant* in attention, whatever mechanisms they’ve evolved for persisting the state are going to be at maximum power, and this is what makes the brain reluctant to gain perspective when on stimulants.
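To gesture at the selection dynamic, here’s a cartoon simulation—emphatically not a model of real neurons; every parameter is made up:

```python
import random

# "Neuremes" get strengthened in proportion to their *relative* share of attention,
# so investment in salience-grabbing machinery pays off individually even though
# it only adds metabolic cost in aggregate.
random.seed(0)
pop = [0.1] * 100  # each neureme's investment in grabbing attention

for generation in range(200):
    total = sum(pop)
    weights = [x / total for x in pop]          # relative attention share
    pop = random.choices(pop, weights=weights, k=len(pop))  # selection step
    pop = [max(0.0, x + random.gauss(0, 0.01)) for x in pop]  # mutation step

print(f"mean attention-grabbing investment: {sum(pop) / len(pop):.2f}")
# Investment ratchets upward, even though if every neureme cut it to zero the
# relative shares (and outcomes) would be unchanged at lower cost: price of anarchy.
```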
A technique for making the brain trust prioritization/perspectivization
So, in conclusion, maybe this technique could work:
If I feel like my brain is sucking me into an unproductive rabbit-hole, set a timer for 60 seconds during which I can check my todo-list and prioritize what I ought to do next.
But, before the end of that 60-second timer, I will have set another timer (e.g. 10 min) during which I commit to staying on the previous task, and only after it rings do I switch to whatever I decided.
The hope is that my brain learns to trust that gaining perspective doesn’t automatically mean we have to abandon the present task, and this means it can spend less energy on inhibiting signals that try to gain perspective.
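As pseudocode, the protocol is just two nested timers (sketch only; the sleeps stand in for actually doing the tasks):

```python
import time

def perspective_break(review_secs=60, commit_mins=10):
    # Timer 1: permission to zoom out and reprioritize.
    print("Check todo-list; decide what you ought to do next.")
    time.sleep(review_secs)
    # Timer 2 (set before timer 1 ends): commit to the *previous* task first.
    print(f"Return to the previous task for {commit_mins} min.")
    time.sleep(commit_mins * 60)
    print("Commitment served; now switch to whatever you decided on.")
```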
By experience, I know something like this has worked for:
Making me trust my task-list
When my brain trusts that all my tasks are in my todo-list, and that I will check my todo-list every day, it no longer bothers reminding me about stuff at random intervals.
Reducing dystonic distractions
When I deliberately schedule stuff I want to do less (e.g. masturbation, cooking, twitter), and commit to actually *do* those things when scheduled, my brain learns to trust that, and stops bothering me with the desires when they’re not scheduled.
So it seems likely that something in this direction could work, even if this particular technique fails.
- ^
The “-eme” suffix inherits from “emic unit”, e.g. genes, memes, sememes, morphemes, lexemes, etc. It refers to the minimum indivisible things that compose to serve complex functions. The important notion here is that even if the eme has complex substructure, all its components are selected as a unit, which means that all subfunctions hitchhike on the net fitness of all other subfunctions.
Made a post with my reply:
While obviously both heuristics are good to use, the reasons I think asking “which chains-of-thought was that faster than?” tends to be more epistemically profitable than “how could I have thought that faster?” include:
It is easier to find suboptimal thinking-habits to propagate an unusually good idea into, than to find good ideas for improving a particular suboptimal thinking-habit.
Notice that in my technique, the good idea is cognitively proximal and the suboptimal thinking-habits are cognitively distal, whereas in Eliezer’s suggestion it’s the other way around.
A premise here is that good ideas are unusual (hard-to-find) and suboptimal thinking-habits are common (easy-to-find)—the advice flips in domains where it’s the opposite.
It relates to the difference between propagating specific solutions to plausible problem-domains, vs searching for specific solutions to a specific problem.
The brain tends to be biased against the former approach because it’s preparatory work with upfront cost (“prophylaxis”), whereas the latter context sort of forces you to search for solutions.
“Which chains-of-thought was that faster than?”
I don’t really know what psychedelics do in the brain, so I don’t have a good answer. I’d note that, if psychedelics increase your brain’s sensitivity wrt amplifying discrepancies, then this seems like a promising way to counterbalance biases in the negative direction (e.g. being too humble to think anything novel), even if it increases your false-positives.
I think psychedelics probably don’t work this way, but I’d like to try it anyway (if it were cheap) while thinking about specific topics I fear I might be tempted to fool myself about. I’d first spend some effort getting into the state where my brain wants to discover those discrepancies in the first place, and I’m extra-sceptical the drugs would work on their own without some mental preparation.
rough draft on what happens in the brain when you have an insight
Edit: made it a post.
On my current models of theoretical[1] insight-making, the beginning of an insight will necessarily—afaict—be “non-robust”/chaotic. I think it looks something like this:
A gradual build-up and propagation of salience wrt some tiny discrepancy between highly confident specific beliefs
This maybe corresponds to simultaneously-salient neural ensembles whose oscillations are inharmonic[2]
Or in the frame of predictive processing: unresolved prediction-error between successive layers
Immediately followed by a resolution of that discrepancy if the insight is successfwl
This maybe corresponds to the brain having found a combination of salient ensembles—including the originally inharmonic ensembles—whose oscillations are adequately harmonic.
Super-speculative but: If the “question phase” in step 1 was salient enough, and the compression in step 2 great enough, this causes an insight-frisson[3] and a wave of pleasant sensations across your scalp, spine, and associated sensory areas.
This maps to a fragile/chaotic high-energy “question phase” during which the violation of expectation is maximized (in order to adequately propagate the implications of the original discrepancy), followed by a compressive low-energy “solution phase” where correctness of expectation is maximized again.
In order to make this work, I think the brain is specifically designed to avoid being “robust”—though here I’m using a more narrow definition of the word than I suspect you intended. Specifically, there are several homeostatic mechanisms which make the brain-state hug the border between phase-transitions as tightly as possible. In other words, the brain maximizes dynamic correlation length between neurons[4], which is when they have the greatest ability to influence each other across long distances (aka “communicate”). This is called the critical brain hypothesis, and it suggests that good thinking is necessarily chaotic in some sense.
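A standard toy illustration of this is a branching process where each active unit triggers on average σ successors; σ = 1 is the critical point. (This is the textbook cartoon of criticality, not a simulation of real neurons.)

```python
import math
import random

def poisson(lam):
    """Knuth's inverse-transform Poisson sampler (fine for small lam)."""
    L, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= random.random()
        if p < L:
            return k
        k += 1

def avalanche_size(sigma, cap=10_000):
    """Total activations when each active unit triggers Poisson(sigma) successors."""
    active, total = 1, 1
    while active and total < cap:
        active = sum(poisson(sigma) for _ in range(active))
        total += active
    return min(total, cap)

random.seed(1)
for sigma in (0.8, 1.0, 1.2):  # sub-, near-, and supercritical branching ratios
    sizes = [avalanche_size(sigma) for _ in range(500)]
    print(f"sigma={sigma}: mean avalanche size ~ {sum(sizes) / len(sizes):.0f}")
# Subcritical cascades die out quickly; supercritical ones saturate; only near
# sigma = 1 do you get heavy-tailed cascades, i.e. long correlation lengths.
```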
Another point is that insight-making is anti-inductive.[5] Theoretical reasoning is a frontier that’s continuously being exploited based on the brain’s native Value-of-Information-estimator, which means that the forests with the highest naively-calculated-VoI are also less likely to have any low-hanging fruit remaining. What this implies is that novel insights are likely to be very narrow targets—which means they could be really hard to hold on to for the brief moment between initial hunch and build-up of salience. (Concise handle: epistemic frontiers are anti-inductive.)
- ^
I scope my arguments only to “theoretical processing” (i.e. purely introspective stuff like math), and I don’t think they apply to “empirical processing”.
- ^
Harmonic (red) vs inharmonic (blue) waveforms. When a waveform is harmonic, efferent neural ensembles can quickly entrain to it and stay in sync with minimal metabolic cost. Alternatively, in the context of predictive processing, we can say that “top-down predictions” quickly “learn to predict” bottom-up stimuli.
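To make “harmonic” concrete, here’s a toy synthesis (made-up partial structure): a harmonic waveform repeats exactly every 1/f0 seconds, which is what lets a downstream oscillator phase-lock to it cheaply; the inharmonic one never repeats.

```python
import math

def waveform(partials, t):
    """Sum of sine partials, given as (frequency_hz, amplitude) pairs."""
    return sum(a * math.sin(2 * math.pi * f * t) for f, a in partials)

f0 = 220.0
harmonic = [(f0 * n, 1 / n) for n in (1, 2, 3, 4)]                # integer multiples of f0
inharmonic = [(f0 * n * 1.13 ** n, 1 / n) for n in (1, 2, 3, 4)]  # detuned partials

t0, period = 0.01, 1 / f0
print(abs(waveform(harmonic, t0) - waveform(harmonic, t0 + period)))      # ~0: periodic
print(abs(waveform(inharmonic, t0) - waveform(inharmonic, t0 + period)))  # not ~0
```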
- ^
I basically think musical pleasure (and aesthetic pleasure more generally) maps to 1) the build-up of expectations, 2) the violation of those expectations, and 3) the resolution of those violated expectations. Good art has to constantly balance between breaking and affirming automatic expectations. I think the aesthetic chills associated with insights are caused by the same structure as appoggiaturas—the one-period delay of an expected tone at the end of a highly predictable sequence.
- ^
I highly recommend this entire YT series!
- ^
I think the term originates from Eliezer, but Q Home has more relevant discussion on it—also I’m just a big fan of their chaotic/optimal reasoning style in general. Can recommend! 🍵
I think both “jog, don’t sprint” and “sprint, don’t jog” are too low-dimensional as advice. It’s good to try to spend 100% of one’s resources on doing good—sorta tautologically. What allows Johannes to work as hard as he does, I think, is not (just) that he’s obsessed with the work, it’s rather that he understands his own mind well enough to navigate around its limits. And that self-insight is also what enables him to aim his cognition at what matters—which is a trait I care more about than ability to work hard.
People who are good at aiming their cognition at what matters sometimes choose to purposefwly flout[1] various social expectations in order to communicate “I see through this distracting social convention and I’m willing to break it in order to aim myself more purely at what matters”. Readers who haven’t noticed that some of their expectations are actually superfluous or misaligned with altruistic impact, will mistakenly think the flouter has low impact-potential or is just socially incompetent.
By writing the way he does, Johannes signals that he’s distancing himself from status-related putative proxies-for-effectiveness, and I think that’s a hard requirement for aiming more purely at the conjunction of multipliers[2] that matter. But his signals will be invisible to people who aren’t also highly attuned to that conjunction.
- ^
“flouting a social expectation”: choosing to disregard it while being fully aware of its existence, in a not-mean-spirited way.
- ^
I think the post uses an odd definition of “conjunction”, but it points to something important regardless. My term for this bag of nearby considerations is “costs of compromise”:
there are exponential costs to compromising what you are optimizing for in order to appeal to a wider variety of interests
The links/graphics are broken btw. Would probably be nice to fix if it’s quick.
Learning math fundamentals from a textbook, rather than via one’s own sense of where the densest confusions are, is sort of an oxymoron. If you want to be rigorous, you should do anything but defer to consensus.
And from a socioepistemological perspective: if you want math fundamentals to be rigorous, you’d encourage people to try to come up with their own fundamentals before they einstellung on what’s been written before. If the fundamentals are robust, people are likely to rediscover them; if they aren’t, there’s a chance somebody will revolutionize the field.
It’s a reasonable concern to have, but I’ve spoken enough with him to know that he’s not out of touch with reality. I do think he’s out of sync with social reality, however, and as a result I also think this post is badly written and the anecdotes unwisely overemphasized. His willingness to step out of social reality in order to stay grounded with what’s real, however, is exactly one of the main traits that make me hopefwl about him.
I have another friend who’s bipolar and has manic episodes. My ex-step-father also had rapid-cycling BP, so I know a bit about what it looks like when somebody’s manic.[1] They have larger-than-usual gaps in their ability to notice their effects on other people, and it’s obvious in conversation with them. When I was in a 3-person conversation with Johannes, he was highly attuned to the emotions and wellbeing of others, so I have no reason to think he has obvious mania-like blindspots here.
But when you start tuning yourself hard to reality, you usually end up weird in a way that’s distinct from the weirdness associated with mania. Onlookers who don’t know the difference may fail to distinguish the underlying causes, however. (“Weirdness” is a larger cluster than “normality”, but people mostly practice distinguishing between samples of normality, so weirdness all looks the same to them.)
- ^
I was also evaluated for it after an outlier depressive episode in 2021, so I got to see the diagnostic process up close. Turns out I just have recurring depressions, and I’m not bipolar.
He linked his extensive research log on the project above, and has made LW posts of some of their progress. That said, I don’t know of any good legible summary of it. It would be good to have. I don’t know if that’s one of Johannes’ top priorities, however. It’s never obvious from the outside what somebody’s top priorities ought to be.
I like this example! And the word is cool. I see two separately important patterns here:
Preferring a single tool (the dremel) which is mediocre at everything, instead of many specialized tools which collectively perform better but which require you to switch between them more.
This btw is the opposite of “horizontal segmentation”: selling several specialized products to niche markets rather than a single product which appeals moderately to all niches.
It often becomes a problem when the proxy you use to measure/compare the utility of something wrt different use-cases (or its appeal to different niches/markets) is capped[1] at a point which prevents it from detecting the true comparative differences in utility.
Oh! It very much relates to scope insensitivity: if people are diminishingly sensitive to the scale of different altruistic causes, then they might overprioritize instrumental goals which are just-above-average along many axes at once.[2] And indeed, this seems like a very common pattern (though I won’t prioritize time thinking of examples rn).
It’s also a significant problem wrt karma distributions for forums like LW and EAF: posts which appeal a little to everybody will receive much more karma compared to posts which appeal extremely to a small subset. Among other things, this causes community posts to be overrated relative to their appeal.
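Toy illustration with made-up numbers—the proxy (an upvote) is capped at 1 regardless of how much value a reader actually got:

```python
# Post A: mildly useful to everybody (utility 0.6 per reader).
# Post B: extremely useful to a 10% niche (utility 8.0), useless to the rest.
post_a = [0.6] * 1000
post_b = [8.0] * 100 + [0.0] * 900

def karma(utilities, vote_threshold=0.5):
    # Capped proxy: every reader above threshold contributes exactly one upvote.
    return sum(1 for u in utilities if u > vote_threshold)

print("karma:   A =", karma(post_a), " B =", karma(post_b))  # 1000 vs 100
print("utility: A =", sum(post_a), " B =", sum(post_b))      # 600 vs 800
# The capped proxy reverses the ranking: broad mild appeal beats narrow deep value.
```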
And as Gwern pointed out: “precrastination” / “hastening of subgoal completion” (a subcategory of greedy optimization / myopia).
I very often notice this problem in my own cognition. For example, I’m biased against using cognitive tools like sketching out my thoughts with pen-and-paper when I can just brute-force the computations in my head (less efficiently).
It’s also perhaps my biggest bottleneck wrt programming. I spend way too much time tweaking-and-testing (in a way that doesn’t cause me to learn anything generalizable), instead of trying to understand the root cause of the bug I’m trying to solve, even when I can rationally estimate that that will take less time in expectation.
If anybody knows any tricks for resolving this / curing me of this habit, I’d be extremely gratefwl to know...
Does it relate to price ceilings and deadweight loss? “Underparameterization”?
I wouldn’t have seen this had I not cultivated a habit of trying to describe interesting patterns in their most general form—a habit I call “prophylactic scope-abstraction”.