Pretending not to see when a rule you’ve set is being violated can be optimal policy in parenting sometimes (and I bet it generalizes).
Example: suppose you have a toddler and a “rule” that food only stays in the kitchen. The motivation is that each time food is brought into the living room there is a small chance of an accident resulting in a permanent stain. There’s a cost to enforcing the rule, as the toddler will put up a fight. Suppose that one night you feel really tired and the cost feels particularly high. If you enforce the rule, it will be much more painful than it’s worth in that moment (meaning, fully discounting future consequences). If you fail to enforce the rule, you undermine your authority, which results in your toddler fighting future enforcement (of this and possibly all other rules!) much harder, as he realizes that the rule is in fact negotiable / flexible.
However, you have a third choice, which is to credibly pretend to not see that he’s doing it. It’s true that this will undermine your perceived competence, as an authority, somewhat. However, it does not undermine the perception that the rule is to be fully enforced if only you noticed the violation. You get to “skip” a particularly costly enforcement, without taking steps back that compromise future enforcement much.
I bet this happens sometimes in classrooms (re: disruptive students) and prisons (re: troublesome prisoners) and regulation (re: companies that operate in legally aggressive ways).
Of course, this stops working and becomes a farce once the pretense is clearly visible. Once your toddler knows that sometimes you pretend not to see things to avoid a fight, the benefit totally goes away. So it must be used judiciously and artfully.
Huh, that went somewhere other than where I was expecting. I thought you were going to say that ignoring letter-of-the-rule violations is fine when they’re not spirit-of-the-rule violations, as a way of communicating the actual boundaries.
Perhaps that can work depending on the circumstances. In the specific case of a toddler, at the risk of not giving him enough credit, I think that type of distinction is too nuanced. I suspect that in practice this will simply make him litigate every particular application of any given rule (since it gives him hope that it might work) which raises the cost of enforcement dramatically. Potentially it might also make him more stressed, as I think there’s something very mentally soothing / non-taxing about bright line rules.
I think with older kids though, it’s obviously a really important lesson to understand that the letter of the law and the spirit of the law do not always coincide. There’s a bit of a blackpill that comes with that, though, once you understand that people can get away with violating the spirit as long as they comply with the letter, or that complying with the spirit (which you can grok more easily) does not always guarantee compliance with the letter, which puts you at risk of getting in trouble.
Teacher here, can confirm.
(sci-fi take?) If time travel and time loops are possible, would this not be the (general sketch of the) scenario under which it comes into existence:
1. a lab figures out some candidate particles that could be sent back in time, builds a detector for them, and starts scanning for them. suppose the particle has some binary state. if the particle is +1 (-1) the lab buys (shorts) stock futures and exits after 5 minutes
2. the trading strategy will turn out to be very accurate, and its profits will be used to fund the research required to build the time machine
3. at some point in the future, the r&d and engineering efforts eventually succeed. once the device is built, the lab starts sending information back in time to tip itself off to future moves in stock futures (the very same particles it originally received). this closes the time loop and guarantees temporal consistency
Reasons why this might not happen:
time doesn’t work like this, or time travel / loops aren’t possible
civilization doesn’t survive long enough to build the device
the lab can’t commit to using its newfound riches to build the device, breaking the logic and preventing the whole thing from working in the first place
“DO NOT MESS WITH TIME”
4. The lab sees millions of conflicting particles. Their plan was foiled by future spam.
5. The lab is killed by assassins, who’ve been waiting for the signal since the 12th century and get activated by the Time Control Authority.
Infertility rates are rising and nobody seems to quite know why. Below is what feels like a possible (trivial) explanation that I haven’t seen mentioned anywhere.
I’m not in this field personally, so it’s possible this theory is out there, but asking GPT about it doesn’t yield the proposed explanation: https://chat.openai.com/share/ab4138f6-978c-445a-9228-674ffa5584ea
Toy model:
a family is either fertile or infertile, and fertility is hereditary
the modal fertile family can have up to 10 kids, the modal infertile family can only have 2 kids
in the olden days families aimed to have as many kids as they could
now families aim to have 2 kids each
Under this model, in the olden days we would find a high proportion of fertile people in the gene pool, but in the modern world we wouldn’t. Put differently, the old convention led to a strong positive correlation between fertility and participation in the gene pool, while the new convention leads to zero correlation. This removes the selective pressure on fertility, hence we should expect fertility to drop / infertility to rise.
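A minimal simulation sketch of the toy model (all numbers are invented, and the small fertile-to-infertile mutation rate is an added assumption, there to give infertility a source once selection is gone):

```python
def next_fertile_fraction(f, kids_target, fertile_max=10, infertile_max=2, mutation=0.002):
    """One generation of the toy model: fertility is a heritable binary trait,
    and each family has min(cap, target) kids, where the cap depends on the trait."""
    kids_f = fertile_max if kids_target is None else min(fertile_max, kids_target)
    kids_i = infertile_max if kids_target is None else min(infertile_max, kids_target)
    f_next = f * kids_f / (f * kids_f + (1 - f) * kids_i)   # share of next generation born to fertile families
    return f_next * (1 - mutation)                          # a trickle of new infertility each generation

f_old, f_new = 0.5, 0.5
for _ in range(50):
    f_old = next_fertile_fraction(f_old, kids_target=None)  # olden days: have as many kids as you can
    f_new = next_fertile_fraction(f_new, kids_target=2)     # modern convention: aim for 2 kids

print(f"max-kids convention: fertile fraction ~ {f_old:.2f}")   # selection keeps fertility high
print(f"two-kids convention: fertile fraction ~ {f_new:.2f}")   # no selection, infertility accumulates
```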
Empirical evidence for this would be something like an analysis of the time series of family size variance and infertility—is lower variance followed by increased infertility?
How robust is the information that infertility rates are rising?
To be sure, I’m not an expert on the topic.
Declines in male fertility I think are regarded as real, though I haven’t examined the primary sources.
Regarding female fertility, this report from Norway outlines the trend that I vaguely thought was representative of most of the developed world over the last 100 years.
Female fertility is trickier to measure, since female fertility and age are strongly correlated, and women have been having kids later, so it’s important (and likely tricky) to disentangle this confounder from the data.
I often mistakenly behave as if my payoff structure is binary instead of gradual. I think others do too, and this cuts across various areas.
For instance, I might wrap up my day and notice that it’s already 11:30pm, though I’d planned to go to sleep an hour earlier, by 10:30pm. My choice is, do I do a couple of me-things like watch that interesting YouTube video I’d marked as “watch later”, or do I just go to sleep ASAP? I often do the former and then predictably regret it the next day when I’m too tired to function well. I’ve reflected on what’s going on in my mind (with the ultimate goal of changing my behavior) and I think the simplest explanation is that I behave as if the payoff curve, in this case of length of sleep, is binary rather than gradual. Rational decision-making would prescribe that, especially once you’re getting less rest than you need, every additional hour of sleep is worth more rather than less. However, I suspect my instinctive thought process is something like “well, I’ve already missed my sleep target even if I go to sleep ASAP, so might as well watch a couple of videos and enjoy myself a little since my day tomorrow is already shot.”
This is pretty terrible! It’s the opposite of what I should be doing!
Maybe something like this is going on when poor people spend a substantial fraction of their income on the lottery (I’m already poor and losing an extra $20 won’t change that, but if I win I’ll stop being poor, so let me try) or when people who are out of shape choose not to exercise (I’m already pretty unhealthy and one 30-minute workout won’t change that, so why waste my time.) or when people who have a setback in their professional career have trouble picking themselves back up (my story is not going to be picture perfect anyway, so why bother.)
It would be good to have some kind of mental reframing to help me avoid this predictably regrettable behavior.
I think you’re probably quite correct about that example, and similar things. I notice other people doing this a lot, and I catch myself at it sometimes. So I think noticing and eliminating this particular flaw in logic is helpful.
I also think the underlying problem goes deeper. Because we want to stay up and watch that video, our brain will come up with excuses to do it, and we’ll be biased to just quickly accept those excuses when we otherwise would recognize them as logically flawed, because we want to.
This is motivated reasoning. I think it’s the single most impactful and pervasive bias. I spent some years studying this, but I haven’t yet gotten around to writing about it on LW because it’s not directly alignment-relevant. I really need to do at least a short post, because it is relevant for basically navigating and understanding all psychology. Including the field of alignment research.
This raises the question of what it means to want to do something, and who exactly (or which cognitive system) is doing the wanting.
Of course I do want to keep watching YT, but I also recognize there’s a cost to it. So on some level, weighing the pros and cons, I (or at least an earlier version of me) sincerely do want to go to bed by 10:30pm. But, in the moment, the tradeoffs look different from how they appeared from further away, and I make (or, default into) a different decision.
An interesting hypothetical here is whether I’d stay up longer when play time starts at 11:30pm than when play time starts at, say, 10:15pm (if bedtime is 10:30pm). The wanting to play, and the temptation to ignore the cost, might be similar in both scenarios. But this sunk cost / binary outcome fallacy would suggest that I’ll (marginally) blow further past my deadline in the former situation than in the latter.
I recognize a very similar failure mode of instrumental rationality: I sometimes include in the decision process for an action not just the utility of that action itself, but also its probability. That is, I act on the expected utility of the action, not on its utility. Example:
I should hurry up enough to catch my train (hurrying up enough has high utility)
Based on experience, I probably won’t hurry up enough (hurrying up enough has low probability)
So the expected utility (utility*probability) of hurrying up enough is not very high
So I don’t hurry up enough
So I miss my train.
The mistake is to pay any attention to the expected utility (utility*probability) of an action, rather than just to its utility. The probability of what I will do is irrelevant to what I should do. The probability of an action should be the output, never the input of my decision. If one action has the highest utility, it should go to 100% probability (that is, I should do it) and all the alternative actions should go to 0 probability.
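A toy illustration of this trap, with invented numbers: ranking actions by utility times my predicted probability of taking them recommends the wrong action.

```python
# Two available actions and their utilities if actually taken (invented numbers).
utility = {"hurry enough to catch the train": 10.0, "dawdle": 2.0}
# My prediction of what I'll actually do, based on past behavior.
p_i_will_do_it = {"hurry enough to catch the train": 0.1, "dawdle": 0.9}

expected_utility = {a: utility[a] * p_i_will_do_it[a] for a in utility}

print(max(expected_utility, key=expected_utility.get))  # 'dawdle' (1.8 > 1.0): the mistaken ranking
print(max(utility, key=utility.get))                    # 'hurry enough to catch the train': what I should do
```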
The scary thing is that recognizing this mistake doesn’t help with avoiding it.
Proposal: if you’re a social media or other content based platform, add a long-press to the “share” button which allows you to choose between “hate share” and “love share”.
Therefore:
* quick tap: keep the current functionality, you get to send the link wherever / copy to clipboard
* long press and swipe to either hate or love share: you still get to send the link (optionally, the URL has some argument indicating it’s a hate / love share, if the link is a redirect through the social media platform)
This would allow users to distinguish between things that are worth sharing but that they hate / love and want to see less / more of, and it might defang the currently powerful strategy (with massive negative social externalities) of generating outrage content just to get more shares.
Social media companies can, in turn, then use this to dial back the virality of hate share vs love share content, if they choose to do so.
I believe that there is already far too much “hate sharing”.
Perhaps the default in a social media UI should be that shared content includes a public endorsement of whatever content it links to, and if you want to “hate share” anything without such an endorsement, you have to fight a hostile UI to do so.
In particular, “things that are worth sharing” absolutely should not overlap with “want to see less of”. If you want to see less of some type of thing, it’s self-defeating to distribute more copies of it. Worse, if you even suspect that any of your own readers are anything like you, why are you inflicting it on them?
One way to “see less of” something you hate is to stop it from being produced, and that may be seen as a better solution than basically averting your eyes from it. Rallying mobs to get whoever produced it fired has proven to be quite effective.
The more complex the encoding of a system (e.g. of ethics) is, the more likely it is that it’s reverse-engineered in some way. Complexity is a marker of someone working backwards to encapsulate messy object-level judgment into principles. Conversely, a system that flows outward from principles to objects will be neatly packed in its meta-level form.
In linear algebra terms, as long as the space of principles has fewer dimensions than the space of objects, we expect principled systems / rules to have a low-rank representation, with a dimensionality approaching that of the space of principles and far below that of the space of objects.
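A small numerical sketch of that claim (shapes and the linear map are arbitrary choices): judgments generated from a handful of principles form a low-rank matrix, while case-by-case judgments generically have full rank.

```python
import numpy as np

rng = np.random.default_rng(0)
n_situations, n_objects, n_principles = 200, 50, 3

# Principled system: judgments are a fixed function (here linear) of a few principle scores.
principle_scores = rng.normal(size=(n_situations, n_principles))
principles_to_judgments = rng.normal(size=(n_principles, n_objects))
principled = principle_scores @ principles_to_judgments

# Reverse-engineered system: judgments chosen case by case to fit desired outcomes.
ad_hoc = rng.normal(size=(n_situations, n_objects))

print(np.linalg.matrix_rank(principled))  # 3  -- bounded by the number of principles
print(np.linalg.matrix_rank(ad_hoc))      # 50 -- as high as the object space allows
```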
As a corollary, perhaps we are justified in being more suspicious of complex systems than of simple ones, since they come with a higher risk that the systems are “insincere”, in the sense that they were deliberately created with the purpose of justifying a particular outcome rather than being genuine and principled.
This rhymes with Occam’s razor, and also with some AI safety approaches which planned to explore whether dishonesty is more computationally costly than honesty.
Does this mean that meta-level systems are memetically superior, since their informational payloads are smaller? The success of Abrahamic religions (which mostly compress neatly into 10-12 commandments) might agree with this.
I don’t know of any encodings or legible descriptions of ethics that AREN’T reverse-engineered. Unless you’re a moral realist, I suspect this has to be the case, because such systems are in the map, not the territory. And not even in the most detailed maps, they’re massively abstracted over other abstractions.
I’m far more suspicious of simple descriptions, especially when the object space has many more dimensions. The likelihood that they’ve missed important things about the actual behavior/observations is extremely high.
Agreed that ultimately everything is reverse-engineered, because we don’t live in a vacuum. However, I feel like there’s a meaningful distinction between:
1. let me reverse engineer the principles that best describe our moral intuition, and let me allow parsimonious principles to make me think twice about the moral contradictions that our actual behavior often implies, and perhaps even allow my behavior to change as a result
2. let me concoct a set of rules and exceptions that will justify the particular outcome I want, which is often the one that best suits me
For example, consider the contrast between “we should always strive to treat others fairly” and “we should treat others fairly when they are more powerful than us, however if they are weaker let us then do to them whatever is in our best interest whether or not it is unfair, while at the same time paying lip service to fairness in hopes that we cajole those more powerful than us into treating us fairly”. I find the former a less corrupted piece of moral logic than the latter even though the latter arguably describes actual behavior fairly well. The former compresses more neatly, which isn’t a coincidence.
There’s something of a [bias-variance tradeoff](https://en.wikipedia.org/wiki/Bias%E2%80%93variance_tradeoff) here. The smaller the moral model, the less expressive it can be (so the more nuance it misses), but the more helpful it will be on future, out-of-distribution questions.
Agreed that ultimately everything is reverse-engineered, because we don’t live in a vacuum.
My point was not that we don’t live in a vacuum, but that there’s no ground truth or “correct” model. We’re ONLY extrapolating from very limited experienced examples, not understanding anything fundamental.
For example, consider the contrast between “we should always strive to treat others fairly” and “we should treat others fairly when they are more powerful than us, however if they are weaker let us then do to them whatever is in our best interest whether or not it is unfair, while at the same time paying lip service to fairness in hopes that we cajole those more powerful than us into treating us fairly”.
When you see the word “should”, you know you’re in preferences and modeling land, right?
Causality is rare! The usual statement that “correlation does not imply causation” puts them, I think, on deceptively equal footing. It’s really more like correlation is almost always not causation absent something strong like an RCT or a robust study set-up.
Over the past few years I’d gradually become increasingly skeptical of claims of causality just by updating on empirical observations, but it just struck me that there’s a good first principles reason for this.
For each true cause of some outcome we care to influence, there are many other “measurables” that correlate to the true cause but, by default, have no impact on our outcome of interest. Many of these measures will (weakly) correlate to the outcome though, via their correlation to the true cause. So there’s a one-to-many relationship between the true cause and the non-causal correlates. Therefore, if all you know is that something correlates with a particular outcome, you should have a strong prior against that correlation being causal.
My thinking previously was along the lines of p-hacking: if there are many things you can test, some of them will cross a given significance threshold by chance alone. But I’m claiming something more specific than that: any true cause is bound to be correlated to a bunch of stuff, which will therefore probably correlate with our outcome of interest (though more weakly, and not guaranteed since correlation is not necessarily transitive).
The obvious idea of requiring a plausible hypothesis for the causation helps somewhat here, since it rules out some of the non-causal correlates. But it may still leave many of them untouched, especially the more creative our hypothesis formation process is! Another (sensible and obvious, that maybe doesn’t even require agreement with the above) heuristic is to distrust small (magnitude) effects, since the true cause is likely to be more strongly correlated with the outcome of interest than any particular correlate of the true cause.
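A minimal simulation of this one-to-many picture, with an invented structure: one true cause drives the outcome, twenty other measurables merely correlate with the true cause, and yet all of them end up correlated with the outcome, just more weakly.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5_000

true_cause = rng.normal(size=n)
outcome = true_cause + rng.normal(size=n)  # only the true cause affects the outcome

# Twenty measurables that correlate with the true cause but have no effect on the outcome.
correlates = [0.7 * true_cause + rng.normal(size=n) for _ in range(20)]

r = lambda x, y: abs(np.corrcoef(x, y)[0, 1])
print(f"true cause vs outcome:       r ~ {r(true_cause, outcome):.2f}")                            # ~0.71
print(f"median correlate vs outcome: r ~ {np.median([r(x, outcome) for x in correlates]):.2f}")    # ~0.41
# 21 variables correlate with the outcome; exactly one of them is causal.
```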
A compilation of studies comparing observational results with randomized experimental results on the same intervention, drawn from medicine/economics/psychology, indicating that a large fraction of the time (although probably not a majority) correlation ≠ causality.
Those are not randomly selected pairs, however. There are 3 major causal patterns: A->B, A<-B, and A<-C->B. Daecaneus is pointing out that for a random pair of correlations of some variables, we do not assign a uniform prior of 33% to each of these. While it may sound crazy to try to argue for some specific prior like ‘we should assign 1% to the direct causal patterns of A->B and A<-B, and 99% to the confounding pattern of A<-C->B’, this is a lot closer to the truth than thinking that ‘a third of the time, A causes B; a third of the time, B causes A; and the other third of the time, it’s just some confounder’.
For example, only children are nearly twice as likely to be Presbyterian as Baptist in Minnesota, more than half of the Episcopalians “usually like school” but only 45% of Lutherans do, 55% of Presbyterians feel that their grades reflect their abilities as compared to only 47% of Episcopalians, and Episcopalians are more likely to be male whereas Baptists are more likely to be female.
Like, if you randomly assigned Baptist children to be converted to Presbyterianism, it seems unlikely that their school-liking will suddenly jump because they go somewhere else on Sunday, or that siblings will appear & vanish; it also seems unlikely that if they start liking school (maybe because of a nicer principal), that many of those children would spontaneously convert to Presbyterianism. Similarly, it seems rather unlikely that undergoing sexual-reassignment surgery will make Episcopalian men and Baptist women swap places, and it seems even more unlikely that their religious status caused their gender at conception. In all of these 5 cases, we are pretty sure that we can rule out one of the direct patterns, and that it was probably the third, and we could go through the rest of Meehl’s examples. (Indeed, this turns out to be a bad example because we can apply our knowledge that sex must have come many years before any other variable like “has cold hands” or “likes poetry” to rule out one pattern, but even so, we still don’t find any 50%s: it’s usually pretty obviously direct causation from the temporally earlier variable, or confounding, or both.)
So what I am doing in ‘How Often Does Correlation=Causality?’ is testing the claim that “yes, of course it would be absurd to take pairs of arbitrary variables and calculate their causal patterns for prior probabilities, because yeah, it would be low, maybe approaching 0 - but that’s irrelevant because that’s not what you or I are discussing when we discuss things like medicine. We’re discussing the good correlations, for interventions which have been filtered through the scientific process. All of the interventions we are discussing are clearly plausible and do not require time travel machines, usually have mechanisms proposed, have survived sophisticated statistical analysis which often controls for covariates or confounders, are regarded as credible by highly sophisticated credentialed experts like doctors or researchers with centuries of experience, and may even have had quasi-randomized or other kinds of experimental evidence; surely we can repose at least, say, 90% credibility, by the time that some drug or surgery or educational program has gotten that far and we’re reading about it in our favorite newspaper or blog? Being wrong 1 in 10 times would be painful, but it certainly doesn’t justify the sort of corrosive epistemological nihilism you seem to be espousing.”
But unfortunately, it seems that the error rate, after everything we humans can collectively do, is still a lot higher than 1 in 10 before the randomized version gets run. (Which implies that the scientific evidence is not very good in terms of providing enough Bayesian evidence to promote the hypothesis from <1% to >90%, or that it’s <<1% because causality is that rare.)
Thanks for these references! I’m a big fan, but for some reason your writing sits in the silly under-exploited part of my 2-by-2 box of “how much I enjoy reading this” and “how much of this do I actually read”, so I’d missed all of your posts on this topic! I caught up with some of it, and it’s far further along than my thinking. On a basic level, it matches my intuitive model of a sparse-ish network of causality which generates a much much denser network of correlation on top of it. I too would have guessed that the error rate on “good” studies would be lower!
Reflecting on the particular ways that perfectionism differs from the optimal policy (as someone who suffers from perfectionism) and looking to come up with simple definitions, I thought of this:
perfectionism looks to minimize the distance between an action and the ex-post optimal action but heavily dampening this penalty for the particular action “do nothing”
optimal policy says to pick the best ex-ante action out of the set of all possible actions, which set includes “do nothing”
So, perfectionism will be maximally costly in an environment where you have lots of valuable options of new things you could do (breaking from status quo) but you’re unsure whether you can come close to the best one, like you might end up choosing something that’s half as good as the best you could have done. Optimal policy would say to just give it your best, and that you should be happy since this is an amazingly good problem to have, whereas perfectionism will whisper in your ear how painful it might be to only get half of this very large chunk of potential utility, and wouldn’t it be easier if you just waited.
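A minimal formalization of the two policies, with invented values: perfectionism scores actions by regret against the ex-post best action, but heavily dampens the regret assigned to doing nothing.

```python
best_ex_post = 10.0                                        # value of the (unknowable in advance) perfect choice
value = {"do nothing": 0.0, "attempt the new thing": 5.0}  # you expect to capture about half the potential
DO_NOTHING_DAMPING = 0.2                                   # perfectionism barely penalizes inaction

def perfectionist_score(action):
    regret = best_ex_post - value[action]
    return regret * (DO_NOTHING_DAMPING if action == "do nothing" else 1.0)

print(min(value, key=perfectionist_score))  # 'do nothing'            (dampened regret 2 < 5)
print(max(value, key=value.get))            # 'attempt the new thing' (the optimal policy's pick)
```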
What’s the cost of keeping stuff around vs discarding it and buying it back again?
When you have some infrequently-used items, you have to decide between keeping them around (default, typically) or discarding them and buying them again later when you need them.
If you keep them around, you clearly lose the use of some of your space. Suppose you keep these in your house / apartment. The cost of keeping them around is then proportional to the amount of either surface area or volume they take up. Volume is the appropriate measure to use especially if you have dedicated storage space (like closets) and the items permit packing / stacking. Otherwise, surface area is a more appropriate measure, since having some item on a table kind of prevents you from using the space above that table. The motivation for assigning cost like this is simple: you could (in theory) give up the items, live in a house that is smaller by exactly the space they take up, and save on the rent differential.
The main levers are:
the only maybe non-obvious one is whether you think 2d or 3d is the fair measure. 3d gives you a lot more space (since items are not cubic, and they typically take up space on one of their long sides, so they take up a higher fraction of surface area than of volume). In my experience it’s hard to stack too many things while still retaining access to them, so I weigh the 2d cost more.
cost (per sqft) of real estate in your area
how expensive the item is
how long before you expect to need the item again
There’s some nuance here, like perhaps having an item lying around has higher cost than just the space it takes up because it contributes to an unpleasant sense of clutter. On the other hand, having the item “at the ready” is perhaps worth an immediacy premium on top of the alternative scenario of having to order and wait for it when the need arises. We are also ignoring that when you discard and rebuy, you end up with a brand new item, and potentially in some cases you can either gift or sell your old item, which yields some value to yourself and/or others. I think on net these nuances nudge in the direction of “discard and rebuy” vs what the math itself suggests.
I made a spreadsheet to do the math for some examples here, so far it seems like for some typical items I checked (such as a ball or balloon pump) you should sell and rebuy. For very expensive items that pack away easily (like a snowboard) you probably want to hang onto them.
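Here is a rough sketch of the spreadsheet logic, where every number is an invented placeholder rather than a figure from the actual spreadsheet:

```python
def keep_or_rebuy(item_price, footprint_sqft, months_until_needed,
                  rent_per_sqft_month=4.0, clutter_multiplier=1.2, resale_fraction=0.3):
    """Compare the cost of storing an item against selling it now and rebuying it later."""
    cost_keep = footprint_sqft * rent_per_sqft_month * months_until_needed * clutter_multiplier
    cost_rebuy = (1 - resale_fraction) * item_price  # net cost of replacing the item when needed
    return "keep" if cost_keep < cost_rebuy else "discard and rebuy"

print(keep_or_rebuy(item_price=25, footprint_sqft=1.0, months_until_needed=12))  # cheap balloon pump -> discard and rebuy
print(keep_or_rebuy(item_price=450, footprint_sqft=2.0, months_until_needed=8))  # snowboard -> keep
```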
It feels like (at least in the West) the majority of our ideation about the future is negative, e.g.
popular video games like Fallout
zombie apocalypse themed tv
shows like Black Mirror (there’s no equivalent White Mirror)
Are we at a historically negative point in the balance of “good vs bad ideation about the future” or is this type of collective pessimistic ideation normal?
If the balance towards pessimism is typical, is the promise of salvation in the afterlife in e.g. Christianity a rare example of a powerful and salient positive ideation about our futures (conditioned on some behavior)?
I agree. I feel like this is a very recent change as well. We used to be hopeful about the future, creating sci-fi about utopias rather than writing nightmare scenarios.
The west is becoming less self-affirming over time, and our mental health is generally getting worse. I think it’s because of historic guilt, as well as a kind of self-loathing pretending that it’s virtue (anti-borders, anti-nationalism, anti-natalism) not to mention the slander of psychological drives which strive for growth and quality (competition, hierarchies, ambition, elitism, discrimination/selection/gatekeeping)
I do not believe that the salvation in the afterlife is the opposite of this, but rather the same. It ultimately talks negatively about life and actual reality, comparing it to some unreachable ideal. It’s both pessimistic, as well as a psychological cope which makes it possible to endure this pessimism. The message is something akin to “Endure, and you will be rewarded in the end”
It’s a weariness we will have to overcome. I feel like our excessive tendency to problem-solving has caused us to view life as a big collection of problems, rather than something which is merely good but imperfect
Is meditation provably more effective than “forcing yourself to do nothing”?
Much like sleep is super important for good cognitive (and, of course, physical) functioning, it’s plausible that waking periods of not being stimulated (i.e. of boredom) are very useful for unlocking increased cognitive performance. Personally I’ve found that if I go a long time without allowing myself to be bored, e.g. by listening to podcasts or audiobooks whenever I’m in transition between activities, I’m less energetic, creative, sharp, etc.
The problem is that, as a prescription, “do nothing for 30 minutes” would be rejected as unappealing by most. So instead of “do nothing” it’s couched as “do this other thing” with a focus on breathing and so on. Does any of that stuff actually matter or does the benefit just come from doing nothing?
I think what those other things do is help you reach that state more easily and reliably. It’s like a ritual that you do before the actual task, to get yourself into the right frame of mind and form a better connection, similar to athletes having pre game rituals.
Also yeah, I think it makes the boredom easier to manage and helps you slowly get into it, rather than being pushed into it without reference.
Probably a lot of other hidden benefits though, because most meditation practices have been optimized for hundreds of years, and are better than others for a reason.
The parallel to athlete pre game rituals is an interesting one, but I guess I’d be interested in seeing the comparison between the following two groups:
group A: is told to meditate the usual way for 30 minutes / day, and does
group B: is told to just sit there for 30 minutes / day, and does
So both of the groups considered are sitting quietly for 30 minutes, but one group is meditating while the other is just sitting there. In this comparison, we’d be explicitly ignoring the benefit from meditation which acts via the channel of just making it more likely you actually sit there quietly for 30 minutes.
Simple math suggests that anybody who is selfish should be very supportive of acceleration towards ASI even for high values of p(doom).
Suppose somebody over the age of 50 thinks that p(doom) is on the order of 50%, and that they are totally selfish. It seems rational for them to support acceleration, since absent acceleration they are likely to die some time over the next 40ish years (since it’s improbable we’ll have life extension tech in time) but if we successfully accelerate to ASI, there’s a 1-p(doom) shot at an abundant and happy eternity.
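Making the “simple math” explicit with invented placeholder numbers (value measured in expected years of good life, and “an abundant and happy eternity” treated as just a very large number):

```python
p_doom = 0.5
years_left_without_asi = 40    # likely remaining lifespan absent life-extension tech
years_until_doom = 5           # rough time still lived even in the doom branch
years_if_aligned_asi = 10_000  # stand-in for "an abundant and happy eternity"

ev_no_acceleration = years_left_without_asi
ev_acceleration = p_doom * years_until_doom + (1 - p_doom) * years_if_aligned_asi

print(ev_acceleration > ev_no_acceleration)  # True for any sufficiently large upside, even at p(doom) = 0.5
```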
Possibly some form of this extends beyond total selfishness.
Not for those who think AGI/TAI plausible within 2-5 years, and ASI 1-2 years after. Accelerating even further than whatever feasible caution can hopefully slow it down a bit and shape it more carefully would mostly increase doom, not personal survival. Also, there’s cryonics.
OK, agreed that this depends on your views of whether cryonics will work in your lifetime, and of “baseline” AGI/ASI timelines absent your finger on the scale. As you noted, it also depends on the delta between p(doom while accelerating) and baseline p(doom).
I’m guessing there’s a decent number of people who think current (and near future) cryonics don’t work, and that ASI is further away than 3-7 years (to use your range). Certainly the world mostly isn’t behaving as if it believed ASI was 3-7 years away, which might be a total failure of people acting on their beliefs, or it may just reflect that their beliefs are for further out numbers.
My model is that the current scaling experiment isn’t done yet but will be mostly done in a few years, and LLMs can plausibly surpass the data they are training on. Also, LLMs are digital and 100x faster than humans. Then once there are long-horizon task capable AIs that can do many jobs (the AGI/TAI milestone), even if the LLM scaling experiment failed and it took 10-15 years instead, we get another round of scaling and significant in-software improvement of AI within months that fixes all remaining crippling limitations, making them cognitively capable of all jobs (rather than only some jobs). At that point growth of industry goes off the charts, closer to biological anchors of say doubling in fruit fly biomass every 1.5 days than anything reasonable in any other context. This quickly gives the scale sufficient for ASI even if for some unfathomable reason it’s not possible to create with less scale.
Unclear what cryonics not yet working could mean, even highly destructive freezing is not a cryptographically secure method for erasing data, redundant clues about everything relevant will endure. A likely reason to expect cryonics not to work is not believing that ASI is possible, with actual capabilities of a superintelligence. This is similar to how economists project “reasonable” levels of post-TAI growth by not really accepting the premise of AIs actually capable of all jobs, including all new jobs their introduction into the economy creates. More practical issues are unreliability of arrangements that make cryopreservation happen for a given person and of subsequent storage all the way until ASI, through all the pre-ASI upheaval.
Since you marked as a crux the fragment “absent acceleration they are likely to die some time over the next 40ish years” I wanted to share two possibly relevant Metaculus questions. Both of these seem to suggest numbers longer than your estimates (and these are presumably inclusive of the potential impacts of AGI/TAI and ASI, so these don’t have the “absent acceleration” caveat).
I’m more certain about ASI being 1-2 years after TAI than about TAI in 2-5 years from now, as the latter could fail if the current training setups can’t make LLMs long-horizon capable at a scale that’s economically feasible absent TAI. But probably 20 years is sufficient to get TAI in any case, absent civilization-scale disruptions like an extremely deadly pandemic.
A model can update on discussion of its gears. Given predictions that don’t cite particular reasons, I can only weaken it as a whole, not improve it in detail (when I believe the predictions know better, without me knowing what specifically they know). So all I can do is mirror this concern by citing particular reasons that shape my own model.
Immorality has negative externalities which are diffuse, and hard to count, but quite possibly worse than its direct effects.
Take the example of Alice lying to Bob about something, to her benefit and his detriment. I will call the effects of the lie on Alice and Bob direct, and the effects on everybody else externalities. Concretely, the negative externalities here are that Bob is, on the margin, going to trust others in the future less for having been lied to by Alice than he would if Alice had been truthful. So in all of Bob’s future interactions, his truthful counterparties will have to work extra hard to prove that they are truthful, and maybe in some cases there are potentially beneficial deals that simply won’t occur due to Bob’s suspicions and his trying to avoid being betrayed.
This extra work that Bob’s future counterparties have to put in, as well as the lost value from missed deals, add up to a meaningful cost. This may extend beyond Bob, since everyone else who finds out that Bob was lied to by Alice will update their priors in the same direction as Bob, creating second order costs. What’s more, since everyone now thinks their counterparties suspect them of lying (marginally more), the reputational cost of doing so drops (because they already feel like they’re considered to be partially liars, so the cost of confirming that is less than if they felt they were seen as totally truthful) and as a result everyone might actually be more likely to lie.
So there’s a cost of deteriorating social trust, of p*ssing in the pool of social commons.
One consequence that seems to flow from this, and which I personally find morally counter-intuitive, and don’t actually believe, but cannot logically dismiss, is that if you’re going to lie you have a moral obligation to not get found out. This way, the damage of your lie is at least limited to its direct effects.
You’re right, this is not a morality-specific phenomenon. I think there’s a general formulation of this that just has to do with signaling, though I haven’t fully worked out the idea yet.
For example, if in a given interaction it’s important for your interlocutor to believe that you’re a human and not a bot, and you have something to lose if they are skeptical of your humanity, then there are lots of negative externalities that come from the Internet being filled with indistinguishable-from-human chatbots, irrespective of its morality.
I think “trust” is what you’re looking for, and signaling is one part of developing and nurturing that trust. It’s about the (mostly correct, or it doesn’t work) belief that you can expect certain behaviors and reactions, and strongly NOT expect others. If a large percentage of online interactions are with evil intent, it doesn’t matter too much whether they’re chatbots or human-trafficked exploitation farms—you can’t trust entities that you don’t know pretty well, and who don’t share your cultural and social norms and non-official judgement mechanisms.
One consequence that seems to flow from this, and which I personally find morally counter-intuitive, and don’t actually believe, but cannot logically dismiss, is that if you’re going to lie you have a moral obligation to not get found out. This way, the damage of your lie is at least limited to its direct effects.
With widespread information sharing, the ‘can’t fool all the people all the time’ logic extends to this attempt to lie without consequences: we’ll learn that people ‘hide well but still lie so much’, so we’ll be even more suspicious in any situation, undoing the alleged externality-reducing effect of the ‘not get found out’ idea (in any realistic world with imperfect hiding, anyway).
What if a major contributor to the weakness of LLMs’ planning abilities is that the kind of step-by-step description of what a planning task looks like is content that isn’t widely available in common text training datasets? It’s mostly something we do silently, or we record in non-public places.
Maybe whoever gets the license to train on Jira data is going to get to crack this first.
Does belief quantization explain (some amount of) polarization?
Suppose people generally do Bayesian updating on beliefs. It seems plausible that most people (unless trained to do otherwise) subconsciously quantize their beliefs—let’s say, for the sake of argument, by rounding to the nearest 1%. In other words, if someone’s posterior on a statement is 75.2%, it will be rounded to 75%.
Consider questions that exhibit group-level polarization (e.g. on climate change, or the morality of abortion, or whatnot) and imagine that there is a series of “facts” that are floating around that someone uninformed doesn’t know about.
If one is exposed to facts in a randomly chosen order, then one will arrive at some reasonable posterior after all facts have been processed—in fact we can use this as a computational definition of what it would be rational to conclude.
However, suppose that you are exposed to the facts that support the in-group position first (e.g. when coming of age in your own tribe) and the ones that contradict it later (e.g. when you leave the nest.) If your in-group is chronologically your first source of intel, this is plausible. In this case, if you update on sufficiently many supportive facts of the in-group stance, and you quantize, you’ll end up with a 100% belief on the in-group stance (or, conversely, a 0% belief on the out-group stance), after which point you will basically be unmoved by any contradictory facts you may later be exposed to (since you’re locked into full and unshakeable conviction by quantization).
One way to resist this is to refuse to ever be fully convinced of anything. However, this comes at a cost, since it’s cognitively expensive to hold onto very small numbers, and to intuitively update them well.
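A minimal sketch of the lock-in effect (likelihood ratios are invented): the same twenty equally strong facts, ten supporting and ten contradicting, processed in different orders, with the posterior rounded to the nearest 1% after every update.

```python
def update(p, likelihood_ratio, quantize=True):
    if p in (0.0, 1.0):  # quantized certainty is absorbing: no evidence moves it
        return p
    odds = p / (1 - p) * likelihood_ratio
    p = odds / (1 + odds)
    return round(p, 2) if quantize else p

supporting = [3.0] * 10         # facts favoring the in-group stance (likelihood ratio 3 each)
contradicting = [1 / 3.0] * 10  # equally strong facts against it

def run(facts, p0=0.5, quantize=True):
    p = p0
    for lr in facts:
        p = update(p, lr, quantize)
    return p

print(run(supporting + contradicting))                  # 1.0  -- in-group facts first: locked in
print(run(contradicting + supporting))                  # 0.0  -- same facts, opposite order
print(run(supporting + contradicting, quantize=False))  # ~0.5 -- no rounding: order doesn't matter
```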
Regularization implements Occam’s Razor for machine learning systems.
When we have multiple hypotheses consistent with the same data (an overdetermined problem) Occam’s Razor says that the “simplest” one is more likely true.
When an overparameterized LLM is traversing the subspace of parameters that fit the training set, seeking (say) the one with the smallest L2 norm, it’s also effectively choosing the “simplest” solution from the solution set, where “simple” is defined as lower parameter norm, i.e. more “concisely” expressed.
Unfortunately the entire complexity has just been pushed one level down into the definition of “simple”. The L2 norm can’t really be what we mean by simple, because simply scaling the weights in a layer by A, and the weights in the next layer by 1/A leaves the output of the network invariant, assuming ReLU activations, yet you can obtain arbitrarily high L2 norms by just choosing A high enough.
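A quick numeric check of the rescaling invariance described in that comment (a sketch with arbitrary shapes and random weights):

```python
import numpy as np

rng = np.random.default_rng(0)
relu = lambda x: np.maximum(x, 0.0)

W1, W2 = rng.normal(size=(8, 4)), rng.normal(size=(3, 8))
x = rng.normal(size=4)
forward = lambda w1, w2: w2 @ relu(w1 @ x)

A = 1000.0
l2 = lambda *ws: sum(float(np.sum(w ** 2)) for w in ws)

print(np.allclose(forward(W1, W2), forward(A * W1, W2 / A)))  # True: ReLU is positively homogeneous
print(l2(W1, W2), l2(A * W1, W2 / A))                         # the L2 norm explodes under the rescaling
```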
Agreed with your example, and I think that just means that the L2 norm is not a pure implementation of what we mean by “simple”, in that it also induces some other preferences. In other words, it does other work too. Nevertheless, it would point us in the right direction frequently, e.g. it will dislike networks whose parameters perform large offsetting operations, akin to mental frameworks or beliefs that require unnecessary and reducible artifice or intermediate steps.
Worth keeping in mind that “simple” is not clearly defined in the general case (forget about machine learning). I’m sure lots has been written about this idea, including here.
I wonder how much of the tremendously rapid progress of computer science in the last decade owes itself to structurally more rapid truth-finding, enabled by:
the virtual nature of the majority of the experiments, making them easily replicable
the proliferation of services like github, making it very easy to replicate others’ experiments
(a combination of the points above) the expectation that one would make one’s experiments easily available for replication by others
There are other reasons to expect rapid progress in CS (compared to, say, electrical engineering) but I wonder how much is explained by this replication dynamic.
Very little, because most CS experiments are not in fact replicable (and that’s usually only one of several serious methodological problems).
CS does seem somewhat ahead of other fields I’ve worked in, but I’d attribute that to the mostly-separate open source community rather than academia per se.
To be sure, let’s say we’re talking about something like “the entirety of published material” rather than the subset of it that comes from academia. This is meant to very much include the open source community.
Very curious, in what way are most CS experiments not replicable? From what I’ve seen in deep learning, for instance, it’s standard practice to include a working github repo along with the paper (I’m sure you know lots more about this than I do). This is not the case in economics, for instance, just to pick a field I’m familiar with.
Fuzzing is a generally pretty healthy subfield, but even there most peer-reviewed papers in top venues are still completely useless! Importantly, “a ‘working’ github repo” is really not enough to ensure that your results are reproducible, let alone ensure external validity.
From personal observation, kids learn text (say, from a children’s book, and from songs) back-to-front. That is, the adult will say all but the last word in the sentence, and the kid will (eventually) learn to chime in to complete the sentence.
This feels correlated to LLMs learning well when tasked with next-token prediction, and those predictions being stronger (less uniform over the vocabulary) when the preceding sequences get longer.
I wonder if there’s a connection to having rhyme “live” in the last sound of each line, as opposed to the first.
A lot of memory seems to be linear, possibly because most information in the world is encoded linearly. If I were to tell you the 20th letter of the alphabet, I’d have to go through every letter in my head. It’s a linked-list data structure.
Even many memory techniques, like the mind palace, are ordered, with each item linking to the next.
I don’t think this is the same as markov-chains or predicting the next item, but that it has to do with the most common data structure of information being linear.
As for making the first word rhyme instead of the last, that’s an interesting thought! I actually have no idea. When I rhyme like that in my head, it sounds wrong, but I couldn’t tell you the reason. You may be on to something.
Pretending not to see when a rule you’ve set is being violated can be optimal policy in parenting sometimes (and I bet it generalizes).
Example: suppose you have a toddler and a “rule” that food only stays in the kitchen. The motivation is that each time food is brough into the living room there is a small chance of an accident resulting in a permanent stain. There’s cost to enforcing the rule as the toddler will put up a fight. Suppose that one night you feel really tired and the cost feels particularly high. If you enforce the rule, it will be much more painful than it’s worth in that moment (meaning, fully discounting future consequences). If you fail to enforce the rule, you undermine your authority which results in your toddler fighting future enforcement (of this and possibly all other rules!) much harder, as he realizes that the rule is in fact negotiable / flexible.
However, you have a third choice, which is to credibly pretend to not see that he’s doing it. It’s true that this will undermine your perceived competence, as an authority, somewhat. However, it does not undermine the perception that the rule is to be fully enforced if only you noticed the violation. You get to “skip” a particularly costly enforcement, without taking steps back that compromise future enforcement much.
I bet this happens sometimes in classrooms (re: disruptive students) and prisons (re: troublesome prisoners) and regulation (re: companies that operate in legally aggressive ways).
Of course, this stops working and becomes a farce once the pretense is clearly visible. Once your toddler knows that sometimes you pretend not to see things to avoid a fight, the benefit totally goes away. So it must be used judiciously and artfully.
Huh, that went somewhere other than where I was expecting. I thought you were going to say that ignoring letter-of-the-rule violations is fine when they’re not spirit-of-the-rule violations, as a way of communicating the actual boundaries.
Perhaps that can work depending on the circumstances. In the specific case of a toddler, at the risk of not giving him enough credit, I think that type of distinction is too nuanced. I suspect that in practice this will simply make him litigate every particular application of any given rule (since it gives him hope that it might work) which raises the cost of enforcement dramatically. Potentially it might also make him more stressed, as I think there’s something very mentally soothing / non-taxing about bright line rules.
I think with older kids though, it’s obviously a really important learning to understand that the letter of the law and the spirit of the law do not always coincide. There’s a bit of a blackpill that comes with that though, once you understand that people can get away with violating the spirit as long as they comply with the letter, or that complying with the spirit (which you can grok more easily) does not always guarantee compliance with the letter, which puts you at risk of getting in trouble.
Teacher here, can confirm.
(sci-fi take?) If time travel and time loops are possible, would this not be the (general sketch of the) scenario under which it comes into existence:
1. a lab figures out some candidate particles that could be sent back in time, build a detector for them and start scanning for them. suppose the particle has some binary state. if the particle is +1 (-1) the lab buys (shorts) stock futures and exits after 5 minutes
2. the trading strategy will turn out to be very accurate and the profits from the trading strategy will be utilized to fund the research required to build the time machine
3. at some arbitrary point in the future, eventually, the r&d and engineering efforts are successful. once the device is built, the lab starts sending information back in time to tip itself to future moves in stock futures (the very same particles it originally received). this closes the time loop and guarantees temporal consistency
Reasons why this might not happen:
time doesn’t work like this, or time travel / loops aren’t possible
civilization doesn’t survive long enough to build the device
the lab can’t commit to using its newfound riches to build the device, breaking the logic and preventing the whole thing from working in the first place
“DO NOT MESS WITH TIME”
4. The lab sees milions of conflicting particles. Their plan was foiled by future spam.
5. The lab is killed by assasins, who’ve been waiting for the signal since the 12th century and get activated by the Time Control Authority.
Infertility rates are rising and nobody seems to quite know why. Below is what feels like a possible (trivial) explanation that I haven’t seen mentioned anywhere.
I’m not in this field personally so it’s possible this theory is out there, but asking GPT about it doesn’t yield the proposed explanation: https://chat.openai.com/share/ab4138f6-978c-445a-9228-674ffa5584ea
Toy model:
a family is either fertile or infertile, and fertility is hereditary
the modal fertile family can have up to 10 kids, the modal infertile family can only have 2 kids
in the olden days families aimed to have as many kids as they could
now families aim to have 2 kids each
Under this model, in the olden days we would find a high proportion of fertile people in the gene pool, but in the modern world we wouldn’t. Put differently, the old convention lead to a strong positive correlation between fertility and participation in the gene pool, and the new convention leads to 0 correlation. This removes the selective pressure on fertility, hence we should expect fertility to drop / infertility to rise.
Empirical evidence for this would be something like an analysis of the time series of family size variance and infertility—is lower variance followed by increased infertility?
How robust is the information that infertility rates are rising?
To be sure, I’m not an expert on the topic.
Declines in male fertility I think are regarded as real, though I haven’t examined the primary sources.
Regarding female fertility, this report from Norway outlines the trend that I vaguely thought was representative of most of the developed world over the last 100 years.
Female fertility is trickier to measure, since female fertility and age are strongly correlated, and women have been having kids later, so it’s important (and likely tricky) to disentangle this confounder from the data.
I often mistakenly behave as if my payoff structure is binary instead of gradual. I think others do too, and this cuts across various areas.
For instance, I might wrap up my day and notice that it’s already 11:30pm, though I’d planned to go to sleep an hour earlier, by 10:30pm. My choice is, do I do a couple of me-things like watch that interesting YouTube video I’d marked as “watch later”, or do I just go to sleep ASAP? I often do the former and then predictably regret it the next day when I’m too tired to function well. I’ve reflected on what’s going on in my mind (with the ultimate goal of changing my behavior) and I think the simplest explanation is that I behave as if the payoff curve, in this case of length of sleep, is binary rather than gradual. Rational decision-making would prescribe that, especially once you’re getting less rest than you need, every additional hour of sleep is worth more rather than less. However, I suspect my instinctive thought process is something like “well, I’ve already missed my sleep target even if I go to sleep ASAP, so might as well watch a couple of videos and enjoy myself a little since my day tomorrow is already shot.”
This is pretty terrible! It’s the opposite of what I should be doing!
Maybe something like this is going on when poor people spend a substantial fraction of their income on the lottery (I’m already poor and losing an extra $20 won’t change that, but if I win I’ll stop being poor, so let me try) or when people who are out of shape choose not to exercise (I’m already pretty unhealthy and one 30-minute workout won’t change that, so why waste my time.) or when people who have a setback in their professional career have trouble picking themselves back up (my story is not going to be picture perfect anyway, so why bother.)
It would be good to have some kind of mental reframing to help me avoid this prectictably regrettable behavior.
I think you’re probably quite correct about that example, and similar things. I notice other people doing this a lot, and I catch myself at it sometimes. So I think noticing and eliminating this particular flaw in logic is helpful.
I also think the underlying problem goes deeper. Because we want to stay up and watch that video, our brain will come up with excuses to do it, and we’ll be biased to just quickly accept those excuses when we otherwise would recognize them as logically flawed, because we want to.
This is motivated reasoning. I think it’s the single most impactful and pervasive bias. I spent some years studying this, but I haven’t yet gotten around to writing about it on LW because it’s not directly alignment-relevant. I really need to do at least a short post, because it is relevant for basically navigating and understanding all psychology. Including the field of alignment research.
This raises the question of what it means to want to do something, and who exactly (or which cognitive system) is doing the wanting.
Of course I do want to keep watching YT, but I also recognize there’s a cost to it. So on some level, weighing the pros and cons, I (or at least an earlier version of me) sincerely do want to go to bed by 10:30pm. But, in the moment, the tradeoffs look different from how they appeared from further away, and I make (or, default into) a different decision.
An interesting hypothetical here is whether I’d stay up longer when play time starts at 11:30pm than when play time starts at, say, 10:15pm (if bedtime is 10:30pm). The wanting to play, and the temptation to ignore the cost, might be similar in both scenarios. But this sunk cost / binary outcome fallacy would suggest that I’ll (marginally) blow further past my deadline in the former situation than in the latter.
I recognize a very similar failure mode of instrumental rationality: I sometimes include in the decision process for an action not just the utility of that action itself, but also its probability. That is, I act on the expected utility of the action, not on its utility. Example:
I should hurry up enough to catch my train (hurrying up enough has high utility)
Based on experience, I probably won’t hurry up enough (hurrying up enough has low probability)
So the expected utility (utility*probability) of hurrying up enough is not very high
So I don’t hurry up enough
So I miss my train.
The mistake is to pay any attention to the expected utility (utility*probability) of an action, rather than just to its utility. The probability of what I will do is irrelevant to what I should do. The probability of an action should be the output, never the input of my decision. If one action has the highest utility, it should go to 100% probability (that is, I should do it) and all the alternative actions should go to 0 probability.
The scary thing is that recognizing this mistake doesn’t help with avoiding it.
Proposal: if you’re a social media or other content-based platform, add a long-press to the “share” button which allows you to choose between “hate share” and “love share”.
Concretely:
* quick tap: keep the current functionality, you get to send the link wherever / copy to clipboard
* long press and swipe to either hate or love share: you still get to send the link (optionally, the URL has some argument indicating it’s a hate / love share, if the link is a redirect through the social media platform)
This would allow users to separate out things that are worth sharing but that they hate / love and want to see less / more of, and it might defang the currently powerful strategy (with massive negative social externalities) of generating outrage content just to get more shares.
Social media companies can, in turn, use this to dial back the virality of hate-share vs love-share content, if they choose to do so.
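For the URL-tagging variant, a minimal sketch of what the platform’s redirect link could look like (the `share_sentiment` parameter name is just a placeholder I’m making up):

```python
from urllib.parse import urlencode, urlparse, parse_qsl, urlunparse

def tag_share_link(url: str, sentiment: str) -> str:
    """Append a hypothetical 'share_sentiment' parameter ('hate' or 'love') to a redirect link."""
    parts = urlparse(url)
    query = dict(parse_qsl(parts.query))
    query["share_sentiment"] = sentiment
    return urlunparse(parts._replace(query=urlencode(query)))

print(tag_share_link("https://example.com/r?item=123", "hate"))
# https://example.com/r?item=123&share_sentiment=hate
```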
I believe that there is already far too much “hate sharing”.
Perhaps the default in a social media UI should be that shared content includes a public endorsement of whatever content it links to, and if you want to “hate share” anything without such an endorsement, you have to fight a hostile UI to do so.
In particular, “things that are worth sharing” absolutely should not overlap with “want to see less of”. If you want to see less of some type of thing, it’s self-defeating to distribute more copies of it. Worse, if you even suspect that any of your own readers are anything like you, why are you inflicting it on them?
One way to “see less of” something you hate is to stop it from being produced, and that may be seen as a better solution than basically averting your eyes from it. Rallying mobs to get whoever produced it fired has proven to be quite effective.
The more complex the encoding of a system (e.g. of ethics) is, the more likely it is that it’s reverse-engineered in some way. Complexity is a marker of someone working backwards to encapsulate messy object-level judgment into principles. Conversely, a system that flows outward from principles to objects will be neatly packed in its meta-level form.
In linear algebra terms, as long as the space of principles has fewer dimensions than the space of objects, we expect principled systems / rules to have a low-rank representation, with a dimensionality approaching that of the space of principles and far below that of the space of objects.
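A toy numpy illustration of that low-rank claim, with entirely made-up “principles” and “objects”:

```python
import numpy as np

rng = np.random.default_rng(0)
n_objects, n_aspects, n_principles = 1000, 50, 5

# A principled system: verdicts on many object/aspect pairs are generated from a few principles.
object_loadings = rng.normal(size=(n_objects, n_principles))     # how each principle bears on each object
principle_verdicts = rng.normal(size=(n_principles, n_aspects))  # what each principle says about each aspect
verdicts = object_loadings @ principle_verdicts

print(np.linalg.matrix_rank(verdicts))                     # 5: bounded by the number of principles
print(np.linalg.matrix_rank(rng.normal(size=(1000, 50))))  # 50: messy object-level judgments are full rank
```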
As a corollary, perhaps we are justified in being more suspicious of complex systems than of simple ones, since they come with a higher risk that the systems are “insincere”, in the sense that they were deliberately created with the purpose of justifying a particular outcome rather than being genuine and principled.
This rhymes with Occam’s razor, and also with some AI safety approaches which planned to explore whether dishonesty is more computationally costly than honesty.
Does this mean that meta-level systems are memetically superior, since their informational payloads are smaller? The success of Abrahamic religions (which mostly compress neatly into 10-12 commandments) might agree with this.
I don’t know of any encodings or legible descriptions of ethics that AREN’T reverse-engineered. Unless you’re a moral realist, I suspect this has to be the case, because such systems are in the map, not the territory. And not even in the most detailed maps; they’re massively abstracted over other abstractions.
I’m far more suspicious of simple descriptions, especially when the object space has many more dimensions. The likelihood that they’ve missed important things about the actual behavior/observations is extremely high.
Agreed that ultimately everything is reverse-engineered, because we don’t live in a vacuum. However, I feel like there’s a meaningful distinction between:
1. let me reverse engineer the principles that best describe our moral intuition, and let me allow parsimonious principles to make me think twice about the moral contradictions that our actual behavior often implies, and perhaps even allow my behavior to change as a result
2. let me concoct a set of rules and exceptions that will justify the particular outcome I want, which is often the one that best suits me
For example, consider the contrast between “we should always strive to treat others fairly” and “we should treat others fairly when they are more powerful than us, however if they are weaker let us then do to them whatever is in our best interest whether or not it is unfair, while at the same time paying lip service to fairness in hopes that we cajole those more powerful than us into treating us fairly”. I find the former a less corrupted piece of moral logic than the latter even though the latter arguably describes actual behavior fairly well. The former compresses more neatly, which isn’t a coincidence.
There’s something of a [bias-variance tradeoff](https://en.wikipedia.org/wiki/Bias%E2%80%93variance_tradeoff) here. The smaller the moral model, the less expressive it can be (so the more nuance it misses), but the more helpful it will be on future, out-of-distribution questions.
My point was not that we don’t live in a vacuum, but that there’s no ground truth or “correct” model. We’re ONLY extrapolating from very limited experienced examples, not understanding anything fundamental.
When you see the word “should”, you know you’re in preferences and modeling land, right?
Causality is rare! The usual statement that “correlation does not imply causation” puts them, I think, on deceptively equal footing. It’s really more like correlation is almost always not causation absent something strong like an RCT or a robust study set-up.
Over the past few years I’d gradually become increasingly skeptical of claims of causality just by updating on empirical observations, but it just struck me that there’s a good first principles reason for this.
For each true cause of some outcome we care to influence, there are many other “measurables” that correlate to the true cause but, by default, have no impact on our outcome of interest. Many of these measures will (weakly) correlate to the outcome though, via their correlation to the true cause. So there’s a one-to-many relationship between the true cause and the non-causal correlates. Therefore, if all you know is that something correlates with a particular outcome, you should have a strong prior against that correlation being causal.
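A quick simulation of this one-true-cause, many-correlates picture (all parameters made up):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# Toy world: one true cause drives the outcome; 20 other measurables merely correlate with the cause.
true_cause = rng.normal(size=n)
outcome = true_cause + rng.normal(size=n)  # intervening on the true cause moves the outcome
correlates = [0.5 * true_cause + rng.normal(size=n) for _ in range(20)]  # intervening on these does nothing

print(np.corrcoef(true_cause, outcome)[0, 1])  # ~0.71: the causal relationship
print(np.mean([np.corrcoef(c, outcome)[0, 1] for c in correlates]))  # ~0.32: weaker, but all there
```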
My thinking previously was along the lines of p-hacking: if there are many things you can test, some of them will cross a given significance threshold by chance alone. But I’m claiming something more specific than that: any true cause is bound to be correlated to a bunch of stuff, which will therefore probably correlate with our outcome of interest (though more weakly, and not guaranteed since correlation is not necessarily transitive).
The obvious idea of requiring a plausible hypothesis for the causation helps somewhat here, since it rules out some of the non-causal correlates. But it may still leave many of them untouched, especially the more creative our hypothesis formation process is! Another (sensible and obvious, that maybe doesn’t even require agreement with the above) heuristic is to distrust small (magnitude) effects, since the true cause is likely to be more strongly correlated with the outcome of interest than any particular correlate of the true cause.
This seems pretty different from Gwern’s paper selection trying to answer this question in How Often Does Correlation=Causality?, and from the conclusion he draws there.
Also see his Why Correlation Usually ≠ Causation.
Those are not randomly selected pairs, however. There are 3 major causal patterns: A->B, A<-B, and A<-C->B. Daecaneus is pointing out that for a random pair of correlations of some variables, we do not assign a uniform prior of 33% to each of these. While it may sound crazy to try to argue for some specific prior like ‘we should assign 1% to the direct causal patterns of A->B and A<-B, and 99% to the confounding pattern of A<-C->B’, this is a lot closer to the truth than thinking that ‘a third of the time, A causes B; a third of the time, B causes A; and the other third of the time, it’s just some confounder’.
What would be relevant there is “Everything is Correlated”. If you look at, say, Meehl’s examples of correlations from very large datasets, and ask about causality, I think it becomes clearer. Let’s take one of his first examples:
Like, if you randomly assigned Baptist children to be converted to Presbyterianism, it seems unlikely that their school-liking will suddenly jump because they go somewhere else on Sunday, or that siblings will appear & vanish; it also seems unlikely that if they start liking school (maybe because of a nicer principal), that many of those children would spontaneously convert to Presbyterianism. Similarly, it seems rather unlikely that undergoing sexual-reassignment surgery will make Episcopalian men and Baptist women swap places, and it seems even more unlikely that their religious status caused their gender at conception. In all of these 5 cases, we are pretty sure that we can rule out one of the direct patterns, and that it was probably the third, and we could go through the rest of Meehl’s examples. (Indeed, this turns out to be a bad example because we can apply our knowledge that sex must have come many years before any other variable like “has cold hands” or “likes poetry” to rule out one pattern, but even so, we still don’t find any 50%s: it’s usually pretty obviously direct causation from the temporally earlier variable, or confounding, or both.)
So what I am doing in ‘How Often Does Correlation=Causality?’ is testing the claim that “yes, of course it would be absurd to take pairs of arbitrary variables and calculate their causal patterns for prior probabilities, because yeah, it would be low, maybe approaching 0 - but that’s irrelevant because that’s not what you or I are discussing when we discuss things like medicine. We’re discussing the good correlations, for interventions which have been filtered through the scientific process. All of the interventions we are discussing are clearly plausible and do not require time travel machines, usually have mechanisms proposed, have survived sophisticated statistical analysis which often controls for covariates or confounders, are regarded as credible by highly sophisticated credentialed experts like doctors or researchers with centuries of experience, and may even have had quasi-randomized or other kinds of experimental evidence; surely we can repose at least, say, 90% credibility, by the time that some drug or surgery or educational program has gotten that far and we’re reading about it in our favorite newspaper or blog? Being wrong 1 in 10 times would be painful, but it certainly doesn’t justify the sort of corrosive epistemological nihilism you seem to be espousing.”
But unfortunately, it seems that the error rate, after everything we humans can collectively do, is still a lot higher than 1 in 10 before the randomized version gets run. (Which implies that the scientific evidence is not very good in terms of providing enough Bayesian evidence to promote the hypothesis from <1% to >90%, or that it’s <<1% because causality is that rare.)
Thanks for these references! I’m a big fan, but for some reason your writing sits in the silly under-exploited part of my 2-by-2 box of “how much I enjoy reading this” and “how much of this do I actually read”, so I’d missed all of your posts on this topic! I caught up with some of it, and it’s far further along than my thinking. On a basic level, it matches my intuitive model of a sparse-ish network of causality which generates a much much denser network of correlation on top of it. I too would have guessed that the error rate on “good” studies would be lower!
Reflecting on the particular ways that perfectionism differs from the optimal policy (as someone who suffers from perfectionism) and looking to come up with simple definitions, I thought of this:
* perfectionism looks to minimize the distance between an action and the ex-post optimal action, while heavily dampening this penalty for the particular action “do nothing”
* optimal policy says to pick the best ex-ante action out of the set of all possible actions, a set which includes “do nothing”
So, perfectionism will be maximally costly in an environment where you have lots of valuable options of new things you could do (breaking from status quo) but you’re unsure whether you can come close to the best one, like you might end up choosing something that’s half as good as the best you could have done. Optimal policy would say to just give it your best, and that you should be happy since this is an amazingly good problem to have, whereas perfectionism will whisper in your ear how painful it might be to only get half of this very large chunk of potential utility, and wouldn’t it be easier if you just waited.
What’s the cost of keeping stuff around vs discarding it and buying it back again?
When you have some infrequently-used items, you have to decide between keeping them around (default, typically) or discarding them and buying them again later when you need them.
If you keep them around, you clearly lose use of some of your space. Suppose you keep these in your house / apartment. The cost of keeping them around is then proportional to the amount of either surface area or volume they take up. Volume is the appropriate measure especially if you have dedicated storage space (like closets) and the items permit packing / stacking. Otherwise, surface area is more appropriate, since having an item on a table effectively prevents you from using the space above that table. The motivation for assigning cost like this is simple: you could (in theory) give up the items that take up a certain amount of space, live in a house that is smaller by exactly that amount, and save on the rent differential.
The main levers are:
* whether you think 2d or 3d is the fair measure (the only maybe non-obvious one): 3d gives you a lot more space, since items are not cubic and typically rest on one of their long sides, so they take up a higher fraction of surface area than volume. In my experience it’s hard to stack too many things while still retaining access to them, so I weigh the 2d cost more.
* the cost (per sqft) of real estate in your area
* how expensive the item is
* how long before you expect to need the item again
There’s some nuance here like perhaps having an item laying around has higher cost than just the space it takes up because it contributes to an unpleasant sense of clutter. On the other hand, having the item “at the ready” is perhaps worth an immediacy premium on top of the alternative scenario of having to order and wait for it when the need arises. We are also ignoring that when you discard and rebuy, you end up with a brand new item, and potentially in some cases you can either gift or sell your old item, which yields some value to yourself and/or others. I think on net these nuances nudge in the direction of “discard and rebuy” vs what the math itself suggests.
I made a spreadsheet to do the math for some examples here, so far it seems like for some typical items I checked (such as a ball or balloon pump) you should sell and rebuy. For very expensive items that pack away easily (like a snowboard) you probably want to hang onto them.
The spreadsheet is here, feel free to edit it (I saved a copy) https://docs.google.com/spreadsheets/d/1oz7FcAKIlbCJJaBo8XAmr3BqSYd_uoNTlgCCSV4y4j0/edit?usp=sharing
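For what it’s worth, the core of the comparison is roughly this (a minimal sketch with made-up numbers, not exactly what the spreadsheet does):

```python
def keep_vs_rebuy(footprint_sqft, rent_per_sqft_month, months_until_needed,
                  rebuy_price, resale_value=0.0):
    """Rough comparison: cost of storing an item vs discarding it and rebuying later.
    All inputs are illustrative; plug in your own rent and items."""
    keep_cost = footprint_sqft * rent_per_sqft_month * months_until_needed
    rebuy_cost = rebuy_price - resale_value
    return "keep" if keep_cost < rebuy_cost else "discard and rebuy"

# A balloon pump: cheap, moderate footprint, not needed for a year.
print(keep_vs_rebuy(footprint_sqft=0.5, rent_per_sqft_month=4, months_until_needed=12, rebuy_price=15))
# A snowboard: expensive, packs away in a closet, needed next season.
print(keep_vs_rebuy(footprint_sqft=1.5, rent_per_sqft_month=4, months_until_needed=9,
                    rebuy_price=400, resale_value=150))
```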
It feels like (at least in the West) the majority of our ideation about the future is negative, e.g.
* popular video games like Fallout
* zombie-apocalypse-themed TV shows
* shows like Black Mirror (there’s no equivalent White Mirror)
Are we at a historically negative point in the balance of “good vs bad ideation about the future” or is this type of collective pessimistic ideation normal?
If the balance towards pessimism is typical, is the promise of salvation in the afterlife in e.g. Christianity a rare example of a powerful and salient positive ideation about our futures (conditioned on some behavior)?
I agree. I feel like this is a very recent change as well. We used to be hopeful about the future, creating sci-fi about utopias rather than writing nightmare scenarios.
The west is becoming less self-affirming over time, and our mental health is generally getting worse. I think it’s because of historic guilt, as well as a kind of self-loathing pretending that it’s virtue (anti-borders, anti-nationalism, anti-natalism) not to mention the slander of psychological drives which strive for growth and quality (competition, hierarchies, ambition, elitism, discrimination/selection/gatekeeping)
I do not believe that the salvation in the afterlife is the opposite of this, but rather the same. It ultimately talks negatively about life and actual reality, comparing it to some unreachable ideal. It’s both pessimistic, as well as a psychological cope which makes it possible to endure this pessimism. The message is something akin to “Endure, and you will be rewarded in the end”
It’s a weariness we will have to overcome. I feel like our excessive tendency to problem-solving has caused us to view life as a big collection of problems, rather than something which is merely good but imperfect
Is meditation provably more effective than “forcing yourself to do nothing”?
Much like sleep is super important for good cognitive (and, of course, physical) functioning, it’s plausible that waking periods of not being stimulated (i.e. of boredom) are very useful for unlocking increased cognitive performance. Personally I’ve found that if I go a long time without allowing myself to be bored, e.g. by listening to podcasts or audiobooks whenever I’m in transition between activities, I’m less energetic, creative, sharp, etc.
The problem is that as a prescription “do nothing for 30 minutes” would be rejected as unappealing by most. So instead of “do nothing” it’s couched as “do this other thing” with a focus on breathing and so on. Does any of that stuff actually matter or does the benefit just come from doing nothing?
There are some styles of meditation that are explicitly described as “just sitting” or “doing nothing.”
Kind of related Quanta article from a few days ago: https://www.quantamagazine.org/what-your-brain-is-doing-when-youre-not-doing-anything-20240205/
I think what those other things do is help you reach that state more easily and reliably. It’s like a ritual that you do before the actual task, to get yourself into the right frame of mind and form a better connection, similar to athletes having pre game rituals.
Also yeah, I think it makes the boredom easier to manage and helps you slowly get into it, rather than being pushed into it without reference.
Probably a lot of other hidden benefits though, because most meditation practices have been optimized for hundreds of years, and are better than others for a reason.
The parallel to athletes’ pre-game rituals is an interesting one, but I guess I’d be interested in seeing the comparison between the following two groups:
* group A: is told to meditate the usual way for 30 minutes / day, and does
* group B: is told to just sit there for 30 minutes / day, and does
So both of the groups considered are sitting quietly for 30 minutes, but one group is meditating while the other is just sitting there. In this comparison, we’d be explicitly ignoring the benefit from meditation which acts via the channel of just making it more likely you actually sit there quietly for 30 minutes.
Simple math suggests that anybody who is selfish should be very supportive of acceleration towards ASI even for high values of p(doom).
Suppose somebody over the age of 50 thinks that p(doom) is on the order of 50%, and that they are totally selfish. It seems rational for them to support acceleration, since absent acceleration they are likely to die some time over the next 40ish years (since it’s improbable we’ll have life extension tech in time) but if we successfully accelerate to ASI, there’s a 1-p(doom) shot at an abundant and happy eternity.
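The “simple math”, spelled out with made-up utilities (only the comparison matters, not the scale):

```python
# Made-up utilities for a purely selfish 50-year-old; the exact numbers are placeholders.
p_doom = 0.5
u_remaining_ordinary_life = 30      # ~30 more years, no life extension in time
u_doom = 0                          # dead either way
u_aligned_asi = 10_000              # stand-in for "abundant and happy eternity"

ev_status_quo = u_remaining_ordinary_life
ev_accelerate = p_doom * u_doom + (1 - p_doom) * u_aligned_asi

print(ev_accelerate > ev_status_quo)  # True whenever u_aligned_asi > 60, i.e. for any huge upside
```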
Possibly some form of this extends beyond total selfishness.
Not for those who think AGI/TAI is plausible within 2-5 years, and ASI 1-2 years after that. Accelerating even further, past whatever feasible caution might slow things down a bit and shape them more carefully, would mostly increase doom, not personal survival. Also, there’s cryonics.
OK, agreed that this depends on your views of whether cryonics will work in your lifetime, and of “baseline” AGI/ASI timelines absent your finger on the scale. As you noted, it also depends on the delta between p(doom while accelerating) and baseline p(doom).
I’m guessing there’s a decent number of people who think current (and near future) cryonics don’t work, and that ASI is further away than 3-7 years (to use your range). Certainly the world mostly isn’t behaving as if it believed ASI was 3-7 years away, which might be a total failure of people acting on their beliefs, or it may just reflect that their beliefs are for further out numbers.
My model is that the current scaling experiment isn’t done yet but will be mostly done in a few years, and LLMs can plausibly surpass the data they are training on. Also, LLMs are digital and 100x faster than humans. Then once there are long-horizon task capable AIs that can do many jobs (the AGI/TAI milestone), even if the LLM scaling experiment failed and it took 10-15 years instead, we get another round of scaling and significant in-software improvement of AI within months that fixes all remaining crippling limitations, making them cognitively capable of all jobs (rather than only some jobs). At that point growth of industry goes off the charts, closer to biological anchors of say doubling in fruit fly biomass every 1.5 days than anything reasonable in any other context. This quickly gives the scale sufficient for ASI even if for some unfathomable reason it’s not possible to create with less scale.
Unclear what cryonics not yet working could mean, even highly destructive freezing is not a cryptographically secure method for erasing data, redundant clues about everything relevant will endure. A likely reason to expect cryonics not to work is not believing that ASI is possible, with actual capabilities of a superintelligence. This is similar to how economists project “reasonable” levels of post-TAI growth by not really accepting the premise of AIs actually capable of all jobs, including all new jobs their introduction into the economy creates. More practical issues are unreliability of arrangements that make cryopreservation happen for a given person and of subsequent storage all the way until ASI, through all the pre-ASI upheaval.
Since you marked as a crux the fragment “absent acceleration they are likely to die some time over the next 40ish years” I wanted to share two possibly relevant Metaculus questions. Both of these seem to suggest numbers longer than your estimates (and these are presumably inclusive of the potential impacts of AGI/TAI and ASI, so these don’t have the “absent acceleration” caveat).
I’m more certain about ASI being 1-2 years after TAI than about TAI in 2-5 years from now, as the latter could fail if the current training setups can’t make LLMs long-horizon capable at a scale that’s economically feasible absent TAI. But probably 20 years is sufficient to get TAI in any case, absent civilization-scale disruptions like an extremely deadly pandemic.
A model can update on discussion of its gears. Given predictions that don’t cite particular reasons, I can only weaken it as a whole, not improve it in detail (when I believe the predictions know better, without me knowing what specifically they know). So all I can do is mirror this concern by citing particular reasons that shape my own model.
As a 50-year-old, you don’t need to support acceleration, you’ll still be alive when ASI gets here.
Simple math suggests you could just enjoy your 50s and roll the dice when you have less to lose.
Immorality has negative externalities which are diffuse, and hard to count, but quite possibly worse than its direct effects.
Take the example of Alice lying to Bob about something, to her benefit and his detriment. I will call the effects of the lie on Alice and Bob direct, and the effects on everybody else externalities. Concretely, the negative externalities here are that Bob is, on the margin, going to trust others less in the future for having been lied to by Alice than he would if Alice had been truthful. So in all of Bob’s future interactions, his truthful counterparties will have to work extra hard to prove that they are truthful, and maybe in some cases there are potentially beneficial deals that simply won’t occur due to Bob’s suspicions and his trying to avoid being betrayed.
This extra work that Bob’s future counterparties have to put in, as well as the lost value from missed deals, add up to a meaningful cost. This may extend beyond Bob, since everyone else who finds out that Bob was lied to by Alice will update their priors in the same direction as Bob, creating second order costs. What’s more, since everyone now thinks their counterparties suspect them of lying (marginally more), the reputational cost of doing so drops (because they already feel like they’re considered to be partially liars, so the cost of confirming that is less than if they felt they were seen as totally truthful) and as a result everyone might actually be more likely to lie.
So there’s a cost of deteriorating social trust, of p*ssing in the pool of social commons.
One consequence that seems to flow from this, and which I personally find morally counter-intuitive, and don’t actually believe, but cannot logically dismiss, is that if you’re going to lie you have a moral obligation to not get found out. This way, the damage of your lie is at least limited to its direct effects.
Fully agree, but I’d avoid the term “immorality”. Deviation from social norms has this cost, whether those norms are reasonable or not.
You’re right, this is not a morality-specific phenomenon. I think there’s a general formulation of this that just has to do with signaling, though I haven’t fully worked out the idea yet.
For example, if in a given interaction it’s important for your interlocutor to believe that you’re a human and not a bot, and you have something to lose if they are skeptical of your humanity, then there are lots of negative externalities that come from the Internet being filled with indistinguishable-from-human chatbots, irrespective of its morality.
I think “trust” is what you’re looking for, and signaling is one part of developing and nurturing that trust. It’s about the (mostly correct, or it doesn’t work) belief that you can expect certain behaviors and reactions, and strongly NOT expect others. If a large percentage of online interactions are with evil intent, it doesn’t matter too much whether they’re chatbots or human-trafficked exploitation farms—you can’t trust entities that you don’t know pretty well, and who don’t share your cultural and social norms and non-official judgement mechanisms.
With widespread information sharing, the ‘can’t fool all the people all the time’ logic extends to this attempt to lie without consequences: we’ll learn that people ‘hide well but still lie a lot’, so we’ll be even more suspicious in any situation, undoing the alleged externality-reducing effect of the ‘not get found out’ idea (in any realistic world with imperfect hiding, anyway).
What if a major contributor to the weakness of LLMs’ planning abilities is that the kind of step-by-step description of what a planning task looks like is content that isn’t widely available in common text training datasets? It’s mostly something we do silently, or we record in non-public places.
Maybe whoever gets the license to train on Jira data is going to get to crack this first.
Does belief quantization explain (some amount of) polarization?
Suppose people generally do Bayesian updating on beliefs. It seems plausible that most people (unless trained to do otherwise) subconsciously quantize their beliefs—let’s say, for the sake of argument, by rounding to the nearest 1%. In other words, if someone’s posterior on a statement is 75.2%, it will be rounded to 75%.
Consider questions that exhibit group-level polarization (e.g. on climate change, or the morality of abortion, or whatnot) and imagine that there is a series of “facts” that are floating around that someone uninformed doesn’t know about.
If one is exposed to facts in a randomly chosen order, then one will arrive at some reasonable posterior after all facts have been processed—in fact we can use this as a computational definition of what it would be rational to conclude.
However, suppose that you are exposed to the facts that support the in-group position first (e.g. when coming of age in your own tribe) and the ones that contradict it later (e.g. when you leave the nest.) If your in-group is chronologically your first source of intel, this is plausible. In this case, if you update on sufficiently many supportive facts of the in-group stance, and you quantize, you’ll end up with a 100% belief on the in-group stance (or, conversely, a 0% belief on the out-group stance), after which point you will basically be unmoved by any contradictory facts you may later be exposed to (since you’re locked into full and unshakeable conviction by quantization).
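A minimal sketch of the mechanism (made-up likelihood ratios, rounding the posterior to the nearest 1%):

```python
def update(prior, likelihood_ratio):
    # A belief of exactly 0 or 1 can never move under Bayesian updating.
    if prior in (0.0, 1.0):
        return prior
    odds = prior / (1 - prior) * likelihood_ratio
    return odds / (1 + odds)

def final_belief(quantize, n_each=5, lr=3.0):
    p = 0.5  # uninformed prior
    facts = [lr] * n_each + [1 / lr] * n_each  # in-group-supporting facts first, contradicting ones later
    for likelihood_ratio in facts:
        p = update(p, likelihood_ratio)
        if quantize:
            p = round(p, 2)  # round the posterior to the nearest 1%
    return p

print(final_belief(quantize=False))  # ~0.5 -- the contradicting facts cancel out
print(final_belief(quantize=True))   # 1.0  -- belief got rounded up to 100% and locked in
```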
One way to resist this is to refuse to ever be fully convinced of anything. However, this comes at a cost, since it’s cognitively expensive to hold onto very small numbers, and to intuitively update them well.
See also: different practitioners in the same field with very different methodologies, each of which they are sure is the Best Way To Do Things.
Regularization implements Occam’s Razor for machine learning systems.
When we have multiple hypotheses consistent with the same data (an underdetermined problem), Occam’s Razor says that the “simplest” one is more likely to be true.
When an overparameterized LLM traverses the subspace of parameters that solve the training set, seeking (say) the solution with the smallest L2 norm, it’s also effectively choosing the “simplest” solution from the solution set, where “simple” is defined as lower parameter norm, i.e. more “concisely” expressed.
Unfortunately the entire complexity has just been pushed one level down into the definition of “simple”. The L2 norm can’t really be what we mean by simple, because simply scaling the weights in a layer by A, and the weights in the next layer by 1/A leaves the output of the network invariant, assuming ReLU activations, yet you can obtain arbitrarily high L2 norms by just choosing A high enough.
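A quick numpy check of that invariance, with a made-up two-layer ReLU network:

```python
import numpy as np

rng = np.random.default_rng(0)
relu = lambda x: np.maximum(x, 0)

# Tiny 2-layer ReLU network: y = W2 @ relu(W1 @ x)
W1 = rng.normal(size=(8, 4))
W2 = rng.normal(size=(3, 8))
x = rng.normal(size=4)

A = 100.0  # any A > 0 works, since relu(A*z) = A*relu(z)
W1_scaled, W2_scaled = A * W1, W2 / A

y = W2 @ relu(W1 @ x)
y_scaled = W2_scaled @ relu(W1_scaled @ x)

print(np.allclose(y, y_scaled))                      # True: the function is unchanged
print(np.sum(W1**2) + np.sum(W2**2))                 # original L2 norm
print(np.sum(W1_scaled**2) + np.sum(W2_scaled**2))   # vastly larger, same function
```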
Agreed with your example, and I think that just means that the L2 norm is not a pure implementation of what we mean by “simple”, in that it also induces some other preferences. In other words, it does other work too. Nevertheless, it would point us in the right direction frequently, e.g. it will dislike networks whose parameters perform large offsetting operations, akin to mental frameworks or beliefs that require unnecessary and reducible artifice or intermediate steps.
Worth keeping in mind that “simple” is not clearly defined in the general case (forget about machine learning). I’m sure lots has been written about this idea, including here.
I wonder how much of the tremendously rapid progress of computer science in the last decade owes itself to structurally more rapid truth-finding, enabled by:
* the virtual nature of the majority of the experiments, making them easily replicable
* the proliferation of services like github, making it very easy to replicate others’ experiments
* (a combination of the points above) the expectation that one would make one’s experiments easily available for replication by others
There are other reasons to expect rapid progress in CS (compared to, say, electrical engineering) but I wonder how much is explained by this replication dynamic.
Very little, because most CS experiments are not in fact replicable (and that’s usually only one of several serious methodological problems).
CS does seem somewhat ahead of other fields I’ve worked in, but I’d attribute that to the mostly-separate open source community rather than academia per se.
To be sure, let’s say we’re talking about something like “the entirety of published material” rather than the subset of it that comes from academia. This is meant to very much include the open source community.
Very curious, in what way are most CS experiments not replicable? From what I’ve seen in deep learning, for instance, it’s standard practice to include a working github repo along with the paper (I’m sure you know lots more about this than I do). This is not the case in economics, for instance, just to pick a field I’m familiar with.
See e.g. https://mschloegel.me/paper/schloegel2024sokfuzzevals.pdf
Fuzzing is a generally pretty healthy subfield, but even there most peer-reviewed papers in top venues are still completely useless! Importantly, a “working” github repo is really not enough to ensure that your results are reproducible, let alone to ensure external validity.
From personal observation, kids learn text (say, from a children’s book, and from songs) back-to-front. That is, the adult will say all but the last word in the sentence, and the kid will (eventually) learn to chime in to complete the sentence.
This feels correlated to LLMs learning well when tasked with next-token prediction, and those predictions being stronger (less uniform over the vocabulary) when the preceding sequences get longer.
I wonder if there’s a connection to having rhyme “live” in the last sound of each line, as opposed to the first.
A lot of memory seems to be linear, possibly because most information in the world is encoded linearly. If I were to tell you the 20th letter of the alphabet, I’d have to go through every letter in my head. It’s a linked-list data structure.
Even many memory techniques, like the mind palace, are ordered, with each item linking to the next.
I don’t think this is the same as markov-chains or predicting the next item, but that it has to do with the most common data structure of information being linear.
As for making the first word rhyme instead of the last, that’s an interesting thought! I actually have no idea. When I rhyme like that in my head, it sounds wrong, but I couldn’t tell you the reason. You may be on to something.