“Labs” are not an actor. No one lab has a moat around compute; at the very least Google, OpenAI, Anthropic, xAI, and Facebook all have access to plenty of compute. It only takes one of them to sell access to their models publicly.
The research direction needs to be actually pursued by the agents, either through the decision of the human leadership, or through the decision of AI agents that the human leadership defers to. This means that if some long-term research bet isn’t respected by lab leadership, it’s unlikely to be pursued by their workforce of AI agents.
I think you’re starting from a good question here (e.g. “Will the research direction actually be pursued by AI researchers?”), but have entirely the wrong picture; lab leaders are unlikely to be very relevant decision-makers here. The key is that no lab has a significant moat, and the cutting edge is not kept private for long, and those facts look likely to remain true for a while. Assuming even just one cutting-edge lab continues to deploy at all like they do today, basically-cutting-edge models will be available to the public, and therefore researchers outside the labs can just use them to do the relevant research regardless of whether lab leadership is particularly invested. Just look at the state of things today: one does not need lab leaders on board in order to prompt cutting-edge models to work on one’s own research agenda.
That said, “Will the research direction actually be pursued by AI researchers?” remains a relevant question. The prescription is not so much about convincing lab leadership, but rather about building whatever skills will likely be necessary in order to use AI researchers productively oneself.
Is interpersonal variation in anxiety levels mostly caused by dietary iron?

I stumbled across this paper yesterday. I haven’t looked at it very closely yet, but the high-level pitch is that they look at genetic predictors of iron deficiency and then cross that with anxiety data. It’s interesting mainly because it sounds pretty legit (i.e. the language sounds like direct presentation of results without any bullshitting, the p-values are satisfyingly small, there are no branching paths), and the effect sizes are BIG IIUC:

The odd ratios (OR) of anxiety disorders per 1 standard deviation (SD) unit increment in iron status biomarkers were 0.922 (95% confidence interval (CI) 0.862–0.986; p = 0.018) for serum iron level, 0.873 (95% CI 0.790–0.964; p = 0.008) for log-transformed ferritin and 0.917 (95% CI 0.867–0.969; p = 0.002) for transferrin saturation. But no statical significance was found in the association of 1 SD unit increased total iron-binding capacity (TIBC) with anxiety disorders (OR 1.080; 95% CI 0.988–1.180; p = 0.091). The analyses were supported by pleiotropy test which suggested no pleiotropic bias.

The odds ratio of anxiety disorders is roughly 0.9 per standard deviation of iron level, across four different measures of iron level. (Note that TIBC, the last of the four iron level measures, didn’t hit statistical significance but did have a similar effect size to the other three.)

Just eyeballing those effect sizes… man, it kinda sounds like iron levels are maybe the main game for most anxiety? Am I interpreting that right? Am I missing something here?

EDIT: I read more, and it turns out the wording of the part I quoted was misleading. The number 0.922, for instance, was the odds ratio AT +1 standard deviation serum iron level, not PER +1 standard deviation serum iron level. That would be −0.078 PER standard deviation serum iron level, so it’s definitely not the “main game for most anxiety”.
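Spelling that correction out as arithmetic (my own gloss, not wording from the paper):

\[
\frac{\text{odds}(\text{+1 SD serum iron})}{\text{odds}(\text{baseline})} = 0.922
\quad\Rightarrow\quad
0.922 - 1 = -0.078 \text{ per SD},
\]

i.e. roughly 8% lower odds of an anxiety disorder per standard deviation of serum iron, a real effect but nowhere near “main game” territory.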
That was a very useful answer, thank you! I’m going to try to repeat back the model in my own words and relate back to the frame of the post and rest of this comment thread. Please let me know if it still sounds like I’m missing something.
Model: the cases where a woman has a decent idea of what she wants aren’t the central use-case of subtle cues in the first place. The central use case is when the guy seems maybe interesting, and therefore she mostly just wants to spend more time around him, not in an explicitly romantic or sexual way yet. The “subtle signals” mostly just look like “being friendly”, and that’s a feature because in the main use case “being friendly” is in fact basically what the girl wants; she actually does just want to spend a bit more time together in a friendly way, with romantic/sexual interaction as a possibility on the horizon rather than an immediate interest.
Relating that back to the post and comment thread: subtle signals still seem like a pretty stupid choice once a woman is confident she’s romantically/sexually interested in a guy. So “send subtle signals” remains terrible advice for women in that situation. But if one is going to give the opposite advice—i.e. advise a policy of asking out men she’s interested in and/or sending extremely unsubtle signals—then it should probably come alongside the disclaimer that if she’s highly uncertain and mostly wants to gather more information (which may be most of the time) then subtle signals are a sensible move. If the thing she actually wants right now is to just spend some more time together in a friendly way, with romantic/sexual interaction as a possibility on the horizon rather than an immediate thing, then subtle signals are not a bad choice. The subtle signals are mostly not distinguishable from “being friendly” because “being friendly” is in fact the (immediate) goal.
What I really like about this model is that it answers the question “under what circumstances should one apply this advice vs apply its opposite?”, which is something most advice should answer but doesn’t.
Based on personal experience, the strategy fails to even achieve that goal.
And fun though it is to exchange clever quips, I would much rather have actual answers to useful questions, such as “just how often does anyone at all correctly pick up on the intended subtle signals?”.
Sometimes this is due to the woman in question not recognizing how subtle she’s being, and losing out on a date with a man she’s still interested in.
I would guess that this is approximately 100% of the time in practice, excluding cases where the man doesn’t pick up on the cues but happens to ask her out anyway. Approximately nobody accurately picks up on women’s subtle cues, including other women (at least that would be my strong guess, and it’s very cruxy for me here). If the woman just wants a guy who will ask her out, that’s still a perfectly fine utility function, but the cues play approximately zero role outside of the woman’s own imagination.
If the typical case were actually to send very clear, unambiguous cues which most men (or at least most hot men) can actually reliably pick up on, then I would not call the strategy “completely fucking idiotic”; sending signals which the intended recipient can actually reliably pick up on is a totally reasonable and sensible strategy.
(Of course there’s an obvious alternative hypothesis: most men do pick up on such cues, and I’m overindexing on myself or my friends or something. I am certainly tracking that hypothesis; I am well aware that my brain is not a good model of other humans’ brains. But man, it sure sounds like “not noticing women’s subtle cues” is the near-universal experience, even among other women when people actually try to test that.)
The yin master finds out from her friend that a certain guy with potential is coming to dinner. She wears a dress with a subtle neckline and arrives early to secure a seat with an empty chair next to it. When the guy arrives, she holds eye contact a fraction longer than usual and invites him to the empty chair by laughing a fraction louder at his silly joke. She leans away when he starts talking sports, but angles slightly towards him when he’s sharing a story of how he got in trouble that one time abroad. The story is good, so she mentions off-handedly how long it’s been since she saw a good painting and it’s his cue to ask for her number.
This was the one fleshed-out concrete example in the post, and the best part IMO.
I found it quite helpful for visualizing what’s going on in so many women’s imaginations. The post itself delivered value, in that regard.
That said, the post conspicuously avoids asking: how well will this yin strategy actually work? How much will the yin strategy improve this girl’s chance of a date with the guy, compared to (a) doing nothing and acting normally, or (b) directly asking him out? It seems very obvious that the yin stuff will result in a date-probability only marginally higher than doing nothing (I’d say 1–10 percentage points higher at most, if I had to Put A Number On It), and far, far lower than if she asks him (I’d say lower by tens of percentage points).
That is what we typically call a completely fucking idiotic strategy. Just do the simple direct thing; chances of success will be far higher.
Now, one could reasonably counter-argue that the yin strategy delivers value somewhere else, besides just e.g. “probability of a date”. Maybe it’s a useful filter for some sort of guy, or maybe she just wants to interact with people this way because she enjoys it? I won’t argue with the utility function; people want what they want. But I will observe that it’s an awfully large tradeoff, and I very much doubt that a real weighing of the tradeoffs under the hypothetical woman’s preferences works out in favor of the subtle approach, under almost any realistic conditions.
(Context: I’ve been following the series in which this post appeared, and originally posted this comment on the substack version.)
Overcompressed summary of this post: “Look, man, you are not bottlenecked on models of the world, you are bottlenecked on iteration count. You need to just Actually Do The Thing a lot more times; you will get far more mileage out of iterating more than out of modeling stuff.”.
I definitely buy that claim for at least some people, but it seems quite false in general for relationships/dating.
Like, sure, most problems can be solved by iterating infinitely many times. The point of world models is to not need so many damn iterations. And because we live in a very high-dimensional world, naive iteration will often not work at all without a decent world model; one will never try the right things without having some prior idea of where to look.
Example: Aella’s series on how to be good in bed. I was solidly in the audience for that post: I’d previously spent plenty of iterations becoming better in bed, ended up with solid mechanics, but consistently delivering great orgasms does not translate to one’s partner wanting much sex. Another decade of iteration would not have fixed that problem; I would not have tried the right things, my partner would not have given the right feedback (indeed, much of her feedback was in exactly the wrong direction). Aella pointed in the right vague direction, and exploring in that vague direction worked within only a few iterations. That’s the value of models: they steer the search so that one needs fewer iterations.
That’s the point of all the blog posts. That’s where the value is, when blog posts are delivering value. And that’s what’s been frustratingly missing from this series so far. (Most of the value I’ve gotten from this series has been from frustratedly noticing the ways in which it fails to deliver, and thereby better understanding what I wish it would deliver!) No, I don’t expect to e.g. need 0 iterations after reading, but I want to at least decrease the number of iterations.
And regarding “I don’t think you know what you want in dating”… the iteration problem still applies there! It is so much easier to figure out what I want, with far fewer iterations, when I have better background models of what people typically want. Yes, there’s a necessary skill of not shoving yourself into a box someone else drew, but the box can still be extremely valuable as evidence of the vague direction in which your own wants might be located.
(This comment is not about the parts which most centrally felt anchored on social reality; see other reply for that. This one is a somewhat-tangential but interesting mini-essay on ontological choices.)
The first major ontological choices were introduced in the previous essay:
Thinking of “capability” as a continuous 1-dimensional property of AI
Introducing the “capability frontier” as the highest capability level the actor developed/deployed so far
Introducing the “safety range” as the highest capability level the actor can safely deploy
Introducing three “security factors”:
Making the safety range (the happy line) go up
Making the capability frontier (the dangerous line) not go up
Keeping track of where those lines are.
The first choice, treating “capability level” as 1-dimensional, is obviously an oversimplification, but a reasonable conceit for a toy model (so long as we remember that it is a toy, and treat it appropriately). Given that we treat capability level as 1-dimensional, the notion of “capability frontier” for any given actor immediately follows, and needs no further justification.
The notion of “safety range” is a little more dubious. Safety of an AI obviously depends on a lot of factors besides just the AI’s capability. So, there’s a potentially very big difference between e.g. the most capable AI a company “could” safely deploy if the company did everything right based on everyone’s current best understanding (which no company of more than ~10 people has or ever will do in a novel field), vs the most capable AI the company could safely deploy under realistic assumptions about the company’s own internal human coordination capabilities, vs the most capable AI the company can actually-in-real-life aim for and actually-in-real-life not end up dead.
… but let’s take a raincheck on clarifying the “safety range” concept and move on.
The security factors are a much more dubious choice of ontology. Some of the dubiousness:
If we’re making the happy line go up (“safety progress”): whose happy line? Different actors have different lines. If we’re making the danger line not go up (“capability restraint”): again, whose danger line? Different actors also have different danger lines.
This is important, because humanity’s survival depends on everybody else’s happy and danger lines, not just one actor’s!
“Safety progress” inherits all the ontological dubiousness of the “safety range”.
If we’re keeping track of where the lines are (“risk evaluation”): who is keeping track? Who is doing the analysis, who is consuming it, how does the information get to relevant decision makers, and why do they make their decisions on the basis of that information?
Why factor apart the levels of the danger and happy lines? These are very much not independent, so it’s unclear why it makes sense to think of their levels separately, rather than e.g. looking at their average and difference as the two degrees of freedom, or their average and difference in log space, or the danger line level and the difference, or [...]. There are a lot of ways to parameterize two degrees of freedom (a few are spelled out just below this list), and it’s not clear why this parameterization would make more sense than some other.
On the other hand, factoring apart “risk evaluation” from “safety progress” and “capability restraint” does seem like an ontologically reasonable choice: it’s the standard factorization of instrumental from epistemic. That standard choice is not always the right way to factor things, but it’s at least a choice which has “low burden of proof” in some sense.
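To make the parameterization point concrete (illustrative notation, not from the essay): writing \(c\) for the capability frontier and \(s\) for the safety range, some equally natural ways to carve up the same two degrees of freedom are

\[
(s,\ c), \qquad
\left(\tfrac{s+c}{2},\ s-c\right), \qquad
\left(\tfrac{\log s + \log c}{2},\ \log\tfrac{s}{c}\right), \qquad
(c,\ s-c),
\]

and the security-factor framing privileges the first of these without an argument for why the two levels, rather than (say) the gap \(s-c\), are the natural knobs.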
What would it look like to justify these ontological choices? In general, ontological justification involves pointing to some kind of pattern in the territory—in this case, either the “territory” of future AI, or the “territory” of AI safety strategy space. For instance, in a very broad class of problems, one can factor apart the epistemic and instrumental aspects of the problem, and resolve all the epistemic parts in a manner totally agnostic to the instrumental parts. That’s a pattern in the “territory” of strategy spaces, and that pattern justifies the ontological choice of factoring apart instrumental and epistemic components of a problem.
If one could e.g. argue that the safety range and capability frontier are mostly independent, or that most interventions impact the trajectory of only one of the two, then that would be an ontological justification for factoring the two apart. (Seems false.)
(To be clear: people very often have good intuitions about ontological choices, but don’t know how to justify them! I am definitely not saying that one must always explicitly defend ontological choices, or anything like that. But one should, if asked and given time to consider, be able to look at an ontological choice and say what underlying pattern makes that ontological choice sensible.)
The last section felt like it lost contact most severely. It says
What are the main objections to AI for AI safety?
It notably does not say “What are the main ways AI for AI safety might fail?” or “What are the main uncertainties?” or “What are the main bottlenecks to success of AI for AI safety?”. It’s worded in terms of “objections”, and implicitly, it seems we’re talking about objections which people make in the current discourse. And looking at the classification in that section (“evaluation failures, differential sabotage, dangerous rogue options”) it indeed sounds more like a classification of objections in the current discourse, as opposed to a classification of object-level failure modes from a less-social-reality-loaded distribution of failures.
I do also think the frame in the earlier part of the essay is pretty dubious in some places, but that feels more like object-level ontological troubles and less like it’s anchoring too much on social reality. I ended up writing a mini-essay on that which I’ll drop in a separate reply.
no synonyms
[...]
Use compound words.
These two goals conflict. When compounding is common, there will inevitably be multiple reasonable ways to describe the same concept as a compound word. I think you probably want flexible compounding more than a lack of synonyms.
When this post first came out, it annoyed me. I got a very strong feeling of “fake thinking”, fake ontology, etc. And that feeling annoyed me a lot more than usual, because Joe is the person who wrote the (excellent) post on “fake vs real thinking”. But at the time, I did not immediately come up with a short explanation for where that feeling came from.
I think I can now explain it, after seeing this line from kave’s comment on this post:
Your taxonomies of the space of worries and orientations to this question are really good...
That’s exactly it. The taxonomies are taxonomies of the space of worries and orientations. In other words, the post presents a good ontology of the discourse on “AI for AI safety”. What it does not present is an ontology natural to the actual real-world challenges of AI for AI safety.
Unpacking that a bit: insofar as an ontology (or taxonomy) is “good”, it reflects some underlying pattern in the world, and it’s useful to ask what that underlying pattern is. For instance, it does intuitively seem like most objections to “AI for AI safety” I hear these days cluster reasonably well into “Evaluation failures, Differential sabotage, Dangerous rogue options”. Insofar as the discourse really does cluster that way, that’s a real pattern in the world-of-discourse, and those categories are useful for modeling the discourse. But the patterns in the discourse mostly reflect social dynamics; they are only loosely coupled to the patterns which will actually arise in future AIs, or in the space of strategies for dealing with future AIs. Thus the feeling of “fakeness”: it feels like this post is modeling the current discourse, rather than modeling the actual physical future AIs.
… and to be clear, that’s not strictly a bad thing. Modeling the discourse might actually be the right move, insofar as one’s main goal is to e.g. facilitate communication. That communication just won’t be very tightly coupled to future physical reality.
I have a similar story. When I was very young, my mother was the primary breadwinner of the household, and put both herself and my father through law school. Growing up, it was always just kind of assumed that my sister would have to get a real job making actual money, same as my brother and me; a degree in underwater basket weaving would have required some serious justification. (She ended up going to dental school and also getting a PhD working with epigenomic data.)
I didn’t realize on a gut level that this wasn’t the norm until shortly after high school. I was hanging out with two female friends and one of them said “man, I really need more money”. I replied “sounds like you need to get a job”. The friend laughed and said “oh, I was thinking I need to get a boyfriend”, and then the other friend also laughed and said she was also thinking the boyfriend thing.
… so that was quite a shock to my worldview.
Not important, but: I clicked on this post expecting an essay about building physical islands outside of San Francisco Bay.
This comment gave me the information I’m looking for, so I don’t want to keep dragging people through it. Please don’t feel obligated to reply further!
That said, I did quickly look up some data on this bit:
But remember that you already conditioned on ‘married couples without kids’. My guess would be that in the subset of man-woman married couples without kids, the man being the exclusive breadwinner is a lot less common than in the set of all man-woman married couples.
… so I figured I’d drop it in the thread.
When interpreting these numbers, bear in mind that many couples with no kids probably intend to have kids in the not-too-distant future, so the discrepancy shown between “no children” and 1+ children is probably somewhat smaller than the underlying discrepancy of interest (which pushes marginally more in favor of Lucius’ guess).
Big thank you for responding, this was very helpful.
That is useful, thanks.
Any suggestions for how I can better ask the question to get useful answers without apparently triggering so many people so much? In particular, if the answer is in fact “most men would be happier single but are ideologically attached to believing in love”, then I want to be able to update accordingly. And if the answer is not that, then I want to update that most men would not be happier single. With the current discussion, most of what I’ve learned is that lots of people are triggered by the question, but that doesn’t really tell me much about the underlying reality.
Update 3 days later: apparently most people disagree strongly with
Their romantic partner offering lots of value in other ways. I’m skeptical of this one because female partners are typically notoriously high maintenance in money, attention, and emotional labor. Sure, she might be great in a lot of ways, but it’s hard for that to add up enough to outweigh the usual costs.
Most people in the comments so far emphasize some kind of mysterious “relationship stuff” as upside, but my actual main update here is that most commenters probably think the typical costs are far, far lower than I imagined? Unsure; maybe the “relationship stuff” really is ridiculously high value.
So I guess it’s time to get more concrete about the costs I had in mind:
A quick Google search says the man is the primary or exclusive breadwinner in a majority of married couples. Ass-pull number: on the monetary side alone, living costs are probably ~50% higher. (Not a factor of two higher, because the living costs of two people living together are much less than double the living costs of one person. Also I’m generally considering the no-kids case here; I don’t feel as confused about couples with kids.)
I was picturing an anxious attachment style as the typical female case (without kids). That’s unpleasant on a day-to-day basis to begin with, and I expect a lack of sex tends to make it a lot worse.
Eyeballing Aella’s relationship survey data, a bit less than a third of respondents in 10-year relationships reported fighting multiple times a month or more. That was somewhat-but-not-dramatically less than I previously pictured. Frequent fighting is very prototypically the sort of thing I would expect to wipe out more-than-all of the value of a relationship, and I expect it to be disproportionately bad in relationships with little sex.
Less legibly… conventional wisdom sure sounds like most married men find their wife net-stressful and unpleasant to be around a substantial portion of the time, especially in the unpleasant part of the hormonal cycle, and especially especially if they’re not having much sex. For instance, there’s a classic joke about a store salesman upselling a guy a truck, after upselling him a boat, after upselling him a tackle box, after [...] and the punchline is “No, he wasn’t looking for a fishing rod. He came in looking for tampons, and I told him ‘dude, your weekend is shot, you should go fishing!’”.
(One thing to emphasize in these: sex isn’t just a major value prop in its own right, I also expect that lots of the main costs of a relationship from the man’s perspective are mitigated a lot by sex. Like, the sex makes the female partner behave less unpleasantly for a while.)
So, next question for people who had useful responses (especially @Lucius Bushnaq and @yams): do you think the mysterious relationship stuff outweighs those kinds of costs easily in the typical case, or do you imagine the costs in the typical case are not all that high?
men are the ones who die sooner if divorced, which suggests
Causality dubious; it seems much more likely on priors that men who get divorced are disproportionately those with Shit Going On in their lives. That said, it is pretty plausible on priors that they’re getting a lot out of marriage.
I have no idea what you’re picturing here. Those sentences sounded like a sequence of non sequiturs, which means I’m probably completely missing what you’re trying to say. Maybe spell it out a bit more?
Some possibly-relevant points:
The idea that all the labs focus on speeding up their own research threads rather than serving LLMs to customers is already pretty dubious. Developing LLMs and using them are two different skillsets; it would make economic sense for different entities to specialize in those things, with the developers selling model usage to the users just as they do today. More capable AI doesn’t particularly change that economic logic. I wouldn’t be surprised if at least some labs nonetheless keep things in-house, but all of them?
The implicit assumption that alignment/safety research will be bottlenecked on compute at all likewise seems dubious at best, though I could imagine an argument for it (routing through e.g. scaling inference compute).
It sounds like maybe you’re assuming that there’s some scaling curve for (alignment research progress as a function of compute invested) and another for (capabilities progress as a function of compute invested), and you’re imagining that to keep the one curve ahead of the other, the amount of compute aimed at alignment needs to scale in a specific way with the amount aimed at capabilities? (That model sounds completely silly to me, that is not at all how this works, but it would be consistent with the words you’re saying.)
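In case it helps, here is the model from that last bullet spelled out (purely illustrative notation on my part):

\[
\text{alignment progress} = A(x_a), \qquad \text{capabilities progress} = C(x_c),
\]

with the requirement that \(A(x_a) \ge C(x_c)\) at every point in time. Given the shapes of \(A\) and \(C\), that requirement would indeed pin down how the compute \(x_a\) spent on alignment has to scale with the compute \(x_c\) spent on capabilities. If that’s not the picture, then spelling out your own version would help.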