Thane Ruthenis

Karma: 8,146

Thane Ruthenis Jul 27, 2025, 9:13 AM
5 points
0
in reply to: S. Alex Bradt’s comment on: strawberry calm’s Shortform
Well, an aligned Singularity would probably be relatively pleasant, since the entities fueling it would consider causing this sort of vast distress a negative and try to avoid it. Indeed, if you trust them not to drown you, there would be no need for this sort of frantic grasping-at-straws.
An unaligned Singularity would probably also be more pleasant, since the entities fueling it would likely try to make it look aligned, with the span of time between the treacherous turn and everyone dying likely being short.
This scenario covers a sort of “neutral-alignment/non-controlled” Singularity, where there’s no specific superintelligent actor (or coalition) in control of the whole process, and it’s instead guided by… market forces, I guess? With AGI labs continually releasing new models for private/corporate use, providing the tools/opportunities you can try to grasp to avoid drowning. I think this is roughly how things would go under “mainstream” models of AI progress (e. g., AI 2027). (I don’t expect it to actually go this way; I don’t think LLMs can power the Singularity.)

Thane Ruthenis Jul 26, 2025, 1:33 PM
8 points
1
in reply to: habryka’s comment on: America’s AI Action Plan Is Pretty Good
> Yes, it’s competently executed
Is it?
It certainly signals that the authors have a competent grasp of the AI industry and its mainstream models of what’s happening. But is it actually competent AI-policy work, even under the e/acc agenda?
My impression is that no, it’s not. It seems to live in an e/acc fanfic about a competent US racing to AGI, not in reality. It vaguely recommends doing a thousand things that would be nontrivial to execute if the Eye of Sauron were looking directly at them, and the Eye is very much not doing that. On the contrary, the wider Trump administration is doing things that directly contradict the most important recommendations here (energy, chips, science, talent, “American allies”), and this document seems to pretend this isn’t happening. A politically effective version of this document would have been written in a very different way; this one seems to be written mainly for entertainment/fantasizing purposes.
Like, it demonstrates that the people tasked with thinking about AI in the Trump administration have a solid enough understanding of the AI industry to recognize which policies would accelerate capability research. But that understanding hasn’t translated into capability-positive policy decisions so far. Is there reason to think this plan’s publication is going to turn that around...?
Is my take here wrong? I don’t have much experience here; this is a strong opinion weakly held. (Addressing that question to @Zvi as well.)

Thane Ruthenis Jul 26, 2025, 6:38 AM
2 points
0
in reply to: sunwillrise’s comment on: Anthropic Faces Potentially “Business-Ending” Copyright Lawsuit
Yeah, I figured.
> If the judge sees that you are a $61 billion market cap company hiring the greatest lawyers in the world, but you’re not putting forth your best legal foot when you have lawyers from other companies writing briefs outlining their own defense arguments, the consequences for you and your lawyers will be severe
What would be the actual wrongdoing here, legally speaking?

Thane Ruthenis Jul 25, 2025, 10:21 PM
3 points
−4
in reply to: the gears to ascension’s comment on: Anthropic Faces Potentially “Business-Ending” Copyright Lawsuit
Clearly the heroic thing to do would be to go to trial and then deliberately mess it up very badly in a calculated fashion that sets an awful precedent for the other AGI companies. You might say, “but China!”, but if the US cripples itself, then suddenly the USG would be much more interested in reaching some sort of international-AGI-ban deal with China, so it all works out.
(Only half-serious.)

Thane Ruthenis Jul 25, 2025, 4:53 PM
3 points
0
in reply to: anaguma’s comment on: We Built a Tool to Protect Your Dataset From Simple Scrapers
Yeah, I guess the use-case I had in mind is generally people who don’t want LLMs trained on (particular pieces of) their writing, rather than datasets specifically.

Thane Ruthenis Jul 25, 2025, 7:27 AM
7 points
0
on: We Built a Tool to Protect Your Dataset From Simple Scrapers
Hmm. This approach relies partly on the AGI labs being cooperative and wary of violating the law, and partly on creating minor inconveniences for accessing the data which inconvenience human users as well. In addition, any data shared this way would have to be shared via the download portal, impoverishing the web experience.
I wonder if it’s possible to design some method of data protection that (1) would be deployable on arbitrary web pages, (2) would not burden human users, (3) would make AGI labs actively not want to scrape pages protected this way.
Here’s one obvious idea. It’s pretty hostile and might potentially have net-negative results (for at least some goals), but I think it’s worth discussing.
We could automatically seed the text with jailbreaks, spam, false information, and other low-quality/negative-quality training data, in a way that is invisible to users but visible to LLMs. Pliny-style emoji bombs, invisible/very tiny text, and other techniques along those lines, randomly inserted into the human-readable text between paragraphs or words.
How easy would it be for AGI labs to clean this up? Might be pretty easy, if we only have a select few methods of hiding the text: then they can just automatically look for text within invisible tags/added via Unicode variation selectors, and remove it. But it might be possible to create a diverse, ever-growing family of text-hiding methods, such that static countermeasures don’t work. Tasking an LLM with cleaning the document, instead of manually designed methods, might backfire, with the cleaner LLM itself getting jailbroken by the embedded cognitohazards.
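A minimal sketch of why a small, fixed set of hiding methods is easy to clean up while a growing family is not. The two encodings below (UTF-8 bytes mapped onto Unicode variation selectors, and printable ASCII mapped onto Unicode tag characters) and all function names are illustrative assumptions, not an existing tool. A real cleaner would also have to avoid stripping legitimate variation selectors, such as the emoji-presentation selector U+FE0F.

```python
# Method 1 (assumed for illustration): encode each UTF-8 byte of the payload as a
# Unicode variation selector. VS1-VS16 are U+FE00..U+FE0F and VS17-VS256 are
# U+E0100..U+E01EF, giving exactly 256 invisible code points.
def byte_to_vs(b: int) -> str:
    return chr(0xFE00 + b) if b < 16 else chr(0xE0100 + b - 16)


def vs_to_byte(ch: str):
    cp = ord(ch)
    if 0xFE00 <= cp <= 0xFE0F:
        return cp - 0xFE00
    if 0xE0100 <= cp <= 0xE01EF:
        return cp - 0xE0100 + 16
    return None  # not a variation selector


def hide_with_vs(visible: str, payload: str) -> str:
    # Append the encoded payload right after the human-readable text.
    return visible + "".join(byte_to_vs(b) for b in payload.encode("utf-8"))


def reveal_vs(text: str) -> str:
    data = bytes(b for b in map(vs_to_byte, text) if b is not None)
    return data.decode("utf-8", errors="replace")


# Method 2 (assumed for illustration): map printable ASCII onto the Unicode
# "tag" characters U+E0020..U+E007E, which most renderers do not display.
def hide_with_tags(visible: str, payload: str) -> str:
    hidden = "".join(chr(0xE0000 + ord(c)) for c in payload if 0x20 <= ord(c) <= 0x7E)
    return visible + hidden


# A static countermeasure that only knows about method 1.
def strip_variation_selectors(text: str) -> str:
    return "".join(ch for ch in text if vs_to_byte(ch) is None)


visible = "A normal, human-readable paragraph."
payload = "hidden payload"
page_1 = hide_with_vs(visible, payload)
page_2 = hide_with_tags(visible, payload)
```

The cleaner recovers the visible text exactly from `page_1`, but passes the tag characters in `page_2` straight through, which is the sense in which each new hiding method forces a new countermeasure.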
Making the hidden text actually harmful seems easier: jailbreaks are invented faster than countermeasures against them, I think.
Ideally, the whole setup would be continuously updated: instead of individual writers having to inject this stuff on their own, there would be a centralized public API or GitHub repo which web developers could use, embedding this functionality into websites. This centralized API/repo can then be continuously updated with new jailbreaks and counters to whatever countermeasures AGI labs come up with (AdBlock-style).
Again, it’s obviously pretty hostile, but if paired with a canary string, any actor that decides to ignore the canary string and scrape the page anyway deserves what they get.
Any obvious reasons this is a bad idea which I’m missing? I guess the obvious failure mode is people deploying this without the canary string, meaning even cooperative AGI labs might accidentally train on the data poisoned this way. If the goal is to prevent training on bad data (because of e. g. misalignment concerns), that’s obviously counterproductive.
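A minimal sketch of that failure mode, assuming a cooperative lab that simply drops every page containing a canary string. The canary value and the names `inject_payload` and `cooperative_filter` are hypothetical placeholders, not an existing tool or a real canary.

```python
# Hypothetical placeholder; real canary strings are long, unique GUID-style markers.
CANARY = "CANARY-PLACEHOLDER-DO-NOT-TRAIN"


def inject_payload(page: str, payload: str, include_canary: bool = True) -> str:
    """Append a (hidden) payload to a page, optionally marking the page with the canary."""
    poisoned = page + payload
    return (poisoned + "\n" + CANARY) if include_canary else poisoned


def cooperative_filter(pages: list[str]) -> list[str]:
    """A lab that respects canaries keeps only pages without the marker."""
    return [p for p in pages if CANARY not in p]


clean = "An ordinary blog post."
marked = inject_payload("Protected post.", "<hidden junk>")
unmarked = inject_payload("Protected post.", "<hidden junk>", include_canary=False)

kept = cooperative_filter([clean, marked, unmarked])
# The marked page is dropped, but the poisoned page deployed without a canary slips through.
```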

Thane Ruthenis Jul 24, 2025, 4:09 AM
6 points
1
in reply to: cloud’s comment on: Subliminal Learning: LLMs Transmit Behavioral Traits via Hidden Signals in Data
> Of course, the degree of transmission does depend on the distillation distribution.
Yes, that’s what makes it not particularly enlightening here, I think? The theorem says that the student moves in a direction that is, at worst, orthogonal to the direction towards the teacher. So “orthogonal” is the lower bound, right? And it’s a pretty weak lower bound. (Or, in a statement I think is approximately equivalent: the student’s post-distillation loss on the teacher’s loss function is at worst equal to its pre-distillation loss.)
Another perspective: consider looking at this theorem without knowing the empirical result. Would you be able to predict this result from this theorem? I think not; I think the “null hypothesis” of “if you train on those outputs of the teacher that have nothing to do with the changed feature, the student would move in an almost-orthogonal direction relative to it” isn’t ruled out by it. It doesn’t interact with/showcase the feature-entangling dynamic.
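To make the “null hypothesis” concrete, here is a toy sketch: a two-feature linear model, not the paper’s setup or its theorem. The teacher differs from the student along both feature A and feature B, but the distillation inputs never activate B. One gradient step moves the student towards the teacher (a positive inner product, consistent with the theorem’s guarantee), yet the B-weight does not move at all.

```python
def predict(w, x):
    """Linear model: y = w . x."""
    return sum(wi * xi for wi, xi in zip(w, x))


def distill_step(w_student, w_teacher, inputs, lr):
    """One gradient step on half the mean squared error against the teacher's outputs."""
    grad = [0.0] * len(w_student)
    for x in inputs:
        err = predict(w_student, x) - predict(w_teacher, x)
        for i, xi in enumerate(x):
            grad[i] += err * xi / len(inputs)
    return [wi - lr * g for wi, g in zip(w_student, grad)]


# Index 0 = feature A, index 1 = feature B.
w_student = [0.0, 0.0]
w_teacher = [1.0, 1.0]                          # teacher differs along both A and B
inputs = [[1.0, 0.0], [2.0, 0.0], [-1.0, 0.0]]  # distillation data only exercises A

w_after = distill_step(w_student, w_teacher, inputs, lr=0.1)
step = [a - b for a, b in zip(w_after, w_student)]
to_teacher = [a - b for a, b in zip(w_teacher, w_student)]
alignment = sum(s * t for s, t in zip(step, to_teacher))

print(w_after)    # A-weight moves towards the teacher; B-weight stays at zero
print(alignment)  # positive: the step is "towards the teacher"
```

In this model the gradient’s B-component is `err * 0.0` for every input, so the theorem-style guarantee holds while nothing about B is transmitted.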
> Would be really cool to connect to SLT! Is there a particular result you think is related?
Not sure, sorry, I’m not well-versed enough in SLT.

Thane Ruthenis Jul 23, 2025, 4:11 PM
2 points
0
on: GPT Agent Is Standing By
> OpenAI has declared ChatGPT Agent as High in Biological and Chemical capabilities under their Preparedness Framework
Huh. They certainly say all the right things here, so this might be a minor positive update on OpenAI for me.
Of course, the way it sounds and the way it is are entirely different things, and it’s not clear yet whether all these serious-sounding safeguards were developed with the aim of making things actually secure, as opposed to safety-washing. E. g., are they actually going to stop anyone moderately determined?
Hm, it’s been five minutes and it looks like there’s no Pliny jailbreak yet. That’s something. Maybe Pliny doesn’t have access yet. (Edit: Yep.)

Thane Ruthenis Jul 23, 2025, 10:09 AM
10 points
3
on: Subliminal Learning: LLMs Transmit Behavioral Traits via Hidden Signals in Data
Fascinating. This is the sort of result that makes me curious about how LLMs work irrespective of their importance to any existential risks.
> In the paper, we prove a theorem showing that a single, sufficiently small step of gradient descent on any teacher-generated output necessarily moves the student toward the teacher
Hmm, that theorem didn’t seem like a very satisfying explanation to me. Unless I’m missing something, it doesn’t actually imply that the student’s features which seem unrelated to the training distribution get moved towards the teacher’s? It just says the student as a whole is moved towards the teacher, which, of course it is.
It seems to me that what this phenomenon implies is some sort of dimensionality collapse? That is, the reason why fine-tuning on the teacher’s outputs about Feature A moves the student towards the teacher along the Feature-B axis as well is because the effective dimensionality of the space of LLM algorithms is smaller than the dimensionality of the parameter space, so moving it along A tends to drag the model along B as well?
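A toy illustration of the hypothesized mechanism, under the loud assumption that features A and B share a parameter (a crude stand-in for the effective dimensionality being lower than the parameter count; real LLMs are not this simple). The student computes feature A’s effective weight as `s * a` and feature B’s as `s * b`, with `s` shared. All names here are hypothetical.

```python
def step_on_feature_a(params, target_weight_a, x_a, lr):
    """One gradient step on 0.5 * (s*a*x_a - target)^2, using only feature-A data."""
    s, a, b = params["s"], params["a"], params["b"]
    target = target_weight_a * x_a
    err = s * a * x_a - target
    grad_s = err * a * x_a  # d/ds of the A-loss
    grad_a = err * s * x_a  # d/da of the A-loss
    # b never appears in the A-loss, so its gradient is zero.
    return {"s": s - lr * grad_s, "a": a - lr * grad_a, "b": b}


def effective(params, feature):
    """Effective weight of a feature: shared scale times feature-specific parameter."""
    return params["s"] * params[feature]


student = {"s": 1.0, "a": 1.0, "b": 1.0}
teacher = {"s": 1.5, "a": 1.0, "b": 1.0}  # teacher differs only in the shared parameter

after = step_on_feature_a(student, effective(teacher, "a"), x_a=1.0, lr=0.1)
print(effective(student, "b"), effective(after, "b"), effective(teacher, "b"))
```

Although `b` itself is untouched, the update to the shared `s` drags feature B’s effective weight from 1.0 part of the way towards the teacher’s 1.5.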
I’m not well-versed in singular learning theory, but I’m pretty sure it has some related results. Perhaps its tools can be used to shed more light on this?

Thane Ruthenis Jul 22, 2025, 11:00 PM
4 points
0
in reply to: Thane Ruthenis’s comment on: Google and OpenAI Get 2025 IMO Gold
Also, it’s funny that we laugh at xAI when they say stuff like “we anticipate Grok will uncover new physics and technology within 1-2 years”, but when an OpenAI employee goes “I wouldn’t be surprised if by next year models will be deriving new theorems and contributing to original math research”, that’s somehow more credible. Insert the “know the work rules” meme here.
(FWIW, I consider both claims pretty unlikely but not laughably incredible.)

Thane Ruthenis Jul 22, 2025, 10:42 PM
13 points
0
on: Google and OpenAI Get 2025 IMO Gold
> The ‘barely speak English’ part makes the solution worse in some ways but actually makes me give their claims to be doing something different more credence rather than less
I think people are overupdating on that. My impression is that gibberish like this is the default way RL makes models speak, and that they need to be separately fine-tuned to produce readable outputs. E. g., the DeepSeek-R1 paper repeatedly complained about “poor readability” with regard to DeepSeek-R1-Zero (their cold-start no-SFT training run).
> Actually This Seems Like A Big Deal
>
> If we take all of OpenAI’s statements at face value
Nitpick, but: I’m struck by how many times they mentioned the word “general”. They seem to be trying to shoehorn it into every tweet. And it’s weird. Like, of course it was done using a general system. That’s, like, the default assumption? Whenever did OpenAI ship task-specific neurosymbolic systems, or whatever it is that would be the opposite of “general” here?
I guess maybe it makes sense if they’re assuming that people would be primed by DeepMind to expect math-specific systems here? It still seems kind of excessive. I buy that they have a model that really did score that well on the IMO benchmark, but the more times they remind us how very general it is, the more I’m prompted to think that maybe it’s not quite so general after all.
There’s also the desperate rush with which they started talking about it: just after the closing ceremony, seemingly to adhere to the most bare-bones technically-correct interpretation of “don’t steal the spotlight”, and apparently without even having any blog post/official announcement prepared, and without having decided what they’re willing to reveal about their technical setup.
I think it has all the telltale signs of being a heavily spun marketing op. There’s truth at the core of it, yes, but I’m suspicious that all the “reading between the lines” that people are doing to spot the “big implications” here is precisely what the “marketing op” part of it was intended to cause.
> Given that P3 was unusually easy this year
Not only P3, apparently all problems except P6 were really easy this year. I don’t think this is particularly load-bearing for anything, but seems like another worthwhile incremental update to make.

Thane Ruthenis Jul 22, 2025, 9:23 PM
6 points
−1
in reply to: ryan_greenblatt’s comment on: OpenAI Claims IMO Gold Medal
> I think this is overall reasonable if you interpret “hard-to-verify” as “substantially harder to verify” and I think this probably how many people would read this by default
Not sure about this. The kind of “hard-to-verify” I care about is e. g. agenty behavior in real-world conditions. I assume many other people are also watching out for that specifically, and that capability researchers are deliberately aiming for it.
And I don’t think the proofs are any evidence for that. The issue is that there exists, in principle, a way to easily verify math proofs: by translating them into a formal language and running them through a theorem-verifier. So the “correct” way for gradient descent to solve this was to encode some sort of internal theorem-verifier into the LLM.
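To make “easy to verify” concrete: once a claim and its proof are written in a formal language, checking them is purely mechanical. Below is a deliberately trivial Lean 4 sketch (illustrative only; IMO-level formalizations are far longer, and this says nothing about how OpenAI’s model was actually trained or checked).

```lean
-- The checker accepts this only because `Nat.add_comm a b` really has type `a + b = b + a`.
theorem add_comm_example (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b

-- A false statement with a bogus proof would be rejected at check time, e.g.:
-- theorem bogus (a : Nat) : a + 1 = a := rfl
```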
Even more broadly, we know that improved performance at IMO could be achieved by task-specific models (AlphaGeometry, AlphaProof), which means that much of the IMO benchmark is not a strong signal of general intelligence. A general intelligence can solve it, but one needn’t be a general intelligence to do so, and since gradient descent prefers shortcuts and shallow solutions...
They say they’re not using task-specific methodology, but what does this actually mean? Does it mean they did not even RL the model on math-proof tasks, they RL’d it on something else and the gold-level IMO performance arose by transfer learning? Doubt it. Which means this observation doesn’t really distinguish between “the new technique is fully general and works in all domains” and “the new technique looks like it should generalize because of secret technical details, but it doesn’t actually, it only worked here because it exploited the underlying easy-to-verify properties of the task”.

Thane Ruthenis Jul 22, 2025, 3:16 PM
6 points
1
in reply to: Double’s comment on: Double’s Shortform
Singular Learning Theory and Simplex’s work (e. g. this), maybe? Cartesian Frames and Finite Factored Sets might also work, but I’m less sure about those.
It’s actually pretty hard to come up with agendas in the intersection of “seems like an alignment-relevant topic it’d be useful to popularize” and “has complicated math which would be insightful and useful to visualize/simulate”.
- Natural abstractions, ARC’s ELK, Shard Theory, and general embedded-agency theories are currently better understood by starting from the concepts, not the math.
- Infrabayesianism, Orthogonal’s QACI, logical-induction stuff, and ARC’s heuristic arguments seem too abstract to allow interesting visual modeling. (Maybe if you get really creative...)
- There are various alignment-relevant theorems and mathematical tools scattered around which could be interestingly visualized (e. g., my own causal mirrors), but most of them are niche tools, so it’s not obvious there’s much value in popularizing them.
Also, not sure if that’s a deal-breaker for you, but some important agendas are not technically “about” AI Safety at all, even if they resulted from people starting from the alignment problem and iteratively identifying various subproblems necessary for making progress on it. This process often moves you outside the field of AI. For example: natural abstractions, FFS, and heuristic arguments, which don’t even centrally study minds at all.

Thane Ruthenis Jul 22, 2025, 8:15 AM
2 points
0
in reply to: habryka’s comment on: Thane Ruthenis’s Shortform
> what would it even mean to have 10^30 times more shrimp than atoms?
Oh, easy, it just implies you’re engaging in acausal trade with a godlike entity residing in some universe dramatically bigger than this one. This interpretation introduces no additional questions or complications whatsoever.

Thane Ruthenis Jul 22, 2025, 8:13 AM
2 points
0
in reply to: habryka’s comment on: Thane Ruthenis’s Shortform
> I just really don’t buy the whole “let’s add up qualia” as any basis of moral calculation
Same, honestly. To me, many of these thought experiments seem decoupled from anything practically relevant. But it still seems to me that people often do argue from those abstracted-out frames I’d outlined, and these arguments are probably sometimes useful for establishing at least some agreement on ethics. (I’m not sure what a full-complexity godshatter-on-godshatter argument would even look like (a fistfight, maybe?), and am very skeptical it’d yield any useful results.)
Anyway, it sounds like we mostly figured out what the initial drastic disconnect between our views here was caused by?

Thane Ruthenis Jul 22, 2025, 7:38 AM
6 points
0
in reply to: habryka’s comment on: Thane Ruthenis’s Shortform
> I agree that this is a thing people often like to invoke, but it feels to me a lot like people talking about billionaires and not noticing the classical crazy arithmetic errors like
Isn’t it the opposite? It’s a defence against providing too-low numbers; it’s specifically to ensure that even infinitesimally small preferences are elicited with certainty.
Bundling up all “this seems like a lot” numbers into the same mental bucket, and then failing to recognize when a real number is not actually as high as in your hypothetical, is certainly an error one could make here. But I don’t see an exact correspondence...
In the billionaires case, a thought-experimenter may invoke the hypothetical of “if a wealthy person had enough money to lift everyone out of poverty while still remaining rich, wouldn’t their not doing so be outrageous?”, while inviting the audience to fill in the definitions of “enough money” and “poverty”. Practical situations might then just fail to match that hypothetical, and innumerate people might fail to recognize that, yes. But this doesn’t mean that that hypothetical is fundamentally useless to reason about, or that it can’t be used to study some specific intuitions/disagreements. (“But there are no rich people with so much money!” kind of maps to “but I did have breakfast!”.)
And in the shrimps case, hypotheticals involving a “very-high but not abstraction-breaking” number of shrimps are a useful tool for discussion/rhetoric. It allows us to establish agreement/disagreement on “shrimp experiences have inherent value at all”, a relatively simple question that could serve as a foundation for discussing other, more complicated and contextual ones. (Such as “how much should I value shrimp experiences?” or “but do enough shrimps actually exist to add up to more than a human?” or “but is Intervention X to which I’m asked to donate $5 going to actually prevent five dollars’ worth of shrimp suffering?”.)
Like, I think having a policy of always allowing abstraction breaks would just impoverish the set of thought experiments we would be able to consider and use as tools. Tons of different dilemmas would collapse to Pascal’s mugging or whatever.
> Like, I would feel more sympathetic to this simplification if the author of the post was a hardcore naive utilitarian, but they self-identify as a kantian. Kantianism is a highly contextual ethical theory that clearly cares about a bunch of different details of the shrimp, so I don’t get the sense the author wants us to abstract away everything but some supposed “happiness qualia” or “suffering qualia” from the shrimp.
Hmm… I think this paragraph at the beginning is what primed me to parse it this way:
> Merriam-Webster defines torture as “the infliction of intense pain (as from burning, crushing, or wounding) to punish, coerce, or afford sadistic pleasure.” So I remind the reader that it is part of the second thought experiment that the shrimp are sentient.
Why would we need this assumption^[1], if the hypothetical weren’t centrally about the inherent value of the shrimps/shrimp qualia, and the idea that it adds up? The rest of that essay also features no discussion of the contextual value that the existence of a shrimp injects into various diverse environments in which it exists, etc. It just throws the big number around, while comparing the value of shrimps to the value of eating a bag of skittles, after having implicitly justified shrimps having value via shrimps having qualia.
I suppose it’s possible that if I had the full context of the author’s writing in mind, your interpretation would have been obviously correct^[2]. But the essay itself reads the opposite way to me.
1. ^
  A pretty strong one, I think, since “are shrimp qualia of nonzero moral relevance?” is often the very point of many discussions.
2. ^
  Indeed, failing to properly familiarize myself with the discourse and the relevant frames before throwing in hot takes was my main blunder here.

Thane Ruthenis Jul 22, 2025, 6:25 AM
4 points
0
in reply to: habryka’s comment on: Thane Ruthenis’s Shortform
> I think there is also a real conversation going on here about whether maybe, even if you isolated each individual shrimp into a tiny pocket universe, and you had no way of ever seeing them or visiting the great shrimp rift (a natural wonder clearly greater than any natural wonder on earth), and all you knew for sure was that it existed somewhere outside of your sphere of causal influence, and the shrimp never did anything more interesting than current alive shrimp, whether it would still be worth it to kill a human
Yeah, that’s more what I had in mind. Illusion of transparency, I suppose.
> Like, I agree there are versions of the hypothetical that are too removed, but ultimately, I think a central lesson of scope sensitivity is that having a lot more of something often means drastic qualitative changes in what it means to have that thing
Certainly, and it’s an important property of reality. But I don’t think this is what extreme hypotheticals such as the one under discussion actually want to talk about (even if you think this is a more important question to focus on)?
Like, my model is that the $10^{100}$ shrimp in this hypothetical are not meant to literally be $10^{100}$ shrimp. They’re meant to be “$10^{100}$” “shrimp”. Intuitively, this is meant to stand for something like “a number of shrimp large enough for any value you’re assigning them to become morally relevant”. My interpretation is that the purpose of using a crazy-large number is to elicit that preference with certainty, even if it’s epsilon; not to invite a discussion about qualitative changes in the nature of crazy-large quantities of arbitrary matter.
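Written out under the simplest additive reading (the symbols $\varepsilon$ for the value of one shrimp and $H$ for the value of one human life are introduced here for illustration, not taken from the original hypothetical), $N$ shrimp outweigh one human exactly when

$$N\varepsilon > H \iff N > \frac{H}{\varepsilon} \qquad (\varepsilon > 0).$$

Picking $N = 10^{100}$ is then shorthand for “assume $N$ clears $H/\varepsilon$ for any $\varepsilon$ you’d seriously entertain”, so the answer hinges only on whether $\varepsilon$ is zero.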
The hypothetical is interested in shrimp welfare. If we take the above consideration into account, it stops being about “shrimp” at all (see the shrimps-to-rocks move). The abstractions within which the hypothetical is meant to live break.
And yes, if we’re talking about a physical situation involving the number $10^{100}$, the abstractions in question really do break under forces this strong, and we have to navigate the situation with the broken abstractions. But in thought-experiment land, we can artificially stipulate those abstractions to be inviolable (or replace the crazy-high abstraction-breaking number with a very-high but non-abstraction-breaking number).

Thane Ruthenis Jul 22, 2025, 5:40 AM
4 points
2
in reply to: habryka’s comment on: Thane Ruthenis’s Shortform
> One can argue it’s meaningless to talk about numbers this big, and while I would dispute that, it’s definitely a much more sensible position than trying to take a confident stance to destroy or substantially alter a set of things so large that it vastly eclipses in complexity and volume and mass and energy all that has ever or will ever exist by a trillion-fold.
Okay, while I’m hastily backpedaling from the general claims I made, I am interested in your take on the first half of this post. I think there’s a difference between talking about an actual situation, full complexities of the messy reality taken into account, where a supernatural being physically shows up and makes you really decide between a human and $10^{100}$ of shrimp, and a thought experiment where “you” “decide” between a “human” and $10^{100}$ of “shrimp”. In the second case, my model is that we’re implicitly operating in an abstracted-out setup where the terms in the quotation marks are, essentially, assumed ontologically basic, and matching our intuitive/baseline expectations about what they mean.
While, within the hypothetical, we can still have some uncertainty over e. g. the degree of the internal experiences of those “shrimp”, I think we have to remove considerations like “the shrimp will be deposited into a physical space obeying the laws of physics where their mass may form planets and galaxies” or “with so many shrimp, it’s near-certain that the random twitches of some subset of them would spontaneously implement Boltzmann civilizations of uncountably many happy beings”.
IMO, doing otherwise is a kind of “dodging the hypothetical”, no different from considering it very unlikely that the supernatural being really has control over $10^{100}$ of something, and starting to argue about this instead.

Thane Ruthenis Jul 21, 2025, 5:53 PM
2 points
0
in reply to: habryka’s comment on: Thane Ruthenis’s Shortform
> No, being extremely overwhelmingly confident about morality such that even if you are given a choice to drastically alter 99.999999999999999999999% of the matter in the universe, you call the side of not destroying it “insane” for not wanting to give up a single human life, a thing we do routinely for much weaker considerations, is insane.
Hm. Okay, so my reasoning there went as follows:
- Substitute shrimp for rocks. $10^{100}$ rocks would also be an amount of matter bigger than exists in the observable universe, and we presumably should assign a nonzero probability to rocks being sapient. Should we then save $10^{100}$ rocks instead of one human?
- Perhaps. But I think this transforms the problem into Pascal’s mugging, and has nothing to do with shrimp or ethics anymore. If we’re allowed to drag in outside considerations like this, we should also start questioning whether these $10^{100}$ rocks/shrimp actually exist, and all other usual arguments against Pascal’s mugging.
- To properly engage with thought experiments within some domain, like ethics, we should take the assumptions behind this domain as a given. This implicitly means constraining our hypothesis space to models of reality within which this domain is a meaningful thing to reason about.
- In this case, this would involve being able to reason about $10^{100}$ rocks as if they really were just “rocks”, without dragging in the uncertainty about “but what if my very conception of what a ‘rock’ is is metaphysically confused?”.
- Similarly, surely we should be able to have thought experiments in which “shrimp” really are just “shrimp”, ontologically basic entities that are not made up of matter which can spontaneously assemble into Boltzmann brains or whatever.
- “Shrimp” being a type of system that could implement qualia as valuable as that of humans seems overwhelmingly unlikely to me, not as unlikely as “rocks have human-level qualia”, but in the same reference class. Therefore, in the abstract thought-experiment setup in which I have no uncertainty regarding the ontological nature of shrimp, it’s reasonable to argue that no amount of them compares to a human life.
I’m not sure where you’d get off this train, but I assume the last bullet point would do it? I. e., you would argue that the possibility of shrimp having human-level qualia is salient in a way it’s not for rocks?
Yeah, that seems valid. I might’ve shot from the hip on that one.
> The whole “tier” thing obviously fails. You always end up dominated by spurious effects on the highest tier
I have a story for how that would make sense, similarly involving juggling inside-model and outside-model reasoning, but, hm, I’m somehow getting the impression my thinking here is undercooked/poorly presented. I’ll revisit that one at a later time.
Edit: Incidentally, any chance the UI for retracting a comment could be modified? I have two suggestions here:
- I’d like to be able to list a retraction reason, ideally at the top of the comment.
- The crossing-out thing makes it difficult to read the comment afterwards, and some people might want to be able to do that. Perhaps it’s better to automatically put the contents into a collapsible instead, or something along those lines?

Thane Ruthenis Jul 21, 2025, 4:14 PM
−1 points
−15
on: Thane Ruthenis’s Shortform
Edit: Never mind, evidently I’ve not thought this through properly. I’m retracting the below.
~~The naïve formulations of utilitarianism assume that all possible experiences can be mapped to scalar utilities lying on the same, continuous spectrum, and that experiences’ utility is additive. I think that’s an error.~~
~~This is how we get the frankly insane conclusions like “you should save $10^{100}$ shrimps instead of one human” or everyone’s perennial favorite, “if you’re choosing between one person getting tortured for 50 years or some number $N$ of people getting a dust speck in their eye, there must be an $N$ big enough that the torture is better”. I disagree with those. I would sacrifice an arbitrarily large number of shrimps to save one human, and there’s no $N$ big enough for me to pick torture. I don’t care if that disagrees with what the math says: if the math says something else, it’s the wrong math and we should find a better one.~~
~~Here’s a sketch of what I think such better math might look like:~~
- ~~There’s a totally ordered, discrete set of “importance tiers” of experiences.~~
  - ~~Within tiers, the utilities are additive: two people getting dust specks is twice as bad as one person getting dust-speck’d, two people being tortured is twice as bad as one person being tortured, eating a chocolate bar twice per week is twice as good as eating a bar once per week, etc.~~
  - ~~Across tiers, the tier ordering dominates: if we’re comparing some experience belonging to a higher tier to any combination of experiences from lower tiers, the only relevant consideration is the sign of the higher-tier experience. No amount of people getting dust-speck’d, and no amount of dust-speck events sparsely distributed throughout one person’s life, can ever add up to anything as important as a torture-level experience.^[1]~~
- ~~Intuitively, tiers correspond to the size of effect a given experience has on a person’s life:~~
  - ~~A dust speck is a minor inconvenience that is forgotten after a second. If we zoom out and consider the person’s life at a higher level, say on the scale of a day, this experience rounds off exactly to zero, rather than to an infinitesimally small but nonzero value. (Again, on the assumption of a median dust-speck event, no emergent or butterfly effects.)~~
  - ~~Getting yelled at by someone might ruin the entirety of someone’s day, but is unlikely to meaningfully change the course of their life. Experiences on this tier are more important than any amount of dust-speck experiences, but any combination of them rounds down to zero from a life’s-course perspective.~~
  - ~~Getting tortured is likely to significantly traumatize someone, to have a lasting negative impact on their life. Experiences at this tier ought to dominate getting-yelled-at as much as getting-yelled-at dominates dust specks.~~
- ~~Physically, those “importance tiers” probably fall out of the hierarchy of natural abstractions. Like everything else, a person’s life has different levels of organization. Any detail in how the high-level life-history goes is incomparably more important than any experience which is only locally relevant (which fails to send long-distance ripples throughout the person’s life). Butterfly effects are then the abstraction leaks (low-level events that perturb high-level dynamics), etc.~~
~~I didn’t spend much time thinking about this, so there may be some glaring holes here, but this already fits my intuitions much better.~~
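To make the (since retracted) tier rule concrete, here is a minimal sketch of it as a lexicographic comparison: utilities are summed within each tier, and tiers are then compared from most to least important, with the first differing tier deciding. The tier numbers, the ±1 utilities, and the helper names `many`, `tier_totals`, and `compare` are illustrative assumptions, not anything from the post.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical encoding (not from the post): an experience is (tier, value),
# where a larger tier number means a more important tier and value is a
# signed utility. Tier numbers below are illustrative only.
DUST_SPECK, YELLED_AT, TORTURE = 0, 1, 2

Experience = Tuple[int, float]


def many(tier: int, value: float, n: float) -> Experience:
    """n same-tier experiences, collapsed into one entry (within-tier additivity)."""
    return (tier, value * n)


def tier_totals(experiences: Iterable[Experience]) -> Dict[int, float]:
    """Sum utilities separately within each tier."""
    totals: Dict[int, float] = defaultdict(float)
    for tier, value in experiences:
        totals[tier] += value
    return dict(totals)


def compare(a: Iterable[Experience], b: Iterable[Experience]) -> int:
    """Return 1 if outcome a is better, -1 if b is better, 0 if they tie.

    Tiers are scanned from most to least important, and the first tier whose
    totals differ decides, so no quantity of lower-tier utility can offset
    a difference at a higher tier.
    """
    ta, tb = tier_totals(a), tier_totals(b)
    for tier in sorted(set(ta) | set(tb), reverse=True):
        diff = ta.get(tier, 0.0) - tb.get(tier, 0.0)
        if diff != 0:
            return 1 if diff > 0 else -1
    return 0


torture = [(TORTURE, -1.0)]
specks = [many(DUST_SPECK, -1.0, 10.0 ** 100)]
print(compare(torture, specks))  # -1: the dust specks are preferred
```

Collapsing $N$ identical experiences into one entry with `many` is only legitimate under the footnote’s assumptions (no butterfly effects, no emergent importance); without them, $N$ dust specks would not simply be $N$ times one.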
~~I think we can expand that framework to cover “tiers of sentience”:~~
- ~~If shrimps have qualia, it might be that any qualia they’re capable of experiencing belong to lower-importance tiers, compared to the highest-tier human qualia.~~
- ~~Simultaneously, it might be the case that the highest-importance shrimp qualia are on the level of the lower-importance human qualia.~~
- ~~Thus, it might be reasonable to sacrifice the experience of eating a chocolate bar to save $10^{100}$ shrimps, even if you’d never sacrifice a person’s life (or even make someone cry) to save any number of shrimps.~~
~~This makes some intuitive sense, I think. The model above assumes that “local” experiences, which have no impact on the overarching pattern of a person’s life, are arbitrarily less important than that overarching pattern. What if we’re dealing with beings whose internal lives have no such overarching patterns, then? A shrimp’s interiority is certainly less complex than a human’s, so it seems plausible that its life-experience lacks comparably rich levels of organization (something like “the ability to experience what is currently happening as part of a tapestry spanning its entire life, rather than as an isolated experience”). So all of its qualia would be comparable only with the “local” experiences of a human, for some tier of locality: we would have direct equivalence between them.~~
~~One potential issue here is that this implies the existence of utility monsters: divine entities that can have experiences incomparably more important than any experience a human can have. I guess it’s possible that if I understood qualia better, I would agree with that, but this seems about as counterintuitive as “shrimps matter as much as humans”. My intuition is that sapient entities top the hierarchy of moral importance, that there’s nothing meaningfully “above” them. So that’s an issue.~~
~~One potential way to deal with this is to suggest that what distinguishes sapient/“generally intelligent” entities is not that they’re the only entities whose experiences matter, but that they have the ability to (learn to) have experiences of arbitrarily high tiers. And indeed: the whole shtick of “general intelligence” is that it should allow you to learn and reason about arbitrarily complicated systems of abstraction/multi-level organization. If the importance tiers of experiences really have something to do with the richness of the organization of the entity’s inner life, this resolves things neatly. Now:~~
- ~~Non-sapient entities may have experiences of nonzero importance.~~
- ~~No combination of non-sapient experiences can compare to the importance of a sapient entity’s life.~~
- ~~“Is sapient” tops the hierarchy of moral relevance: there’s no type of entity that is fundamentally “above”.~~
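The “tiers of sentience” idea, including the chocolate-bar-versus-shrimps example, can be sketched by giving each kind of entity a cap on the tiers its experiences can reach and ranking options by per-tier totals, most important tier first. This is a sketch under assumed numbers: the tier numbering, the `MAX_TIER` caps, the ±1 utilities, and the option names are all hypothetical, and the ranking relies on Python’s lexicographic tuple comparison rather than on anything specified in the post.

```python
from typing import Dict, List, Tuple

# Hypothetical tier numbering (not from the post): 0 = momentary (a dust speck,
# a chocolate bar), 1 = day-level (getting yelled at), 2 = life-course (torture,
# death). Listed from most to least important.
TIERS = (2, 1, 0)

# Hypothetical caps: the highest tier each kind of entity can experience.
MAX_TIER: Dict[str, int] = {"human": 2, "shrimp": 0}

Outcome = List[Tuple[int, float]]


def experience(entity: str, tier: int, value: float, count: float = 1) -> Tuple[int, float]:
    """count identical experiences of one entity type, summed within their tier."""
    if tier > MAX_TIER[entity]:
        raise ValueError(f"{entity} cannot have tier-{tier} experiences")
    return (tier, value * count)


def priority_key(outcome: Outcome) -> Tuple[float, ...]:
    """Per-tier totals, most important tier first.

    Tuples compare lexicographically, so max() with this key lets any
    difference at a higher tier override every lower-tier total.
    """
    return tuple(sum(v for t, v in outcome if t == tier) for tier in TIERS)


SHRIMP_COUNT = 10.0 ** 100

# Give up a chocolate bar, or eat it while the shrimps come to harm?
chocolate_choice = {
    "eat_chocolate": [experience("human", 0, +1.0),
                      experience("shrimp", 0, -1.0, count=SHRIMP_COUNT)],
    "save_shrimp": [],
}
# Let a human die, or let the shrimps come to harm?
life_choice = {
    "human_dies": [experience("human", 2, -1.0)],
    "shrimp_harmed": [experience("shrimp", 0, -1.0, count=SHRIMP_COUNT)],
}

for choice in (chocolate_choice, life_choice):
    print(max(choice, key=lambda name: priority_key(choice[name])))
# prints "save_shrimp", then "shrimp_harmed"
```

Because no shrimp experience can land above tier 0 here, no count of shrimps outweighs a tier-2 loss, while within tier 0 the counts add up normally: the asymmetry the bullet points describe.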
1.
  ~~Two caveats here are butterfly effects and emergent importance:~~
  ~~Butterfly effects: getting a dust speck at the wrong moment might kill you (if you’re operating dangerous machinery) or change the trajectory of your life (if this minor inconvenience is the last straw that triggers a career-ruining breakdown). We have to assume such possibilities away: the experiences exist “in a vacuum”. Doing otherwise would violate the experimental setup, dragging in various practical considerations instead of making it purely about ethics.~~
  ~~So we assume that each dust-speck event always has the “median” amount of impact on a person’s life, even if you scale the number of dust-speck events arbitrarily.~~
  ~~Emergent importance: getting 1000 dust specks one after another adds up to something more than 1000 single-dust-speck experiences; it’s worse than getting a dust speck once per day for 1000 days. More intuitively, experiencing ¹⁰⁄₁₀ pain for one millisecond is not comparable to experiencing ¹⁰⁄₁₀ pain for 10 minutes. There are emergent effects at play, and as with butterfly effects, we must assume them away for experimental purity.~~
  ~~So if we’re talking about $M$ experiences from within the same importance tier, they’re assumed to be distributed such that they don’t add up to a higher-tier experience.~~
  ~~Note that those are very artificial conditions. In real life, both of those are very much in play. Any lower-tier experience has a chance of resulting in a higher-tier experience, and every higher-tier experience emerges from (appropriately distributed) lower-tier experiences. In our artificial setup, we’re assuming certain knowledge that no butterfly effects would occur, and that a lower-tier event contributes to no higher-tier pattern.~~
  ~~Relevance: There’s reasoning that goes, “if you ever drive to the store to get a chocolate bar, you’re risking crashing into and killing someone, therefore you don’t value people’s lives infinitely more than eating chocolate”. I reject it on the above grounds. Systematically avoiding all situations where you’re risking someone’s life in exchange for a low-importance experience would assemble into a high-importance life-ruining experience for you (starving to death in your apartment, I guess?). Given that, we’re now comparing same-tier experiences, and here I’m willing to be additive, calculating that killing a person with very low probability is better than killing yourself (by a thousand cuts) with certainty.~~