Agent foundations, AI macrostrategy, human enhancement.
I endorse and operate by Crocker’s rules.
I have not signed any agreements whose existence I cannot mention.
DeepMind says boo SAEs, now Anthropic says yay SAEs!
The most straightforward synthesis[1] of these two reports is that SAEs find some sensible decomposition of the model’s internals into computational elements (concepts, features, etc.), which circuits then operate on. It’s just that these computational elements don’t align with human thinking as nicely as humans would like. E.g. SAE-based concept probes don’t work well OOD because the models were not optimized to have concepts that generalize OOD. This is perfectly consistent with linear probes being able to detect the concept from model activations: the model retains enough information about a concept such as “harmful intent” for the probe to latch onto, even if the concept itself (or rather, its OOD-generalizing version) is not privileged in the model’s ontology.
ETA: I think this would (weakly?) predict that SAE generalization failures should align with model performance dropping on some tasks. Or at least that the model would need to have some other features that get engaged OOD so that the performance doesn’t drop? Investigating this is not my priority, but I’d be curious to know if something like this is the case.
not to say that I believe it strongly; it’s just a tentative/provisional synthesis/conclusion
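As a toy illustration of what “linear probes being able to detect the concept” means here, below is a minimal sketch. It uses synthetic stand-in activations and scikit-learn’s LogisticRegression; the “harmful intent” direction, the distribution shift, and all names are hypothetical and not taken from either report.

```python
# Minimal sketch (hypothetical, not from either report): train a linear probe
# on stand-in "residual stream" activations to detect a concept such as
# "harmful intent", then check it on a shifted (OOD-like) split.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
d_model, n_train, n_eval = 512, 2000, 500

# Pretend there is a single "harmful intent" direction in activation space.
concept_dir = rng.normal(size=d_model)
concept_dir /= np.linalg.norm(concept_dir)

def make_split(n, shift=0.0):
    # shift > 0 mimics distribution shift: the concept is expressed more weakly.
    labels = rng.integers(0, 2, size=n)
    acts = rng.normal(size=(n, d_model))
    acts += np.outer(labels * (2.0 - shift), concept_dir)
    return acts, labels

X_train, y_train = make_split(n_train)
probe = LogisticRegression(max_iter=1000).fit(X_train, y_train)

print("in-distribution acc:", probe.score(*make_split(n_eval)))
print("shifted (OOD-like) acc:", probe.score(*make_split(n_eval, shift=1.0)))
```

The only point of the sketch is that a supervised probe can recover a concept direction from activations even if that direction is not a privileged feature in the model’s own ontology, and that its accuracy can degrade when the concept is expressed differently out of distribution.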
So… there surely are things like (overlapping, likely non-exhaustive):
Memetic Darwinian anarchy—concepts proliferating without control, trying to carve out for themselves new niches in the noosphere or grab parts of real estate belonging to incumbent concepts.
Memetic warfare—individuals, groups, egregores trying to control the narrative by describing the same thing in the language of their own ideology, yadda yadda.
Independent invention of the same idea—in which case it’s usually given different names (but also, plausibly, since some people may grow attached to their concepts of choice, they might latch onto trivial/superficial differences and amplify them, so that one or more instances of this multiply independently invented concept is now morphed into something other than what it “should be”).
Memetic rent seeking—because introducing a new catchy concept might marginally bump up your h-index.
So, as usual, the law of equal and opposite advice applies.
Still, the thing Jan describes is real and often a big problem.
I also think I somewhat disagree with this:
An idea should either be precisely defined enough that it’s clear why it can’t be rounded off (once the precise definition is known), or it’s a vague idea and it either needs to become more precise to avoid being rounded or it is inherently vague and being vague there can’t be much harm from rounding because it already wasn’t clear where its boundaries were in concept space.
Meanings are often subtle, intuited but not fully grasped, in which case a (premature) attempt to explicitize them risks collapsing their reference to the important thing they are pointing at. Many important concepts are not precisely defined. Many are best sorta-defined ostensively: “examples of X include A, B, C, D, and E; I’m not sure what makes all of them instances of X, maybe it’s that they share the properties Y and Z … or at least my best guess is that Y and Z are important parts of X and I’m pretty sure that X is a Thing™”.
Eliezer has a post (which I couldn’t find just now) where he noticed that the probabilities he gave were inconsistent. He asks something like, “Would I really not behave as if God existed if I believed that P(Christianity)=1e-5?” and then, “Oh well, too bad, but I don’t know which way to fix it, and fixing it either way risks losing important information, so I’m deciding to live with this lack of consistency for now.”
This Google search is empty (and it’s also empty on the original Arbital page, so it’s not a porting issue).
LUCA lived around 4 billion years ago with some chirality chosen at random.
Not necessarily: https://en.wikipedia.org/wiki/Homochirality#Deterministic_theories
E.g.
Deterministic mechanisms for the production of non-racemic mixtures from racemic starting materials include: asymmetric physical laws, such as the electroweak interaction (via cosmic rays) or asymmetric environments, such as those caused by circularly polarized light, quartz crystals, or the Earth’s rotation, β-Radiolysis or the magnetochiral effect. The most accepted universal deterministic theory is the electroweak interaction. Once established, chirality would be selected for.
Especially given how concentrated-sparse it is.
It would be much better to have it as a Google Sheet.
My model is that
some of it is politically/ideologically/self-interest-motivated
some of it is just people glancing at a thing, forming an impression, and not caring to investigate further
some of it is people interacting with the thing indirectly via people from the first two categories; some subset of them then take a glance at the PauseAI website or whatever, out of curiosity, form an impression (e.g. whether it matches what they’ve heard from other people), don’t care to investigate further
Making slogans more ~precise might help with (2) and (3).
Some people misinterpret/mispaint them(/us?) as “luddites” or “decels” or “anti-AI-in-general” or “anti-progress”.
Is it their(/our?) biggest problem, one of their(/our?) bottlenecks? Most likely no.
It might still make sense to make marginal changes that make it marginally harder to do that kind of mispainting / reduce misinterpretative degrees of freedom.
You can still include it in your protest banner portfolio to decrease the fraction of people whose first impression is “these people are against AI in general” etc.
This closely parallels the situation with the immune system.
One might think “I want a strong immune system. I want to be able to fight every dangerous pathogen I might encounter.”
You go to your local friendly genie and ask for a strong immune system.
The genie fulfills your wish. No more seasonal flu. You don’t need to bother with vaccines. You even considered no longer washing your hands, but then you realized that other people are still not immune to whatever bugs there might be on your skin.
Then, a few weeks in, you go into anaphylactic shock while eating your favorite peanut butter sandwich. An ambulance takes you to the hospital, where they also tell you that you have Hashimoto’s.
You go to your genie to ask “WTF?” and the genie replies, “You asked for a strong immune system, not a smart one. It was not my task to ensure that it knows that peanut protein is not the protein of some obscure worm, even though they might look alike, or that the thyroid is a part of your own body.”
I have experimented some with meditation specifically with the goal of embracing the DMN (with few definite results)
I’d be curious to hear more details on what you’ve tried.
Relevant previous discussion: https://www.lesswrong.com/posts/XYYyzgyuRH5rFN64K/what-makes-people-intellectually-active
Then the effect would be restricted to people who are trying to control their eating, which we would probably have heard of by now.
What is some moderately strong evidence that China (by which I mean Chinese AI labs and/or the CCP) is trying to build AGI, rather than “just”: build AI that is useful for whatever they want their AIs to do and not fall behind the West while also not taking the Western claims about AGI/ASI/singularity at face value?
DeepSeek from my perspective should incentivize slowing down development (if you agree with the fast follower dynamic. Also by reducing profit margins generally), and I believe it has.
Any evidence of DeepSeek marginally slowing down AI development?
There’s a psychotherapy school called “metacognitive therapy”, and some people swear by it, claiming it’s simple and a solution to >50% of psychological problems because it targets their root causes (I’m going from memory of a podcast I listened to in the summer of 2023 and never researched further, so my description might be off, but maybe somebody will find some value in it).
In the case of engineering humans for increased IQ, Indians show broad support for such technology in surveys (even in the form of rather extreme intelligence enhancement), so one might focus on doing research there and/or lobbying its people and government to fund such research. High-impact Indian citizens interested in this topic seem like very good candidates for funding, especially those with the potential of snowballing internal funding sources that will be insulated from western media bullying.
I’ve also heard that AI X-risk is much more viral in India than EA in general (in comparative terms, relative to the West).
And in terms of “Anything right-leaning” a parallel EA culture, preferably with a different name, able to cultivate right-wing funding sources might be effective.
Progress studies? Not that they are necessarily right-leaning themselves but if you integrate support for [progress-in-general and doing a science of it] over the intervals of the political spectrum, you might find that center-right-and-righter supports it more than center-left-and-lefter (though low confidence and it might flip if you ignore the degrowth crowd).
I’m curious about your exercise regimen.