Agent foundations, AI macrostrategy, human enhancement.
I endorse and operate by Crocker’s rules.
I have not signed any agreements whose existence I cannot mention.
I know that you didn’t mean it as a serious comment, but I’m nevertheless curious about what you meant by “the universe is a teleology”.
I would appreciate it if you put probabilities on at least some of these propositions.
At this point something even stranger happens.
What this “something even stranger” is seems rather critical.
I think that if you want to compute logical correlations between programs, you need to look at their, well, logic. E.g., if you have some way of extracting a natural abstraction-based representation of their logic, build something like a causal graph from it, and then design a similarity metric for comparing these representations.
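To make that a bit more concrete, here's a minimal sketch of what the comparison step could look like, assuming (hypothetically) that the abstraction-extraction step already exists. `extract_logic_graph` is a made-up stand-in for that step, and the similarity metric is just graph edit distance via networkx; this is an illustration, not a worked-out proposal.

```python
# Minimal sketch: compare two programs' "logic" via graph similarity.
# Assumes a hypothetical extraction step (extract_logic_graph) that turns a
# program into a causal-graph-like representation; here it is only stubbed out.
import networkx as nx


def extract_logic_graph(program_source: str) -> nx.DiGraph:
    """Hypothetical placeholder for a natural-abstraction-based extraction step.
    A real version would return a graph over the program's abstract variables;
    this stub just returns a fixed toy graph."""
    g = nx.DiGraph()
    g.add_edges_from([("observation", "decision"), ("decision", "action")])
    return g


def logical_similarity(p1: str, p2: str) -> float:
    """Crude similarity metric: graph edit distance between the two logic
    graphs, rescaled so that 1.0 means structurally identical."""
    g1, g2 = extract_logic_graph(p1), extract_logic_graph(p2)
    dist = nx.graph_edit_distance(g1, g2)  # exact GED; expensive on large graphs
    size = max(g1.number_of_nodes() + g1.number_of_edges(),
               g2.number_of_nodes() + g2.number_of_edges())
    return 1.0 - dist / size if size else 1.0
```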
I have a suspicion, though, that this is not the right approach for handling ECL because ECL (I think?) involves the agent(s) looking at (some abstraction over) their “source code(s)” and then making a decision based on that. I expect that this ~reflection needs to be modeled explicitly.
My forefrontest thought as I was finishing this essay was "Applying this concept to AI risk is left as an exercise for the reader."
Then I thought that AI risk, if anything, is characterized by kinda the opposite dynamic: lots of groups with different risk models, not infrequently explicitly criticizing each other's strategies/approaches as net-negative or implicitly complicit with the baddies, finding it hard to cooperate despite what locally seem like convergent subgoals. (To be clear: I'm not claiming that everybody's take/perspective on this is valid, or that everybody in this field should cooperate with everybody else, or whatever.)
Then I thought that, actually, even within what seems like somewhat coherent factions, we would probably see some tails coming apart once their goal (AI moratorium, PoC aligned AGI, existential security, exiting the acute risk period) is achieved.
And then I thought, well...
GDM, OpenAI, Anthropic, …
Epoch and Mechanize
… there are probably more examples in the past
And then there were conversations where people I viewed as ~allied turned out to bite bullets that I considered (and still consider) equivalent to moral atrocities.
I may want to think more about this, but ATM it seems to me like AI risk as a field (or loose cluster of groups) is failing both at cooperating to achieve locally cooperation-worthy convergent subgoals and at seeing past the moral homophones.
(When I say "failing", I'm inclined to ask myself what standard I should apply, but reality doesn't grade on a curve and the stakes are huge.)
---
Anyway, thanks for the post and the concept!
Which doesn’t make the OP wrong.
I feel like we perhaps need to reach some “escape velocity” to get something like that going, but for ~rationality / deliberately figuring out how to think and act better.
Proposition 8: If and is continuous, then
I’m pretty sure this should be because otherwise the types don’t match.
Reading this made me think that the framing “Everything is alignment-constrained, nothing is capabilities-constrained.” is a rathering and that a more natural/joint-carving framing is:
To the extent that you can get capabilities by your own means (rather than hoping for reality to give you access to a new pool of some resource or whatever), you get them by getting various things to align so that they produce those capabilities.
Generating new frontier knowledge: as in, given a LW post, generating interesting comments that add to the conversation, or, given some notes on a research topic, generating experiment ideas, etc.
Have you tested it on sites/forums other than LW?
Pearson Correlation ⇒ Actual info ⇒ Shannon mutual info.
Shouldn’t it be the other way around?
Pearson Correlation ≤ Actual info ≤ Shannon mutual info.
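For what it's worth, here is a toy numeric illustration of why that ordering is the intuitive one (with the caveat that Pearson correlation and mutual information aren't literally in the same units, so the chain is heuristic): a pair of variables can have ~zero Pearson correlation while having clearly positive mutual information. `mutual_info_histogram` is my own crude plug-in estimator, not a standard API.

```python
# Toy check: zero linear correlation does not imply zero mutual information.
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 100_000)
y = x ** 2  # fully determined by x, but the dependence is symmetric, not linear


def mutual_info_histogram(a, b, bins=30):
    """Crude plug-in MI estimate (in nats) from a 2D histogram."""
    joint, _, _ = np.histogram2d(a, b, bins=bins)
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)   # marginal of a
    py = pxy.sum(axis=0, keepdims=True)   # marginal of b
    nz = pxy > 0
    return float(np.sum(pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])))


print("Pearson r:  ", np.corrcoef(x, y)[0, 1])      # ~ 0
print("MI estimate:", mutual_info_histogram(x, y))  # clearly > 0
```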
Can you give some reasons why you think all of that, or at least some of it?
The short answer to “How is it different from corrigibility?” is something like: here we’re thinking about systems that are not sufficiently powerful for us to need them to be fully corrigible.
This sounds to me like you're imagining that nobody building more powerful AIs is an option once we've already gotten a lot of value from them (where I don't really know what level of capability you imagine concretely)? If the world were so reasonable, we wouldn't rush ahead with our abysmal understanding of AI anyway, because obviously the risks outweigh the benefits? Also, you don't just need to convince the leading labs, because progress will continue and soon enough many, many actors will be able to create unaligned powerful AI, and someone will.
The (revealed) perception of risks and benefits depends on many things, including what kind of AI is available/widespread/adopted. Perhaps we can tweak those parameters. (Not claiming that it’s going to be easy.)
I think the right framing of the bounded/corrigible agent agenda is aiming toward a pivotal act.
Something in this direction, yes.
Perhaps you misread the OP as saying “small molecules” rather than “small set of molecules”.
“Maker Power”?
Do you think that it would be worth it to try to partially sort this out in a LW dialogue?
I was very skeptical from the beginning, for reasons largely similar to the ones I expressed in my posts. But at first I told myself that I should stay a little longer.
IME, in the majority of cases, when I strongly felt like quitting but was also inclined to justify “staying just a little bit longer because XYZ”, and listened to my justifications, staying turned out to be the wrong decision.
I think: To the extent that one even ascribes “is” claims / beliefs to techne and gnosis, those “is”es are grounded in “ought”s (though it’s possible I’m now applying this way of thinking more broadly than the post).
There I think we can make a distinction between pure "is" that exists prior to conceptualization and "is from ought" arising only after such experiences are reified.
I don’t think there’s any pure “is” that exists prior to conceptualization.
What are the X’s over which you quantified in the rightmost biconditional? Bots in chains that started with Bot?
How come?