I suggest a fourth default question for these reading groups:
How did this post age?
Soon the two are lost in a maze of words defined in other words, the problem that Stevan Harnad once described as trying to learn Chinese from a Chinese/Chinese dictionary.
Of course, it turned out that LLMs do this just fine, thank you.
intensional terms
Should probably link to Extensions and Intensions; not everyone reads these posts in order.
Mati described himself as a TPM since September 2023 (after being PM support since April 2022), and Andrei described himself as a Research Engineer from April 2023 to March 2024. Why do you believe either was not a FTE at the time?
And while failure to sign isn’t proof of lack of desire to sign, the two are heavily correlated—otherwise it would be incredibly unlikely for the small Superalignment team to have so many members who signed late or not at all.
With the sudden simultaneous exits of Mira Murati, Barret Zoph, and Bob McGrew, I thought I’d update my tally of the departures from OpenAI, collated with how quickly the ex-employee had signed the loyalty letter to Sam Altman last November.
The letter was leaked at 505 signatures, 667 signatures, and finally 702 signatures; in the end, it was reported that 737 of 770 employees signed. Since then, I’ve been able to verify 56 departures of people who were full-time employees (as far as I can tell, contractors were not allowed to sign, but all FTEs were).
I still think I’m missing some, so these are lower bounds (modulo any mistakes I’ve made).
Headline numbers:
Attrition for the 505 OpenAI employees who signed before the letter was first leaked: at least 24⁄505 = 4.8%
Attrition for the next 197 to sign (it was leaked again at 667 signatures, and one last time at 702): at least 13⁄197 = 6.6%
Attrition for the (reported) 68 who had not signed by the last leak: at least 19⁄68 = 27.9%.
Reportedly, 737 out of the 770 signed in the end, and many of the Superalignment team chose not to sign at all.
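For anyone who wants to check the arithmetic, here's a minimal sketch that recomputes the percentages above from the leak counts and the verified-departure tallies in this post (nothing here beyond the numbers already quoted):

```python
# Recompute the attrition percentages from the figures quoted above.
# Cohort sizes: 505 early signers, 702 - 505 = 197 later signers,
# and 770 - 702 = 68 who had not signed by the last leak.
# Departure counts (24, 13, 19) are the verified departures tallied in this post.
cohorts = {
    "signed by 505":        (24, 505),
    "signed 506-702":       (13, 702 - 505),
    "not signed as of 702": (19, 770 - 702),
}

for label, (departed, size) in cohorts.items():
    print(f"{label}: {departed}/{size} = {departed / size:.1%}")

# Should total the 56 verified departures mentioned above.
print("total verified departures:", sum(d for d, _ in cohorts.values()))
```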
Below are my current tallies of some notable subsets. Please comment with any corrections!
People from the Superalignment team who never signed as of the 702 leak (including some policy/governance people who seem to have been closely connected) and are now gone:
Carroll Wainwright
Collin Burns
Cullen O’Keefe
Daniel Kokotajlo
Jan Leike (though he did separately Tweet that the board should resign)
Jeffrey Wu
Jonathan Uesato
Leopold Aschenbrenner
Mati Roy
William Saunders
Yuri Burda
People from the Superalignment team (and close collaborators) who did sign before the final leak but are now gone:
Jan Hendrik Kirchner (signed between 668 and 702)
Steven Bills (signed between 668 and 702)
John Schulman (signed between 506 and 667)
Sherry Lachman (signed between 506 and 667)
Ilya Sutskever (signed by 505)
Pavel Izmailov (signed by 505)
Ryan Lowe (signed by 505)
Todor Markov (signed by 505)
Others who didn’t sign as of the 702 leak (some of whom may have just been AFK for the wrong weekend, though I doubt that was true of Karpathy) and are now gone:
Andrei Alexandru (Research Engineer)
Andrej Karpathy (Co-Founder)
Austin Wiseman (Finance/Accounting)
Girish Sastry (Policy)
Jay Joshi (Recruiting)
Katarina Slama (Member of Technical Staff)
Lucas Negritto (Member of Technical Staff, then Developer Community Ambassador)
Zarina Stanik (Marketing)
Notable other ex-employees:
Barret Zoph (VP of Research, Post-Training; signed by 505)
Bob McGrew (Chief Research Officer; signed by 505)
Chris Clark (Head of Nonprofit and Strategic Initiatives; signed by 505)
Diane Yoon (VP of People; signed by 505)
Gretchen Krueger (Policy; signed by 505; posted a significant Twitter thread at the time she left)
Mira Murati (CTO; signed by 505)
EDIT: On reflection, I made this a full Shortform post.
CDT agents respond well to threats
Might want to rephrase this as “CDT agents give in to threats”
This is weirdly meta.
If families are worried about the cost of groceries, they should welcome this price discrimination. The AI will realize you are worried about costs. It will offer you prime discounts to win your business. It will know you are willing to switch brands to get discounts, and use this to balance inventory.
Then it will go out and charge other people more, because they can afford to pay. Indeed, this is highly progressive policy. The wealthier you are, the more you will pay for groceries. What’s not to love?
A problem is that this is not only a tax on indifference, but also a tax on innumeracy and on lack of leisure time. Those who don’t know how to properly comparison shop are likely to be less wealthy, not more; same with those who don’t have the spare time to go to more than one store.
Re: experience machine, Past Me would have refused it and Present Me would take it. The difference is due to a major (and seemingly irreversible) deterioration in my wellbeing several years ago, but not only because that makes the real world less enjoyable.
Agency is another big reason to refuse the experience machine; if I think I can make a difference in the base-level world, I feel a moral responsibility towards it. But I experience significantly less agency now (and project less agency in the future), so that factor is diminished for me.
The main factor that’s still operative is epistemics: I would much rather my beliefs be accurate than be deceived about the world. But it’s hard for that to outweigh the unhappiness at this point.
So if a lot of people would choose the Experience Machine, that suggests they are some combination of unhappy, not confident in their agency, and not obsessed with their epistemics. (Which does, I think, operationalize your “something is very wrong”.)
Thanks—I didn’t recall the content of Yglesias’ tweet, and I’d noped out of sorting through his long feed. I suspect Yglesias didn’t understand why the numbers were weird, though, and people who read his tweet were even less likely to get it. And most significantly, he tries to draw a conclusion from a spurious fact!
Allowing explicitly conditional markets with a different fee structure (ideally, all fees refunded on the counterfactual markets) could be an interesting public service on Manifold’s part.
The only part of my tone that worries me in retrospect is that I should have done more to indicate that you personally were trying to do a good thing, and I’m criticizing the deference to conditional markets rather than criticizing your actions. I’ll see if I can edit the post to improve on that axis.
I think we still differ on that. Even though the numbers for the main contenders were just a few points apart, there was massive jockeying to put certain candidates at the top end of that range, because relative position is what viewers noticed.
I’m really impressed with your grace in writing this comment (as well as the one you wrote on the market itself), and it makes me feel better about Manifold’s public epistemics.
Yes, and I gained some easy mana from such markets; but the market that got the most attention by far was the intrinsically flawed conditional market.
Real-money markets do have stronger incentives for sharps to scour for arbitrage, so the 1/1/26 market would have been more likely to be noticed before months had gone by.
However (depending on the fee structure for resolving N/A markets), real-money markets have even stronger incentives for sharps to stay away entirely from spurious conditional markets, since they’d be throwing away cash and not just Internet points. Never ever ever cite out-of-the-money conditional markets.
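To put a number on that incentive, here's a minimal sketch with made-up figures (the 90% N/A probability, the flat fee, and the simple fee model are all illustrative assumptions, not Manifold's or any platform's actual fee schedule). Even a trader with a real edge loses in expectation when the market usually resolves N/A and fees aren't refunded, and only comes out ahead once they are.

```python
# Illustrative expected value of trading a conditional market that may resolve N/A.
# Every number and the fee model itself are assumptions for this sketch.
def expected_profit(edge, p_na, stake, fee, fee_refunded_on_na):
    """edge: expected profit as a fraction of stake *if* the market resolves;
    p_na: probability the condition never triggers and the market resolves N/A;
    fee: flat fee to trade, possibly refunded on an N/A resolution."""
    value_if_resolved = (1 - p_na) * (stake * edge)
    fee_if_resolved = (1 - p_na) * fee
    fee_if_na = 0.0 if fee_refunded_on_na else p_na * fee
    return value_if_resolved - fee_if_resolved - fee_if_na

# A sharp with a genuine 10% edge, in a market that resolves N/A 90% of the time:
print(expected_profit(0.10, 0.90, stake=100, fee=5, fee_refunded_on_na=False))  # -4.0
print(expected_profit(0.10, 0.90, stake=100, fee=5, fee_refunded_on_na=True))   #  0.5
```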
Broke: Prediction markets are an aggregation of opinions, weighted towards informed opinions by smart people, and are therefore a trustworthy forecasting tool on any question.
Woke: Prediction markets are MMOs set in a fantasy world where, if someone is Wrong On The Internet, you can take their lunch money.
Can you share any strong evidence that you’re an unusually trustworthy person in regard to confidential conversations? People would in fact be risking a lot by talking to you.
(This is sincere btw; I think this service should absolutely exist, but the best version of it is probably done by someone with a longstanding public reputation of circumspection.)
Good question! I picked it up from a friend at a LW meetup a decade ago, so it didn’t come with all the extra baggage that vipassana meditation seems to usually carry. So this is just going to be the echo of it that works for me.
Step 1 is to stare at your index finger (a very sensitive part of your body) and gently, patiently try to notice that it’s still producing a background level of sensory stimulus even when it’s not touching anything. That attention to the background signal, focused on a small patch of your body, is what the body scan is based on.
Step 2 is learning how to “move” that awareness of the background signal slowly. Try to smoothly shift that awareness down your finger, knuckle by knuckle, keeping the area of awareness small by ceasing to focus on the original spot as you focus on a new spot. Then try moving that spot of awareness gradually to the base of your thumb, and noticing the muscle beneath the skin.
Use Case α is harnessing that kind of awareness to relax physical tension and even pain. The next time you have a paper cut or a small burn, once you’ve dealt with it in the obvious objective ways and now just have to handle the pain, focus your awareness right on that spot. The sensation will still be loud, but it won’t be overwhelming when you’re focusing on it rather than fleeing from it. Or the next time you notice a particularly tense muscle, focus your awareness there; for me, that usually loosens it at least a little.
Step 3 is the body scan itself: creating awareness for each part of your skin and muscles, gradually, bit by bit, starting from the crown of your head and slowly tracing out a path that covers everything. This is where a guided meditation could really help. I don’t have one to recommend (after having the guided meditation at the meetup, I got as much of the idea as I needed), but hopefully some of the hundreds out there are as good as Random Meditating Rationalist #37 was.
And Use Case β, when you have a migraine, is to imagine moving that awareness inside your skull, to the place where the migraine pain feels like it’s concentrated. (I recommend starting from a place where the migraine seems to “surface”—for me, the upper orbit of my left eye—if you have such a spot.)
There’s something quite odd about how this works: your brain doesn’t have pain receptors, so the pain from the migraine ends up in some phantom location on your body map, and it’s (conveniently?) interpreted as being inside your head. By tracing your awareness inside your skull, you walk along that body map to the same phantom location as that pain, so it works out basically the same as if you were in Use Case α.
Hope this helps!
I have to further compliment my past self: this section aged extremely well, prefiguring the Shoggoth-with-a-smiley-face analogies several years in advance.
GPT-3 is trained simply to predict continuations of text. So what would it actually optimize for, if it had a pretty good model of the world including itself and the ability to make plans in that world?
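(For concreteness, "predict continuations of text" is the standard autoregressive objective: maximize the log-probability of each token given the tokens before it. The formula below is the generic textbook form, not anything specific to this post.)

$$\mathcal{L}(\theta) = -\sum_{t=1}^{T} \log p_\theta\left(x_t \mid x_1, \ldots, x_{t-1}\right)$$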
One might hope that because it's learning to imitate humans in an unsupervised way, it would end up fairly human, or at least act that way. I very much doubt this, for the following reason:
Two humans are fairly similar to each other, because they have very similar architectures and are learning to succeed in the same environment.
Two convergently evolved species will be similar in some ways but not others, because they have different architectures but the same environmental pressures.
A mimic species will be similar in some ways but not others to the species it mimics, because even if they share recent ancestry, the environmental pressures on the species being mimicked (say, the poisonous one) are different from the environmental pressures on the mimic.
What we have with the GPTs is the first deep learning architecture we’ve found that scales this well in the domain (so, probably not that much like our particular architecture), learning to mimic humans rather than growing in an environment with similar pressures. Why should we expect it to be anything but very alien under the hood, or to continue acting human once its actions take us outside of the training distribution?
Moreover, there may be much more going on under the hood than we realize; it may take much more general cognitive power to learn and imitate the patterns of humans than it takes us to execute those patterns.
The fault does not lie with Jacob, but wow, this post aged like an open bag of bread.