momom2

Karma: 216

AIS student, self-proclaimed aspiring rationalist, very fond of game theory.
“The only good description is a self-referential description, just like this one.”

momom2 27 Jan 2025 22:11 UTC
1 point
0
on: AI Strategy Updates that You Should Make
Thanks for writing this! I was unaware of the Chinese investment, which explains another recent piece of news that you did not include but that I think is significant: Nvidia’s stock plummeted 18% today.

momom2 24 Jan 2025 9:57 UTC
1 point
−3
on: Tell me about yourself: LLMs are aware of their learned behaviors
Five minutes of thought on how this could be used for capabilities:
- Use behavioral self-awareness to improve training data (e.g. training on this dataset increases self-awareness of code insecurity, so it probably contains insecure code that can be fixed before training on it).
- Self-critique for iterative improvement within a scaffolding (already exists, but this work validates the underlying principles and may provide further grounding).
It sure feels like behavioral self-awareness should work just as well for assessing one’s own capabilities as for safety topics, and that this ought to be usable to improve capabilities, but my five minutes are up and I don’t feel particularly threatened by what I found.
In general, given concerns that safety-intended work often ends up boosting capabilities, I would appreciate systematically including a section on why the authors believe their work is unlikely to have negative externalities.

momom2 24 Jan 2025 9:34 UTC
3 points
0
on: Mechanisms too simple for humans to design
(If you take time to think about this, feel free to pause reading and write your best solution in the comments!)
How about:
- Allocating energy everywhere to either twitching randomly or collecting nutrients. Assuming you are propelled by the twitching, this follows the gradient if there’s one.
- Try to grow in all directions. If there are no outside nutrients to fuel this growth, consume yourself. In this manner, regenerate yourself in the direction of the gradient.
- Try to grab nutrients from all directions. If there are nutrients, you will be propelled towards them by reaction, so this moves in the direction of the gradient.
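Below is a minimal Python sketch of the first mechanism. It reads that mechanism as kinesis, where the agent twitches less where food is plentiful, and makes no claim about how any real organism works. Everything in it is a hypothetical toy: the 1D lattice, the linear nutrient gradient, and the parameter `rest_strength` (how strongly local nutrients divert energy from twitching to collecting) are assumptions. The twitches themselves are unbiased, yet the walker ends up spending most of its time at the nutrient-rich end.

```python
import random


def simulate_kinesis(n_sites=21, steps=300_000, rest_strength=0.9, seed=0):
    """Return the time-averaged position of a hypothetical 1D walker.

    Toy model (an assumption, not from the comment): the nutrient level at
    site x is x / (n_sites - 1), so the gradient points toward the last site.
    At each step the walker either collects nutrients (stays put) or twitches
    (one site left or right, chosen uniformly). It twitches with probability
    1 - rest_strength * nutrient, i.e. less often where food is richer.
    """
    rng = random.Random(seed)
    x = 0  # start at the nutrient-poor end
    total = 0
    for _ in range(steps):
        nutrient = x / (n_sites - 1)
        if rng.random() < 1 - rest_strength * nutrient:
            # Unbiased random kick; moves past either edge are rejected.
            new_x = x + rng.choice((-1, 1))
            if 0 <= new_x < n_sites:
                x = new_x
        total += x
    return total / steps


if __name__ == "__main__":
    print(simulate_kinesis(rest_strength=0.0))  # pure twitching: near the middle
    print(simulate_kinesis())  # nutrient-dependent twitching: near the rich end
```

With `rest_strength=0.0` the walker always twitches, and its average position sits near the middle of the lattice. With the default, it sits well toward the rich end. The drift comes only from lingering longer where twitching is rarer, so no sense of direction is needed. This toy does not model the other two mechanisms.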
Update after seeing the solution of B. subtilis: Looks like I had the wrong level of abstraction in mind. Also, I didn’t consider group solutions.

momom2 21 Jan 2025 22:19 UTC
5 points
0
on: The Manhattan Trap: Why a Race to Artificial Superintelligence is Self-Defeating
Contra 2:
ASI might provide a strategic advantage of a kind that doesn’t negatively impact the losers of the race, e.g. it increases GDP tenfold and locks competitors out of having an ASI.
Then, losing control of the ASI might not be seen as posing an existential risk to the US.
I think it’s quite likely this is what some policymakers have in mind: some sort of innovation that will make everything better for the country by providing a lot of cheap labor and generally improving productivity, the way we see AI applications doing right now but on a bigger scale.
Comment on 3:
I’m not sure who your target audience is; I assume it’s policymakers, in which case I’m not sure how much weight that kind of argument carries. I’m not a US citizen, but from international news I got the impression that current US officials would rather relish the option to undermine the liberal democracy they purport to defend.

momom2 19 Jan 2025 15:18 UTC
1 point
0
in reply to: cousin_it’s comment on: On Eating the Sun
From the disagreement between the two of you, I infer there is still debate as to what environmentalism means. The only way to be a true environmentalist, then, is to make things as reversible as possible until an ASI can explain what the environmentalist course of action regarding the Sun should be.

momom2 17 Jan 2025 20:55 UTC
1 point
0
on: The Absent-Minded Driver
The paradox arises because the action-optimal formula mixes world states and belief states.
The [action-planning] formula essentially starts by summing up the contributions of the individual nodes as if you were an “outside” observer who knows where you are, but then calculates the probabilities at the nodes as if you were an absent-minded “inside” observer who merely believes they are there (to a degree).

The probabilities you’re summing up are apples and oranges, so no wonder the result doesn’t make any sense. As stated, the formula for action-optimal planning is a bit like looking into your wallet more often and then observing the exact same money more often. Seeing the same 10 dollars twice isn’t the same thing as owning 20 dollars.

If you want to calculate the utility and optimal decision probability entirely in belief space (i.e. action-optimal), then you need to take into account that you can be at X and already know that, once you reach Y, you will again consider that you might be at X.

So in belief space, your formula for the expected value also needs to take into account that you’ll forget, which makes the formula recursive. It should actually be:
$E = \alpha p \, E + \alpha (1 - p) \times 0 + (1 - \alpha) p \times 1 + (1 - \alpha)(1 - p) \times 4$
Explanation of the terms in order of appearance:
- If we are in X and CONTINUE, then we will “expect the same value again” when we are in Y in the future. This enforces temporal consistency.
- If we are in X and EXIT, then we should expect 0 utility.
- If we are in Y and CONTINUE, then we should expect 1 utility.
- If we are in Y and EXIT, then we should expect 4 utility.

We also know that $\alpha$ must be $1 / (1 + p)$, because when driving $n$ times, you’re in X $n$ times and in Y $p n$ times.
Under that constraint, moving the $E$ term to the left-hand side and solving gives $E = -3p^{2} + 4p$. The optimum here is at $p = 2/3$ with an expected utility of $4/3$, which matches the planning-optimal formula.

[Shamelessly copied from a comment under this video by xil12323.]
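The recursion and the constraint on α can be checked exactly with a short Python sketch. It first moves the E terms to the left-hand side and solves the belief-space equation for E in closed form. It then compares that result with the planning-optimal formula (exit at X pays 0, exit at Y pays 4, continuing past Y pays 1) and searches a grid of CONTINUE probabilities p. The function names and the grid resolution are illustrative choices that do not come from the original comment. In the code, `a` stands for α.

```python
from fractions import Fraction


def belief_space_value(p):
    """Solve the recursive belief-space equation for E.

    E = a*p*E + a*(1-p)*0 + (1-a)*p*1 + (1-a)*(1-p)*4, with a = 1/(1+p).
    Moving the E term to the left: E*(1 - a*p) = (1-a)*(p*1 + (1-p)*4).
    """
    a = 1 / (1 + p)  # probability of being at X, given CONTINUE probability p
    return (1 - a) * (p * 1 + (1 - p) * 4) / (1 - a * p)


def planning_value(p):
    """Planning-optimal value: exit at X pays 0, exit at Y pays 4, pass Y pays 1."""
    return (1 - p) * 0 + p * (1 - p) * 4 + p * p * 1


# Exact rational grid of CONTINUE probabilities, so comparisons are exact.
grid = [Fraction(k, 300) for k in range(301)]
best = max(grid, key=belief_space_value)

if __name__ == "__main__":
    print(best, belief_space_value(best))
```

Running it prints `2/3 4/3`. Both formulas agree with $4p - 3p^2$ at every grid point, and the best grid point is $p = 2/3$. Exact fractions are used instead of floats so that the equality checks are not affected by rounding.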

momom2 5 Jan 2025 21:37 UTC
2 points
1
in reply to: Eli Tyre’s comment on: Review: Planecrash
Having read Planecrash, I do not think there is anything in this review that I would not have wanted to know before reading the work (which is the important part of what people consider “spoilers” for me).

momom2 25 Nov 2024 12:54 UTC
1 point
0
in reply to: abramdemski’s comment on: Which things were you surprised to learn are not metaphors?
Top of the head like when I’m trying to frown too hard

momom2 4 Nov 2024 16:56 UTC
1 point
0
on: Do We Believe Everything We’re Told?
distraction had no effect on identifying true propositions (55% success for uninterrupted presentations, vs. 58% when interrupted); but did affect identifying false propositions (55% success when uninterrupted, vs. 35% when interrupted)
If you are confused by these numbers (why so close to 50%? Why below 50%?), it’s because participants could choose among four options (true, false, don’t know, and never seen), so chance level is 25%, not 50%.
You can read the study; search for the keyword “The Identification Test”.

momom2 2 Nov 2024 22:03 UTC
1 point
0
in reply to: nim’s comment on: Two arguments against longtermist thought experiments
1. I don’t see what you mean by the grandfather problem.
  1. I don’t care about the specifics of who spawns the far future generation; whether it’s Alice or Bob, I am only considering numbers here.
  2. Saving lives now has consequences for the far future insofar as current people are irreplaceable: if they die, no one will have more children to compensate, resulting in a lower total far-future population. Some deaths are less impactful than others for the far future.
2. That’s an interesting way to think about it, but I’m not convinced; killing half the population does not reduce the chance of survival of humanity by half.
  1. In terms of individuals, only the last <.1% matter (not sure about the order of magnitude, but in any case it’s small as a proportion of the total).
  2. It’s probably more useful to think in terms of events (nuclear war, misaligned ASI → prevent war, research alignment) or unsurvivable conditions (radiation, killer robots → build bunker, have kill switch) that can prevent humanity from recovering from a catastrophe.

momom2 2 Nov 2024 21:47 UTC
3 points
0
in reply to: AnthonyC’s comment on: Two arguments against longtermist thought experiments
Yes, that’s the first thing that was talked about in my group’s discussion on longtermism. For the sake of the argument, we were asked to assume that the waste processing/burial choice amounted to a trade in lives all things considered… but the fact that any realistic scenario resembling this thought experiment would not be framed like that is the central part of my first counterargument.

momom2 27 Oct 2024 23:22 UTC
3 points
0
on: Why is there Nothing rather than Something?
I enjoy reading any kind of cogent fiction on LW, but this one is a bit too undeveloped for my tastes. Perhaps be more explicit about what Myrkina sees in the discussion which relates to our world?
You don’t have to always spell earth-shattering revelations out loud (in fact it’s best to let the readers reach the correct conclusion by themselves imo), but there needs to be enough narrative tension to make the conclusion inevitable; as it stands, it feels like I can just meh my way out of thinking more than 30s on what the revelation might be, the same way Tralith does.

momom2 21 Oct 2024 8:41 UTC
3 points
0
in reply to: Steven Byrnes’s comment on: Against empathy-by-default
Thanks, it does clarify, both on separating the instantiation of an empathy mechanism in the human brain vs in AI and on considering instantiation separately from the (evolutionary or training) process that leads to it.

momom2 20 Oct 2024 20:26 UTC
3 points
0
on: Against empathy-by-default
I was under the impression that empathy is explained by evolutionary psychology as the result of the need to cooperate, combined with the fact that we already had all the apparatus to simulate other people (like Jan Kulveit’s first proposition).
(This does not translate to machine empathy as far as I can tell.)

I notice that this impression is justified by basically nothing besides “everything is evolutionary psychology”. Seeing that other people’s intuitions about the topic are completely different is humbling; I guess emotions are not obvious.

So, I would appreciate it if you could point out where the literature stands on the position you argue against, on Jan Kulveit’s, or on mine (or possibly on something else).
Are all these takes just, like, our opinion, man, or is there strong supportive evidence for a comprehensive theory of empathy (or is there evidence for multiple competing theories)?

momom2 14 Oct 2024 15:22 UTC
15 points
12
on: Why Stop AI is barricading OpenAI
I do not find this post reassuring about your approach.
- Your plan is unsound; instead of a succession of events which need to go your way, I think you should aim for incremental marginal gains. There is no cost-effectiveness analysis, and the implicit theory of change is lacunar.
- Your press release is unreadable (poor formatting) and sounds like a conspiracy theory (catchy punchlines, ALL CAPS DEMANDS, alarmist vocabulary and unsubstantiated claims); I think it’s likely to discredit safety movements and attract attention in counterproductive ways.
- The figures you quote are false (the median from AI Impacts is 5%) or knowingly misleading (the numbers from Existential risk from AI survey are far from robust and as you note, suffer from selection bias), so I think it’s fair to call them lies.
- Your explanations for what you say in the press release sometimes don’t make sense! You conflate AGI and self-modifying systems, your explanation for “eventually” does not match the sentence.
- Your arguments are based on wrong premises—it’s easy to check that your facts such as “they are not following the scientific method” are plain wrong. It sounds like you’re trying to smear OpenAI and Sam Altman as much as possible without consideration for whether what you’re saying is true.
I am appalled to see this was not downvoted into oblivion! My best guess is that people feel there are not enough efforts going towards stopping AI and did not read the post and the press release to check that you have good reasons motivating your actions.

momom2 25 Sep 2024 13:50 UTC
3 points
0
on: When to join a respectability cascade
I agree with the broad idea, but I’m going to need a better implementation.
In particular, the 5 criteria you give are insufficient because the example you give scores well on them, and is still atrocious: if we decreed that “black people” was unacceptable and should be replaced by “black peoples”, it would cause a lot of confusion on account of how similar the two terms are and how ineffective the change is.

The cascade happens because of a specific reason, and the change aims at resolving that reason. For example, “Jap” is used as a slur, and not saying it shows you don’t mean to use a slur. For black people/s, I guess the reason would be something like not implying that there is a single black people, which only makes sense in the context of a specialized discussion.

I can’t adhere to the criteria you proposed because they don’t work, and I don’t want to bother thinking that deeply about every change of term on an everyday basis, so I’ll keep using intuition to choose when to join respectability cascades for now.
For deciding when to trigger a respectability cascade, your criteria are interesting for having any sort of principled approach, but I’m still not sure they outperform unconstrained discussion on the subject (which I assume is the default alternative for anyone who cares enough about deliberately triggering respectability cascades to have read your post in the first place).

momom2 25 Sep 2024 13:12 UTC
3 points
1
on: Why the 2024 election matters, the AI risk case for Harris, & what you can do to help
- Probability of existential catastrophe before 2032 assuming AGI arrives in that period and Harris wins^[12] = 30%
- Probability of existential catastrophe before 2032 assuming AGI arrives in that period and Trump wins^[13] = 35%.
A lot of your AI-risk reason to support Harris seems to hinge on this, which I find very shaky. How wide are your confidence intervals here?
My own guesses are much fuzzier. According to your argument, if my intuition were 0.2 vs 0.5, it would be an overwhelming case for Harris, but I’m unfamiliar enough with the topic that it could easily be the reverse.

I would greatly appreciate more details on how you reach your numbers (and, if they’re vibes, why those vibes should be trusted).
Alternatively, I feel like I should somehow discount the strength of the AI-risk reason based on how likely I think these numbers are to more or less hold true, but I don’t know a principled way to do it.
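The following Monte Carlo sketch in Python shows why the interval widths matter. It treats each catastrophe probability as uncertain and estimates how often the first risk comes out lower than the second. Two things here are assumptions made for illustration, not part of the post: modelling each estimate as an independent Beta distribution, and the `concentration` values used below. It only illustrates the role of uncertainty and is not a principled discounting method.

```python
import random


def prob_first_lower(mean_a, mean_b, concentration, n=20_000, seed=0):
    """Estimate P(A < B) for two uncertain probabilities A and B.

    Assumption for illustration: A and B are independent Beta variables
    with the given means and the same concentration (alpha + beta).
    A larger concentration means a narrower distribution around the mean.
    """
    rng = random.Random(seed)

    def draw(mean):
        return rng.betavariate(mean * concentration, (1 - mean) * concentration)

    wins = sum(draw(mean_a) < draw(mean_b) for _ in range(n))
    return wins / n


if __name__ == "__main__":
    # Point estimates quoted from the post: 30% (Harris) vs 35% (Trump).
    print(prob_first_lower(0.30, 0.35, concentration=10_000))  # tight intervals
    print(prob_first_lower(0.30, 0.35, concentration=4))  # wide intervals
```

With tight distributions (`concentration=10_000`), the 30% vs 35% point estimates translate into near-certainty that the first risk is lower. With wide ones (`concentration=4`), the same point estimates give only a modest edge over a coin flip. In this sense, the strength of the argument depends on the intervals and not just on the point estimates.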

momom2 18 Sep 2024 11:29 UTC
2 points
1
in reply to: Doug_S.’s comment on: No One Can Exempt You From Rationality’s Laws
Seems like you need to go beyond arguments of authority and stating your conclusions and instead go down to the object-level disagreements. You could say instead “Your argument for ~X is invalid because blah blah” and if Jacob says “Your argument for the invalidity of my argument for ~X is invalid because blah blah” then it’s better than before because it’s easier to evaluate argument validity than ground truth.
(And if that process continues ad infinitum, consider that someone who cannot evaluate the validity of the simplest arguments is not worth arguing with.)

momom2 18 Sep 2024 10:43 UTC
3 points
0
in reply to: Darmani’s comment on: The Mountain Troll
It’s thought-provoking.
Many people here identify as Bayesians, but are as confused as Saundra by the troll’s questions, which indicates that they’re missing something important.

momom2 12 Sep 2024 21:34 UTC
1 point
0
in reply to: dov’s comment on: No Safe Defense, Not Even Science
It wasn’t mine. I did grow up in a religious family, but becoming a rationalist happened gradually, without a sharp break from my social network. I always figured people around me were making all sorts of logical mistakes, though, and I noticed deep flaws in what I was taught very early on.