In particular, your argument that putting material into the world about LLMs potentially becoming misaligned may cause problems—I agree that that’s true, but what’s the alternative? Never talking about risks from AI? That seems like it plausibly turns out worse.
I think this depends somewhat on the threat model. How scared are you of the character instantiated by the model vs the language model itself? If you’re primarily scared that the character would misbehave, and not worried about the language model misbehaving except insofar as it reifies a malign character, then maybe making the training data not give the model any reason to expect such a character to be malign would reduce the risk to negligible, and that sure would be easier if no one had ever thought of the idea that powerful AI could be dangerous. But if you’re also worried about the language model itself misbehaving, independently of whether it predicts that its assigned character would misbehave (for instance, the classic example of turning the world into computronium that it can use to better predict the behavior of the character), then this doesn’t seem feasible to solve without talking about it. In that case, the decrease in risk of model misbehavior from publicly discussing AI risk is probably worth the increase in risk of the character misbehaving that such discussion would cause (which is probably easier to solve anyway).
I don’t understand outer vs inner alignment especially well, but I think this at least roughly tracks that distinction. If a model does a great job of instantiating a character like we told it to, and that character kills us, then the goal we gave it was catastrophic, and we failed at outer alignment. If the model, in the process of being trained on how to instantiate the character, also kills us for reasons other than that it predicts the character would do so, then the process we set up for achieving the given goal also ended up optimizing for something else undesirable, and we failed at inner alignment.
This post claims that Anthropic is embarrassingly far behind Twitter AI psychologists at skills that are possibly critical to Anthropic’s mission. This suggests to me that Anthropic should be trying to recruit from the Twitter AI psychologist circle.