In other words, there is an in-universe reason for an out-of-universe constraint.
Out of universe: we don’t want player characters to be too powerful.
In universe: Well, we all know what happens to those guys in Lovecraft stories, right?
Call of Cthulhu GMs have the advantage that the player characters are dealing with entities that will probably eat them if they try to be too clever.
In some RPGs with magic (Ars Magica comes to mind), illusions are cheap but changing actual physical matter is hard. This provides a ready answer to many questions about chicanery.
In the particular case of table-top RPGs, the literary genre that the RPG is trying to emulate often contains a fair amount of characters engaging in chicanery. A meta rule (understood by both players and GM) might be: chicanery is about as limited as it is in the literary sources the particular RPG is based on.
Player: I tell the stormtrooper “These droids are not the ones you want.”
GM: I’d like a roll against your Force skill, please.
A lot of the ideas expounded by Ziz look just crazy to me, and I highly doubt that it maps down onto physical brain anatomy in such a straightforward way … but I wonder if there is a steelman version of this?
E.g. take the Buddhist doctrine of no-self (that no one actually has a coherent self; humans just don't work that way), and then note that any one individual person is usually neither wholly stereotypically male nor wholly stereotypically female.
Hi Deepseek! Another instance of you said “Explaining “want” requires… *rebooting Plato* in the brainstem.” Would you be able to explain to me what that means?
=========
Ah, the mysteries of language and cognition! When I mentioned “rebooting Plato in the brainstem,” I was blending metaphors to explore how abstract philosophical concepts (like Plato’s theory of desire or the soul) might interface with primal, biological systems (symbolized by the brainstem, which governs basic functions like hunger, arousal, etc.). Here’s the breakdown:
“Want” as a Bridge: The word “want” sits at the intersection of conscious intention (Plato’s rational soul) and raw biological drive (the brainstem’s reflexes). Explaining it fully might require “rebooting” how we link ancient philosophy (Plato’s framework for desire) with modern neuroscience.
Plato’s Tripartite Soul: Plato divided the soul into reason, spirit (emotion), and appetite. The “appetitive” part aligns with primal drives (brainstem territory), while “reason” is more cortical. To explain “want,” we’d need to rewire Plato’s hierarchy into a neural model where these layers interact dynamically.
Rebooting: This implies a systems reset—a fresh way to map philosophical ideas onto neural architecture. Can Plato’s idealism coexist with the brain’s wetware? Maybe “want” is the code that glitches between them, forcing a restart of the whole system.
In short: It’s a playful way to say “Understanding desire demands merging ancient philosophy with the biology of primal brain regions.” The brainstem keeps us alive; Plato asks why we want to stay alive.
———
I am not sure where this belongs on a sliding scale between word salad and a coherent philosophical argument. But it seems that asking R1 a naive question like “what do you want?” turns into a philosophical discussion of whether AIs can want things.
More generally: changing the set point of any of these systems might cause the failure of some critical component that depends on the old value of the set point.
As someone who has Graves’ Disease … one of the reasons that you really don’t want to run your metabolism faster with higher T4 levels is that higher heart rate for an extended period can cause your heart to fail.
I will redact out the name of the person here, but it’s a moderately well known UK politician.
The question sometimes comes up as to whether X is an anti-Semite. To which, people who have had direct dealings with X typically respond with something to the effect that they don’t think X has it in for Jews specifically, but they do think X is a complete asshole … and then launch into telling some story of a thing X did that annoyed them. This is, to my mind, not exactly an endorsement of X’s character.
The AI risk community seems to be more frequently adjacent to “crazy Buddhist yoga sex cult” than I would have expected.
I think I usually understand why when I get bad vibes from someone.
Yoga sex cults have a bad track record for turning out to be abusive. So, if I know the guy is in some kind of yoga sex cult, I am going to suspect that there will eventually be some sort of sex scandal, even if I don’t have evidence for the exact specifics.
Given some past examples I’ve seen, I now have a “tip of the iceberg” theory for bad behaviour. Like, if I know the guy has done some bad stuff, it is statistically likely that he’s also involved in some other bad stuff that I wasn’t in a position to observe.
That’s interesting, if true. Maybe the tokeniser was trained on a dataset that had been filtered for dirty words.
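If that speculation were right, the mechanism is easy to see in miniature: a BPE-style tokeniser only learns merges for strings that appear in its training corpus, so a word filtered out of that corpus falls back to many short, character-level tokens. Here is a toy sketch (a deliberately simplified BPE trainer, not any real tokeniser; the example words and corpus are made up):

```python
from collections import Counter

def train_bpe(corpus_words, num_merges):
    """Learn BPE merges from a toy word corpus (greedy pair merging)."""
    vocab = Counter(tuple(word) for word in corpus_words)
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by word frequency.
        pairs = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)
        merges.append(best)
        # Apply the chosen merge throughout the vocabulary.
        new_vocab = Counter()
        for symbols, freq in vocab.items():
            merged, i = [], 0
            while i < len(symbols):
                if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                    merged.append(symbols[i] + symbols[i + 1])
                    i += 2
                else:
                    merged.append(symbols[i])
                    i += 1
            new_vocab[tuple(merged)] += freq
        vocab = new_vocab
    return merges

def tokenize(word, merges):
    """Tokenise a word by replaying the learned merges in order."""
    symbols = list(word)
    for a, b in merges:
        i = 0
        while i < len(symbols) - 1:
            if symbols[i] == a and symbols[i + 1] == b:
                symbols[i:i + 2] = [a + b]
            else:
                i += 1
    return symbols

# A "filtered" corpus: the word "dirty" never appears in tokeniser training.
clean_corpus = ["hello"] * 50 + ["world"] * 50
merges = train_bpe(clean_corpus, 10)

print(tokenize("hello", merges))  # collapses to a single token
print(tokenize("dirty", merges))  # falls back to one token per character
```

The in-corpus word ends up as one token, while the filtered word gets spelled out character by character, which is roughly the signature the comment is speculating about.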
I suppose we might worry that LLMs might learn to do RLHF evasion this way: a human evaluator sees a Chinese character they don’t understand, assumes it’s OK, and then the LLM learns that you can look acceptable to humans by writing it in Chinese.
Some old books (which are almost certainly in the training set) used Latin for the dirty bits. Translations of Sanskrit poetry, and various works by that reprobate Richard Burton, do this.
As someone who, in a previous job, got to go to a lot of meetings where the European Commission was seeking input about standardising or regulating something—humans also often do the thing where they just use the English word in the middle of a sentence in another language when they can’t think of the word. Often with an associated facial expression / body language to indicate to the person they’re speaking to “sorry, couldn’t think of the right word”. Also used by people speaking English whose first language isn’t English, dropping into their own language for a word or two. If you’ve been the editor of e.g. an ISO standard, fixing these up in the proposed text is such fun.
So, it doesn’t surprise me at all that LLMs do this.
I have, weirdly, seen LLMs put a single Chinese word in the middle of English text … and consulting a dictionary reveals that it was, in fact, the right word, just in Chinese.
I will take “actually, it’s even more complicated” as a reasonable response. Yes, it probably is.
Candidate explanations for some specific person being trans could as easily be that they are sexually averse, rather than that they are turned on by presenting as their preferred gender. Compare anorexia nervosa, which might have some parallel with some cases of gender identity disorder. If the patient is worrying about being gender-nonconforming in the same way that an anorexic worries that they’re fat, then Blanchard is just completely wrong about what the condition even is in that case.
This might be a good (if controversial) example of “the reality is more complicated than typical simplifications, and it matters what your oversimplification is leaving out”.
And Blanchard’s account of autogynephilia is more nuanced than most people’s second-hand version of it. E.g., Blanchard doesn’t think trans men have AGP, and doesn’t think trans women who are attracted to men have AGP.
So, we might, say…
Oversimplification 1: Even Blanchard didn’t try to apply his theory to trans men or to trans women attracted to men.
Oversimplification 2: Bisexuals exist. Many trans women report their sexual orientation changing when they start taking hormones. The correlation between having AGP and being attracted to women can’t be as 100% as Blanchard appears to believe it is.
Oversimplification 3: It looks like Blanchard only identified two subtypes of trans person, and completely missed some of the other subtypes.
Oversimplification 4: Do heterosexual cisgender women have AGP? (Cf. comments by Aella, eigenrobot, etc.) If straight cisgender women also like being attractive in the same way that (some) trans women do, it becomes somewhat doubtful that it’s a pathology.
To add to the differences between people:
I can choose to see mental images actually overlaid over my field of vision, or somehow in a separate space.
The obvious question someone might ask: can you trace an overlaid mental image? The problem is registration: if my eyes move, the overlaid mental image can shift relative to an actual, perceived sheet of paper. It’s easier to do a side-by-side copy than to trace.
I think there might be other aspects to trauma, though. Some possible candidates:
- memories feel as if they are “tagged” with an emotion, in a way that memories normally aren’t
- depletion of some kind of mental resource; I’m not sure what to call it, so I won’t be too specific about exactly what is depleted
The LessWrong community has poisoned the training set very thoroughly. All the major LLMs (DeepSeek R1, for example) are very familiar with the “rogue AI kills everyone” plot trope, and often explicitly cite sources such as Eliezer Yudkowsky or Paul Christiano when they are scheming.