Morphism

Karma: 176

Morphism Feb 6, 2025, 10:10 PM
5 points
0
on: Pi Rogers’s Shortform
Convex agents are practically invisible.

We currently live in a world full of double-or-nothing gambles on resources. Bet it all on black. Invest it all in risky options. Go on a space mission with a 99% chance of death, but a 1% chance of reaching Jupiter, which has about 300 times the mass-energy of Earth, and none of those pesky humans that keep trying to eat your resources. Challenge one such pesky human to a duel.

Make these bets over and over again and your chance of total failure (i.e. death) approaches 100%. When convex agents appear in real life, they do this, and very quickly die. For these agents, that is all part of the plan. Their death is worth it for a fraction of a percent chance of getting a ton of resources.

But we, as concave agents, don’t really care. We might as well be in completely logically disconnected worlds. Convex agents feel the same about us, since most of their utility is concentrated on those tiny-probability worlds where a bunch of their bets pay off in a row (for most value functions, that means we die). And they feel even more strongly about each other.

This serves as a selection argument for why agents we see in real life (including ourselves) tend to be concave (with some notable exceptions). The convex ones take a bunch of double-or-nothing bets in a row, and, in almost all worlds, eventually land on “nothing”.
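The arithmetic behind this can be checked with a short sketch. It assumes fair bets (win probability 1/2), that a win doubles wealth and a single loss leaves zero, and two illustrative utility functions, $u(w) = w^2$ (convex) and $u(w) = \sqrt{w}$ (concave). The function names and utility choices are hypothetical, picked here for illustration rather than taken from the post.

```python
import math
from fractions import Fraction


def all_in_outcomes(n_bets, p_win=Fraction(1, 2), start=1):
    """Bet everything n_bets times in a row; a single loss leaves you with 0."""
    p_survive = p_win ** n_bets             # must win every single bet
    wealth_if_survive = start * 2 ** n_bets  # each win doubles wealth
    return p_survive, wealth_if_survive


def expected_utility(n_bets, utility, p_win=Fraction(1, 2), start=1):
    """Expected utility of the all-in strategy: survive with p, else end at 0."""
    p, w = all_in_outcomes(n_bets, p_win, start)
    return p * utility(w) + (1 - p) * utility(0)


def convex(w):
    # Hypothetical convex utility over resources.
    return w ** 2


def concave(w):
    # Hypothetical concave utility over resources.
    return math.sqrt(w)


for n in (1, 10, 30):
    p, _ = all_in_outcomes(n)
    print(f"{n:>2} bets: P(ruin) = {float(1 - p):.10f}, "
          f"convex EU = {float(expected_utility(n, convex)):.4g}, "
          f"concave EU = {float(expected_utility(n, concave)):.4g}")
```

Not betting gives utility 1 under both functions. After 10 bets, the probability of ending with nothing is 1023/1024, yet the convex agent's expected utility is 1024 while the concave agent's is 1/32: the convex agent prefers the bets and almost surely dies, the concave agent declines and survives.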

Morphism Dec 24, 2024, 11:32 AM
4 points
−1
on: Pi Rogers’s Shortform

If you’re thinking without writing, you only think you’re thinking.

-Leslie Lamport

This seems... straightforwardly false. People think in various different modalities, and translating a thought from its native modality into words is not always trivial. Even if by “writing” Lamport means any form of recording thoughts, this still seems false. Oftentimes, an idea incubates in my head for months before I find a good way to represent it as words or math or pictures or anything else.

Also, writing and thinking are separate (albeit closely related) skills, especially when you take “writing” to mean writing for an audience, so the thesis of this Paul Graham post is also false. I’ve been thinking reasonably well for about 16 years, and only recently have I started gaining much of an ability to write.

Are Lamport and Graham just wordcels making a typical mind fallacy, or is there more to this that I’m not seeing? What’s the steelman of this claim that good thinking == good writing?

Morphism Dec 14, 2024, 11:22 AM
0 points
−3
in reply to: kave’s comment on: carado’s Shortform
If you want to get huge profits to solve alignment, and are smart/capable enough to start a successful big AI lab, you are probably also smart/capable enough to do some other thing that makes you a lot of money without the side effect of increasing P(doom).

Morphism Dec 11, 2024, 11:53 AM
4 points
3
in reply to: Tamsin Leake’s comment on: carado’s Shortform
Moral Maze dynamics push corporations not just to pursue profit at all other costs, but also to be extremely myopic. As long as the death doesn’t happen before the end of the quarter, the big labs, being immoral mazes, have no reason to give a shit about x-risk. Of course, every individual member of a big lab has reason to care, but the organization as an egregore does not (and so there is strong selection pressure for these organizations to have people that have low P(doom) and/or don’t (think they) value the future lives of themselves and others).

Morphism Dec 11, 2024, 11:41 AM
11 points
2
on: Pi Rogers’s Shortform
Contrary to what the current wiki page says, Simulacrum levels 3 and 4 are not just about ingroup signalling. See these posts and more, as well as Baudrillard’s original work if you’re willing to read dense philosophy.

Here is an example where levels 3 and 4 don’t relate to ingroups at all, which I think may be more illuminating than the classic “lion across the river” example:

Alice asks “Does this dress make me look fat?” Bob says “No.”

Depending on the simulacrum level of Bob’s reply, he means:
1. “I believe that the dress does not make you look fat.”
2. “I want you to believe that the dress does not make you look fat, probably because I want you to feel good about yourself.”
3. “Neither you nor I are autistic truth-obsessed rationalists, and therefore I recognize that you did not ask me this question out of curiosity as to whether or not the dress makes you look fat. Instead, due to frequent use of simulacrum level 2 to respond to these sorts of queries in the past, a new social equilibrium has formed where this question and its answer are detached from object-level truth, instead serving as a signal that I care about your feelings. I do care about your feelings, so I play my part in the signalling ritual and answer ‘No.’”
4. “Similar to 3, except I’m a sociopath and don’t necessarily actually care about your feelings. Instead, I answer ‘No’ because I want you to believe that I care about your feelings.”
Here are some potentially better definitions, of which the group association definitions are a clear special case:
1. Communication of object-level truth.
2. Optimization over the listener’s belief that the speaker is communicating on simulacrum level 1, i.e. desire to make the listener believe what the speaker says.
These are the standard old definitions. The transition from 1 to 2 is pretty straightforward: when I use 2, I want you to believe I’m using 1. This is not necessarily lying; it is more like Frankfurt’s bullshit. I care about the effects of this belief on the listener, regardless of its underlying truth value. This is often (naively considered) prosocial; see this post for some examples.

Now, the transition from 2 to 3 is a bit tricky. Level 3 is a result of a social equilibrium that emerges after communication in that domain gets flooded by prosocial level 2. Eventually, everyone learns that these statements are not about object-level reality, so communication on levels 1 and 2 becomes futile. Instead, we have:
3. Signalling of some trait or bid associated with historical use of simulacrum level 2.
E.g. that Bob cares about Alice’s feelings, in the case of the dress, or that I’m with the cool kids that don’t cross the river, in the case of the lion. Another example: bids to hunt stag.

3 to 4 is analogous to 1 to 2.
4. Optimization over the listener’s belief that the speaker is communicating on simulacrum level 3, i.e. desire to make the listener believe that the speaker has the trait signalled by simulacrum level 3 communication (i.e. the trait that was historically associated with prosocial level 2 communication).
Like with the jump from 1 to 2, the jump from 3 to 4 has the quality of bullshit, not necessarily lies. Speaker intent matters here.

Morphism Dec 9, 2024, 9:31 PM
1 point
0
in reply to: Richard_Kennaway’s comment on: Pi Rogers’s Shortform
Oops that was a typo. Fixed now, and added a comma to clarify that I mean the latter.

Morphism Dec 9, 2024, 12:42 AM
2 points
0
on: Pi Rogers’s Shortform
Formalizing Placebomancy

I propose the following desideratum for self-referential doxastic modal agents (agents that can think about their own beliefs), where $\Box A$ represents “I believe $A$”, $(W \mid A)$ represents the agent’s world model conditional on $A$, and $\succ$ is the agent’s preference relation:

Positive Placebomancy: For any proposition $P$, the agent concludes $P$ from $\Box P \to P$ if $(W \mid P) \succ (W \mid \neg P)$.
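Since the edit log notes that the criterion is an inference rule rather than an implication, it may help to see it in the usual premises-over-conclusion notation. This rendering is my own, with the preference comparison written as a side condition on the rule:

$$
\frac{\Box P \to P}{P} \qquad \text{provided } (W \mid P) \succ (W \mid \neg P)
$$

Read this way, the agent only gets to conclude $P$ once it has actually concluded $\Box P \to P$, whereas an implication version would add $(\Box P \to P) \to P$ to what the agent believes outright.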

In natural English: the agent believes those hyperstitions that would benefit it if they were true.

“The placebo effect works on me when I want it to”.

A real life example: In this sequence post, Eliezer Yudkowsky advocates for using positive placebomancy on “I cannot self-deceive”.

I would also like to formalize a notion of “negative placebomancy” (doesn’t believe hyperstitions that don’t benefit it), “total placebomancy” (believes hyperstitions iff they are beneficial), “group placebomancy” (believes group hyperstitions that are good for everyone in the group, conditional on all other group members having group placebomancy or similar), and generalizations to probabilistic self-referential agents (like “ideal fixed-point selection” for logical inductor agents).

I will likely cover all of these in a future top-level post, but I wanted to get this idea out into the open now because I keep finding myself wanting to reference it in conversation.

Edit log:
- 2024-12-08 rephrased the criterion to be an inference rule rather than an implication. Also made a minor grammar edit.

Morphism Dec 7, 2024, 9:41 AM
14 points
6
in reply to: sapphire’s comment on: deluks917′s Shortform
I think I know (80% confidence) the identity of this “local Vassarite” you are referring to, and I think I should reveal it, but, y’know, Unilateralist’s Curse, so if anyone gives me a good enough reason not to reveal this person’s name, I won’t. Otherwise, I probably will, because right now I think people really should be warned about them.

Morphism Nov 30, 2024, 5:43 AM
35 points
0
on: Pi Rogers’s Shortform
People often say things like “do x. Your future self will thank you.” But I’ve found that I very rarely actually thank my past self, after x has been done, and I’ve reaped the benefits of x.

This quick take is a preregistration: For the next month I will thank my past self more, when I reap the benefits of a sacrifice of their immediate utility.

e.g. When I’m stuck in bed because the activation energy to leave is too high, and then I overcome that and go for a run and then feel a lot more energized, I’ll look back and say “Thanks 7 am Morphism!”

(I already do this sometimes, but I will now make a TAP out of it, which will probably cause me to do it more often.)

Then I will make a full post describing in detail what I did and what (if anything) changed about my ability to sacrifice short-term gains for greater long-term gains, along with plausible theories w/ probabilities on the causal connection (or lack thereof), as well as a list of potential confounders.

Of course, it is possible that I completely fail to even install the TAP. I don’t think that’s very likely, because I’m #1-prioritizing my own emotional well-being right now (I’ll shift focus back onto my world-saving pursuits once I’m more stably not depressed). If I do fail, I will not write a full post, because the experiment would not even have been done. I will instead just make a comment on this shortform to that effect.

Morphism Jun 1, 2024, 3:55 PM
1 point
0
on: Pi Rogers’s Shortform
Edit: There are actually many ambiguities with the use of these words. This post is about one specific ambiguity that I think is often overlooked or forgotten.

The word “preference” is overloaded (and so are related words like “want”). It can refer to one of two things:
- How you want the world to be i.e. your terminal values e.g. “I prefer worlds in which people don’t needlessly suffer.”
- What makes you happy e.g. “I prefer my ice cream in a waffle cone”
I’m not sure how we should distinguish these. So far, my best idea is to call the former “global preferences” and the latter “local preferences”, but that clashes with the pre-existing notion of locality of preferences as the quality of terminally caring more about people/objects closer to you in spacetime. Does anyone have a better name for this distinction?

I think we definitely need to distinguish them, however, because they often disagree, and most “values disagreements” between people are just disagreements in local preferences, and so could be resolved by considering global preferences.

I may write a longpost at some point on the nuances of local/global preference aggregation.

Example: Two alignment researchers, Alice and Bob, both want access to a limited supply of compute. The rest of this example is left as an exercise.

Morphism’s Shortform

Morphism Jun 1, 2024, 3:55 PM

2 points

16 comments, 1 min read

Morphism May 16, 2024, 8:30 PM
5 points
2
in reply to: cubefox’s comment on: Feeling (instrumentally) Rational
Emotions can be treated as properties of the world, optimized with respect to constraints like anything else. We can’t edit our emotions directly but we can influence them.

Feeling (instrumentally) Rational

Morphism May 16, 2024, 6:56 PM

14 points

5 comments, 1 min read

Morphism May 2, 2024, 10:57 PM
1 point
0
in reply to: mako yass’s comment on: Please stop publishing ideas/insights/research about AI
Oh, no, I mean they store the private key on the client side and decrypt posts there.

Ideally all of this is behind a nice UI, like Signal.

Morphism May 2, 2024, 10:55 PM
1 point
0
in reply to: mako yass’s comment on: Please stop publishing ideas/insights/research about AI
I mean, Signal messenger has worked pretty well in my experience.

CCS: Counterfactual Civilization Simulation

Morphism May 2, 2024, 10:54 PM

3 points

0 comments, 2 min read

Morphism May 2, 2024, 8:37 PM
9 points
7
in reply to: ryan_greenblatt’s comment on: Please stop publishing ideas/insights/research about AI
But safety research can actually disproportionately help capabilities, e.g. the development of RLHF allowed OAI to turn their weird text predictors into a very generally useful product.

Morphism May 2, 2024, 8:33 PM
3 points
3
in reply to: Tamsin Leake’s comment on: Please stop publishing ideas/insights/research about AI
I could see embedded agency being harmful though, since an actual implementation of it would be really useful for inner alignment.

Morphism May 2, 2024, 8:29 PM
10 points
5
in reply to: Chi Nguyen’s comment on: Please stop publishing ideas/insights/research about AI
Some off the top of my head:
- Outer Alignment Research (e.g. analytic moral philosophy in an attempt to extrapolate CEV) seems to be totally useless to capabilities, so we should almost definitely publish that.
- Evals for Governance? Not sure about this since a lot of eval research helps capabilities, but if it leads to regulation that lengthens timelines, it could be net positive.
Edit: oops i didn’t see tammy’s comment

Morphism May 2, 2024, 8:18 PM
10 points
−1
on: Please stop publishing ideas/insights/research about AI
Idea:

Have everyone who wants to share and receive potentially exfohazardous ideas/research send out a 4096-bit RSA public key.

Then, make a clone of the alignment forum, where every time you make a post, you provide a list of the public keys of the people who you want to see the post. Then, on the client side, it encrypts the post using all of those public keys. The server only ever holds encrypted posts.

Then, users can put in their own private key to see a post. The encrypted post gets downloaded to the user’s machine and is decrypted on the client side. Perhaps require users to be on open-source browsers for extra security.
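RSA alone cannot directly encrypt a long post, so “encrypting using all of those public keys” would in practice most likely be hybrid encryption: encrypt the post once with a fresh symmetric key, then encrypt (wrap) that key separately for each listed public key. Below is a minimal sketch assuming the third-party Python `cryptography` package (RSA-OAEP for key wrapping, AES-GCM for the post body). The function names `new_keypair`, `encrypt_post`, and `decrypt_post` are hypothetical, not part of any existing forum codebase, and the post-quantum layer is not covered.

```python
import os

from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import padding, rsa
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

# OAEP padding with SHA-256, a common choice for RSA key wrapping.
OAEP = padding.OAEP(
    mgf=padding.MGF1(algorithm=hashes.SHA256()),
    algorithm=hashes.SHA256(),
    label=None,
)


def new_keypair(bits=4096):
    """Each user runs this locally and publishes only the public key."""
    private_key = rsa.generate_private_key(public_exponent=65537, key_size=bits)
    return private_key, private_key.public_key()


def encrypt_post(plaintext: bytes, recipient_public_keys):
    """Client side: encrypt once with a fresh AES key, then wrap it per recipient."""
    post_key = AESGCM.generate_key(bit_length=256)
    nonce = os.urandom(12)  # 96-bit nonce; safe because post_key is never reused
    ciphertext = AESGCM(post_key).encrypt(nonce, plaintext, None)
    wrapped_keys = [pk.encrypt(post_key, OAEP) for pk in recipient_public_keys]
    # This dict is all the server ever stores: no plaintext, no private keys.
    return {"nonce": nonce, "ciphertext": ciphertext, "wrapped_keys": wrapped_keys}


def decrypt_post(stored_post, private_key):
    """Client side: only an intended recipient can unwrap one of the keys."""
    for wrapped in stored_post["wrapped_keys"]:
        try:
            post_key = private_key.decrypt(wrapped, OAEP)
        except ValueError:
            continue  # this wrapped key was made for someone else's public key
        return AESGCM(post_key).decrypt(
            stored_post["nonce"], stored_post["ciphertext"], None
        )
    raise PermissionError("not a recipient of this post")
```

Trying every wrapped key keeps the sketch short; a real version would more likely tag each wrapped key with a fingerprint of the recipient's public key so the client can pick the right one directly.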

Maybe also add some post-quantum thing like what Signal uses so that we don’t all die when quantum computers get good enough.

Should I build this?

Is there someone else here more experienced with csec who should build this instead?

Morphism

Convex agents are practically invisible.

Formalizing Placebomancy

Morphism’s Shortform

Feeling (instrumentally) Rational

CCS: Counterfactual Civilization Simulation
