Born too late to explore Earth; born too early to explore the galaxy; born just the right time to save humanity.
Ulisse Mini
Understanding and controlling a maze-solving policy network
Each non-waluigi step increases the probability of never observing a transition to a waluigi a little bit.
Each non-Waluigi step increases the probability of never observing a transition to Waluigi a little bit, but not unboundedly so. As a toy example, we could start with P(Waluigi) = P(Luigi) = 0.5. Even if P(Luigi) monotonically increases, finding novel evidence that Luigi isn’t a deceptive Waluigi becomes progressively harder. Therefore, P(Luigi) could converge to, say, 0.8.
However, once Luigi says something Waluigi-like, we immediately jump to a world where P(Waluigi) = 0.95, since this trope is very common. To get back to Luigi, we would have to rely on a trope where a character goes from good to bad to good. These tropes exist, but they are less common. Obviously, this assumes that the context window is large enough to “remember” when Luigi turned bad. After the model forgets, we need a “bad to good” trope to get back to Luigi, and these are more common.
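Here's a minimal sketch of the toy dynamic I have in mind. All likelihood ratios below are made-up numbers for illustration, not claims about real model behaviour:

```python
# Toy sketch of the update dynamic described above. All likelihood ratios are
# made-up numbers for illustration, not claims about real model behaviour.

def update(p_luigi: float, likelihood_ratio: float) -> float:
    """Bayes-update P(Luigi) given evidence with likelihood ratio
    P(evidence | Luigi) / P(evidence | Waluigi)."""
    odds = p_luigi / (1 - p_luigi) * likelihood_ratio
    return odds / (1 + odds)

p = 0.5  # start with P(Luigi) = P(Waluigi) = 0.5
for step in range(1, 51):
    # Each additional benign step is weaker evidence than the last, so the
    # likelihood ratio decays toward 1 and P(Luigi) plateaus below 1.
    p = update(p, 1 + 0.7 / step**1.5)
print(f"P(Luigi) after 50 benign steps: {p:.2f}")   # ~0.8 with these numbers

# A single Waluigi-like utterance is strong evidence the other way
# ("secretly evil all along" is a very common trope), so P(Waluigi) jumps.
p = update(p, 1 / 80)
print(f"P(Luigi) after one Waluigi-like step: {p:.2f}")  # ~0.05, i.e. P(Waluigi) ~ 0.95
```

The point is just that diminishing evidence gives a plateau below 1, while a single strong "reveal" can move the posterior much further.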
I’d be happy to talk to [redacted] and put them in touch with other smart young people. I know a lot of them from Atlas, ESPR, and related networks. You can pass my contact info on to them.
Predictions for shard theory mechanistic interpretability results
Exercise: What mistake is the following sentiment making?
If there’s only a one in a million chance someone can save the world, then there’d better be well more than a million people trying.
Answer:
The whole “having a one in a million chance of saving the world” challenge is the wrong framing; the real challenge is having a positive impact in the first place (for example: not destroying the world or making things worse, e.g. via s-risks). You could think of this as a “setting the zero point” issue, though I like to think of it in terms of Bayes and Pascal’s wagers:
In terms of Bayes: You’re fixating on the expected value contributed by a single low-probability hypothesis and ignoring the rest of the hypothesis space. In most cases, there are corresponding low-probability events which “cancel out” the EV contributed by the hypothesis you’re directly reasoning about.
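To spell out the cancellation in symbols (my notation, a sketch rather than anything from the quoted argument):

```latex
% Sketch in my own notation (not from the original comment): total expected
% value sums over the whole hypothesis space, and the Pascal's-wager move is
% to keep only one tiny-probability, huge-payoff term.
% (Assumes amsmath is loaded for \text and \underbrace labels.)
\[
  E[V] = \sum_i P(h_i)\, V(h_i)
       = \underbrace{P(h_+)\, V(h_+)}_{\text{the term you fixate on}}
       + \underbrace{P(h_-)\, V(h_-)}_{\text{comparable probability, opposite sign}}
       + \cdots
\]
```

If P(h_-) is roughly P(h_+) and V(h_-) is roughly -V(h_+), the first two terms roughly cancel and the ordinary, higher-probability hypotheses dominate the decision.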
(I will also note that, empirically, it could be argued Eliezer was massively net-negative from a capabilities-advancement perspective, given his causal links to the founding of DeepMind and OpenAI. I bring this up to point out how nontrivial having a positive impact at all is in a domain like ours.)
[ASoT] Policy Trajectory Visualization
Isn’t this only an S-risk in the weak sense of “there’s a lot of suffering”, not the strong sense of “literally maximizing suffering”? E.g. it seems plausible to me that mistakes like “not letting someone die if they’re suffering” still give you a net-positive universe.
Also, insofar as shard theory is a good description of humans, would you say a random-human-god-emperor is an S-risk? And if so, with what probability?
The enlightened have awakened from the dream and no longer mistake it for reality. Naturally, they are no longer able to attach importance to anything. To the awakened mind the end of the world is no more or less momentous than the snapping of a twig.
Looks like I’ll have to avoid enlightenment, at least until the work is done.
Take the example of the Laplace approximation. If there’s a local continuous symmetry in weight space, i.e., some direction you can walk that doesn’t affect the probability density, then your density isn’t locally Gaussian.
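As a concrete toy illustration (my own example, not from the post): a loss with a continuous symmetry has a singular Hessian at its minima, which is exactly what breaks the Gaussian/Laplace picture.

```python
# Minimal sketch (my own toy example): a loss with a continuous symmetry has a
# degenerate Hessian, so the Laplace/Gaussian approximation of the posterior
# breaks down along the flat direction.
import numpy as np

# L(a, b) = (a*b - 1)^2 is minimized on the whole curve a*b = 1:
# rescaling (a, b) -> (t*a, b/t) leaves the loss unchanged.
def loss(w):
    a, b = w
    return (a * b - 1.0) ** 2

def hessian(w, eps=1e-4):
    """Finite-difference Hessian of the loss at w."""
    w = np.asarray(w, dtype=float)
    n = len(w)
    H = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            e_i, e_j = np.eye(n)[i] * eps, np.eye(n)[j] * eps
            H[i, j] = (loss(w + e_i + e_j) - loss(w + e_i - e_j)
                       - loss(w - e_i + e_j) + loss(w - e_i - e_j)) / (4 * eps**2)
    return H

H = hessian([1.0, 1.0])            # a minimum lying on the symmetric valley
print(np.linalg.eigvalsh(H))       # ~[0, 4]: one zero eigenvalue along the valley

# The Laplace approximation models exp(-L) as a Gaussian with covariance H^{-1},
# but H is singular here: the "Gaussian" is flat (non-normalizable) along the
# symmetry direction, so the density isn't locally Gaussian.
```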
Haven’t finished the post, but doesn’t this assume the requirement that P(w) = P(w′) when w and w′ induce the same function? This isn’t obvious to me, e.g. under the induced prior from weight decay / L2 regularization we often have P(w) ≠ P(w′) for weights that induce the same function.
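A minimal sketch of the kind of counterexample I have in mind (my toy example; the network and the λ value are made up): the ReLU rescaling symmetry preserves the function but not the L2 norm, so an L2-induced prior scores the two weight settings differently.

```python
# Toy illustration (my example, not from the post): two weight settings that
# implement the same function but get different probability under an
# L2/weight-decay-induced prior p(w) ∝ exp(-λ ||w||^2).
import numpy as np

def f(x, w1, w2):
    # Tiny ReLU network: f(x) = w2 * relu(w1 * x)
    return w2 * np.maximum(w1 * x, 0.0)

x = np.linspace(-2, 2, 5)
# Rescaling (w1, w2) -> (t*w1, w2/t) with t > 0 leaves the function unchanged...
print(f(x, 1.0, 1.0))      # same outputs
print(f(x, 10.0, 0.1))     # same outputs

# ...but not the L2 norm, so the two points get different prior density.
lam = 0.01  # made-up weight-decay coefficient
for w1, w2 in [(1.0, 1.0), (10.0, 0.1)]:
    log_prior = -lam * (w1**2 + w2**2)   # unnormalized log prior
    print(f"||w||^2 = {w1**2 + w2**2:6.2f}, log prior (unnorm.) = {log_prior:.3f}")
```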
Seems tangentially related to the “train a sequence of reporters” strategy for ELK. They don’t phrase it in terms of basins and path dependence, but those are a great frame to look at it with.
Personally, I think supervised learning has low path-dependence because of exact gradients plus always being able to find a direction to escape basins in high dimensions, while reinforcement learning has high path-dependence because updates influence future training data, causing attractors/equilibria (I’m more uncertain about the latter, but that’s my feeling).
So the really out-there take: We want to give the LLM influence over its future training data in order to increase path-dependence, and get the attractors we want ;)
I was more thinking along the lines of “you’re the average of the five people you spend the most time with” or something. I’m against external motivation too.
Incentives considered harmful
Edited
Character.ai seems to have a lot more personality than ChatGPT. I feel bad for not thanking you earlier (as I was in disbelief), but everything here is valuable safety information. Thank you for sharing, despite potential embarrassment :)
[Question] Where do you find people who actually do things?
That link isn’t working for me; can you send screenshots or something? When I try to load it I get an infinite loading screen.
Re(prompt ChatGPT): I’d already tried what you did and some (imo) better prompt engineering, and kept getting a character I thought was overly wordy/helpful (constantly asking me what it could do to help vs. just doing it). A better prompt engineer might be able to get something working though.
Can you give specific examples/screenshots of prompts and outputs? I know you said reading the chat logs wouldn’t be the same as experiencing it in real time, but some specific claims, like the prompt
The following is a conversation with Charlotte, an AGI designed to provide the ultimate GFE
resulting in a conversation like that, seem highly implausible.[1] At a minimum you’d need to do some prompt engineering, and even with that, some of this is implausible with ChatGPT, which typically acts very unnaturally after all the RLHF OAI did.
1. Source: I tried it, and tried some basic prompt engineering; it still resulted in bad outputs.
Interesting, I didn’t know the history; maybe I’m insufficiently pessimistic about these things. Consider my query retracted.
Strongly agree. Rationalist culture is instrumentally irrational here. It’s very well known how important self-belief and a growth mindset are for success, and rationalists’ obsession with natural intelligence is quite bad imo, to the point where I want to limit my interaction with the community so I don’t pick up bad patterns.
I do wonder if you’re strawmanning the advice a little; in my friend circles dropping out is seen as reasonable, though this could just be because a lot of my high-school friends already have some legible accomplishments and skills.