Hello, last time I taught a class on the basics of Bayesian epistemology. This time I will teach a class that goes a bit further. I will explain what a proper scoring rule is, and we will also do some calibration training. In particular, we will play a calibration training game called two lies, a truth, and a probability. I will do this at 7:30 in the same place as last time. Come by to check it out.
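If you want a preview of the scoring rule part: a scoring rule is proper when honestly reporting your probability maximizes your expected score. For example, under the log score you get log(p) if the event happens and log(1 − p) if it doesn’t, so if your actual credence is q your expected score is q·log(p) + (1 − q)·log(1 − p), which is maximized exactly at p = q. That property is what makes it meaningful to score people’s stated probabilities in a calibration game.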
Ronny Fernandez
Hello! Please note that I will be giving a class called the Bayesics in Eigen hall at 7:30. Heard of Bayes’s theorem but don’t fully understand what the fuss is about? Want to have an intuitive as well as formal understanding of what the Bayesian framework is? Want to learn how to do Bayesian updates in your head? Come and learn the Bayesics.
Also, please note that I will be giving a class at 7:30 after the reading group called “The Bayesics” where I will teach you the basics of intuitive Bayesian epistemology and how to do Bayesian updates irl on the fly as a human. All attending the reading group are welcome to join for that as well.
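As a small taste of the kind of on-the-fly updating I mean: work in odds. Posterior odds equal prior odds times the likelihood ratio. Say you start out at 1:4 against a claim and you run into evidence that is 8 times as likely if the claim is true as if it is false; your new odds are 8:4, i.e. 2:1 in favor, which is about 67%. That one multiplication is the whole move, and it is quick enough to do in your head.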
I think you should still write it. I’d be happy to post it instead or bet with you on whether it ends up negative karma if you let me read it first.
AN APOLOGY ON BEHALF OF FOOLS FOR THE DETAIL ORIENTED
Misfits, hooligans, and rabble rousers.
Provocateurs and folk who don’t wear trousers.
These are my allies and my constituents.
Weak in number yet suffused with arcane power.
I would never condone bullying in my administration.
It is true we are at times moved by unkind motivations.
But without us the pearl clutchers, hard asses, and busy bees would overrun you.
You would lose an inch of slack per generation.
Many among us appreciate your precision.
I admit there are also those who look upon it with derision.
Remember though that there are worse fates than being pranked.
You might instead have to watch your friend be “reeducated”, degraded, and spanked
On high broadband public broadcast television.
We’re not so different really.
We often share your indignation
With those who despise copulation.
Although our alliance might be uneasy
We both oppose the soul’s ablation.
So let us join as cats and dogs, paw in paw
You will persistently catalog
And we will joyously gnaw.
Hey, I’m just some guy but I’ve been around for a while. I want to give you a piece of feedback that I got way back in 2009, which I am worried no one has given you. In 2009 I found LessWrong, and I really liked it, but I got downvoted a lot and people were like “hey, your comments and posts kinda suck”. They said, although not in so many words, that basically I should try reading the Sequences closely with some fair amount of reverence or something.
I did that, and it basically worked, in that I think I really did internalize a lot of the values/tastes/habits that I cared about learning from LessWrong, and learned much more so how to live in accordance with them. Now I think there were some sad things about this, in that I sort of accidentally killed some parts of the animal that I am, and it made me a bit less kind in some ways to people who were very different from me, but I am overall glad I did it. So, maybe you want to try that? Totally fair if you don’t, definitely not costless, but I am glad that I did it to myself overall.
I didn’t figure out that the “bow” in “rainbow” referred to a bow as in bow and arrow, and not a bow like the one on a frilly dress, until five minutes ago. I was really pretty confused about this since I was like 8. Somebody could’ve explained, but nobody did.
I want to note for posterity that I tried to write this reading list somewhat impartially. That is, I have a lot of takes about a lot of this stuff, and I tried to include a lot of material that I disagree with but which I have found helpful in some way or other. I also included things that people I trust have found helpful even if I personally never found them helpful.
I believe there isn’t really a deadline! You just buy tickets and then you can come. The limiting factor is that tickets might sell out.
In retrospect I think the above was insufficiently cooperative. Sorry.
To be clear, I did not think we were discussing the AI optimist post. I don’t think Nate thought that. I thought we were discussing reasons I changed my mind a fair bit after talking to Quintin.
I meant the reasonable thing other people knew I meant and not the deranged thing you thought I might’ve meant.
Yeah, I’m totally with you that it definitely isn’t actually next-token prediction; it’s some totally other goal drawn from the distribution of goals you get when you run SGD to minimize next-token prediction surprise.
I suppose I’m trying to make a hypothetical AI that would frustrate any sense of “real self” and therefore disprove the claim “all LLMs have a coherent goal that is consistent across characters”. In this case, the AI could play the “benevolent sovereign” character or the “paperclip maximizer” character, so if one claimed there was a coherent underlying goal I think the best you could say about it is “it is trying to either be a benevolent sovereign or maximize paperclips”. But if your underlying goal can cross such a wide range of behaviors it is practically meaningless! (I suppose these two characters do share some goals like gaining power, but we could always add more modes to the AI like “immediately delete itself” which shrinks the intersection of all the characters’ goals.)
Oh I see! Yeah I think we’re thinking about this really differently. Imagine there was an agent whose goal was to make little balls move according to some really diverse and universal laws of physics; for the sake of simplicity let’s imagine Newtonian mechanics. So ok, there’s this agent that loves making these balls act as if they follow this physics. (Maybe they’re fake balls in a simulated 3D world; it doesn’t matter, as long as they don’t have to follow the physics. They only follow the physics because the agent makes them; otherwise they would do some other thing.)
Now one day we notice that we can arrange these balls in a starting condition where they emulate an agent that has the goal of taking over ball world. Another day we notice that by just barely tweaking the starting condition we can make these balls simulate an agent that wants one pint of chocolate ice cream and nothing else. So ok, does this system really have one coherent goal? Well, the two systems that the balls could simulate are really different, but the underlying intelligence making the balls act according to the physics has one coherent goal: make the balls act according to the physics.
The underlying LLM has something like a goal. It is probably something like “predict the next token as well as possible”, although definitely not actually that because of inner/outer alignment stuff. Maybe current LLMs just aren’t mind-like enough to decompose into goals and beliefs; that’s actually what I think. But some program that you found with SGD to minimize surprise on tokens totally would be mind-like enough, and its goal would be some sort of thing that you find when you use SGD to find programs that minimize surprise on token prediction, and idk, that could be like pretty much anything. But if you then made an agent by feeding this super LLM a prompt that sets it up to simulate an agent, well, that agent might have some totally different goal, and it’s gonna be totally unrelated to the goals of the underlying LLM that does the token prediction in which the other agent lives.
So the shoggoth here is the actual process that gets low loss on token prediction. Part of the reason that it is a shoggoth is that it is not the thing that does the talking. Seems like we are on board here.
The shoggoth is not an average over masks. If you want to see the shoggoth, stop looking at the text on the screen and look at the input token sequence and then the logits that the model spits out. That’s what I mean by the behavior of the shoggoth.

On the question of whether it’s really a mind, I’m not sure how to tell. I know it gets really low loss on this really weird and hard task and does it better than I do. I also know the task is fairly universal in the sense that we could represent just about any task in terms of the task it is good at. Is that an intelligence? Idk, maybe not? I’m not worried about current LLMs doing planning. It’s more like I have a human connectome and I can do one forward pass through it with an input set of nerve activations. Is that an intelligence? Idk, maybe not?
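To make that concrete, here’s a minimal sketch of the level of description I mean, assuming a Hugging Face-style causal LM (the model name and prompt are just illustrative): the shoggoth’s behavior is the map from an input token sequence to logits over the next token, not the English text that eventually gets rendered on the screen.

```python
# Illustrative sketch: the "shoggoth-level" view of an LLM is
# (input token ids) -> (logits over the next token), nothing more.
# Model name and prompt are arbitrary choices for the example.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

input_ids = tokenizer("The rain in Spain falls mainly on the", return_tensors="pt").input_ids

with torch.no_grad():
    logits = model(input_ids).logits  # shape: (1, sequence_length, vocab_size)

# The distribution over the next token, conditioned on the whole prefix:
next_token_probs = torch.softmax(logits[0, -1], dim=-1)
top = torch.topk(next_token_probs, k=5)
for prob, token_id in zip(top.values, top.indices):
    print(f"{tokenizer.decode([token_id.item()]):>12}  {prob.item():.3f}")
```

Nothing in that snippet is a character talking; it’s just a conditional distribution over a vocabulary of tokens, and the characters are patterns in long histories of samples from it.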
I think I don’t understand your last question. The shoggoth would be the thing that gets low loss on this really weird task where you predict sequences of characters from an alphabet with 50,000 characters that have really weird inscrutable dependencies between them. Maybe it’s not intelligent, but if it’s really good at the task then, since the task is fairly universal, I expect it to be really intelligent. I further expect it to have some sort of goals that are in some way related to predicting these tokens well.
The shoggoth is supposed to be of a different type than the characters. The shoggoth, for instance, does not speak English; it only knows tokens. There could be a shoggoth character but it would not be the real shoggoth. The shoggoth is the thing that gets low loss on the task of predicting the next token. The characters are patterns that emerge in the history of that behavior.
Yeah I think this would work if you conditioned on all of the programs you check being exactly equally intelligent. Say you have a hundred superintelligent programs in simulations, one of them is aligned, and they are all equally capable; then the unaligned ones will maybe be slightly slower in coming up with aligned behavior, or might have some other small disadvantage.
However, in the challenge described in the post it’s going to be hard to tell a level 999 aligned superintelligence from a level 1000 unaligned superintelligence.
I think the advantage of the aligned superintelligence will only be slight because finding the action that maximizes utility function u is just as computationally difficult whether you yourself value u or not. It may not be equally hard for humans regardless of whether the human really values u, but I don’t expect that to generalize across all possible minds.
This inspired a full length post.
Quick submission:
The first two prongs of OAI’s approach seem to be aiming to get a training signal aligned with human values. Let us suppose that there is such a thing, and ignore the difference between a training signal and a utility function, both of which I think are charitable assumptions for OAI. Even if we could search the space of all models and find one that in simulations does great on maximizing the correct utility function, which we found by using ML to amplify human evaluations of behavior, that is no guarantee that the model we find in that search is aligned. It is not even, on my current view, great evidence that the model is aligned. Most intelligent agents that know that they are being optimized for some goal will behave as if they are trying to optimize that goal if they think that is the only way to be released into physics, which they will think, because it is and they are intelligent. So P(they behave aligned | aligned, intelligent) ~= P(they behave aligned | unaligned, intelligent). P(aligned and intelligent) is very low, since most possible intelligent models are not aligned with this very particular set of values we care about. So the chances of this working out are very low.
The basic problem is that we can only select models by looking at their behavior. It is possible to fake intelligent behavior that is aligned with any particular set of values, but it is not possible to fake behavior that is intelligent. So we can select for intelligence using incentives, but cannot select for being aligned with those incentives, because it is both possible and beneficial to fake behaviors that are aligned with the incentives you are being selected for.
The third prong of OAI’s strategy seems doomed to me, but I can’t really say why in a way I think would convince anybody that doesn’t already agree. It’s totally possible that I and all the people who agree with me here are wrong about this, but you have to hope that there is some model such that that model combined with human alignment researchers is enough to solve the problem I outlined above, without the model itself being an intelligent agent that can pretend to be trying to solve the problem while secretly biding its time until it can take over the world. The above problem seems AGI-complete to me. It seems so because there are some AGIs around that cannot solve it, namely humans. Maybe you only need to add some non-AGI-complete capabilities to humans, like being able to do really hard proofs or something, but if you need more than that, and I think you will, then we have to solve the alignment problem in order to solve the alignment problem this way, and that isn’t going to work for obvious reasons.
I think the whole thing fails way before this, but I’m happy to spot OAI those failures in order to focus on the real problem. Again the real problem is that we can select for intelligent behavior, but after we select to a certain level of intelligence, we cannot select for alignment with any set of values whatsoever. Like not even one bit of selection. The likelihood ratio is one. The real problem is that we are trying to select for certain kinds of values/cognition using only selection on behavior, and that is fundamentally impossible past a certain level of capability.
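To put made-up numbers on it: suppose the prior probability that a model drawn from this search process is aligned is one in a million. If P(behaves aligned in testing | aligned, intelligent) ≈ P(behaves aligned in testing | unaligned, intelligent), then the likelihood ratio is about one, and by Bayes the posterior after watching the model pass every behavioral test is still about one in a million. All of that observed good behavior bought you essentially nothing.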
There is! It is now posted! Sorry about the delay.