What exactly is being done—what type of thing is being created—when we run a process like “use gradient descent to minimize a loss function on training data, as long as the loss function is also being minimized on test data”?
Is a language model performing utility maximization during training?
Let’s ignore RLHF for now and just focus on next-token prediction. There’s an argument that of course the LM is maximizing a utility function: namely, its log score on predicting the next token, over the distribution of all text on the internet (or whatever it was trained on). An immediate reaction I have is that this isn’t really what we want, even ignoring that we want the text to be useful (as most internet text isn’t).
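To make the claimed objective concrete, here’s a toy sketch of my own (illustrative only, not anything from an actual training stack): the log score for a single prediction is just the log of the probability the model assigned to the token that actually came next, and training maximizes its average over the training distribution.

```python
import math

def next_token_log_score(probs, target_index):
    """Log score for one prediction: log of the probability the model
    assigned to the token that actually occurred. Maximizing the average
    of this over the training distribution is the 'utility function' of
    next-token prediction."""
    return math.log(probs[target_index])

# Toy distribution over a 4-token vocabulary (made-up numbers).
probs = [0.1, 0.6, 0.2, 0.1]
score = next_token_log_score(probs, 1)  # the true next token was index 1
```

The score is always negative unless the model put probability 1 on the right token, which is why maximizing log score is the same as minimizing cross-entropy loss.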
This is clearly related to all the problems around overfitting. My understanding is that in practice, this is handled through a combination of regularization and stopping training once test loss stops decreasing. So even if a language model were a utility maximizer during training, we already have some guardrails on it. Are they enough?
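For what it’s worth, the early-stopping guardrail is simple enough to sketch. This is a minimal illustration with a made-up loss sequence and a hypothetical `eval_test_loss` callback, not any particular framework’s API:

```python
def train_with_early_stopping(eval_test_loss, patience=2, max_steps=100):
    """Run training steps, but stop once test loss has failed to improve
    for `patience` consecutive evaluations."""
    best, bad, step = float("inf"), 0, 0
    for step in range(max_steps):
        loss = eval_test_loss(step)  # a real loop would also run a train step here
        if loss < best:
            best, bad = loss, 0
        else:
            bad += 1
            if bad >= patience:
                break
    return best, step

# Made-up test losses that bottom out and then creep back up (overfitting).
losses = [1.0, 0.8, 0.7, 0.72, 0.71, 0.73, 0.74, 0.75]
best, stopped_at = train_with_early_stopping(lambda s: losses[s])
```

With this sequence, training halts at step 4 with best loss 0.7: the guardrail cuts the “utility maximization” off as soon as it stops generalizing.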
Are language models utility maximizers?
I think there are two major “phases” of a language model, training and runtime. During training, the model is getting “steered” toward some objective function: first getting the probability of the next token “right”, and then getting positive feedback from humans during RLHF (I think? I should read up on exactly how RLHF works). Is this utility maximization? It doesn’t feel like it—I think I’ll put my thoughts on this in another comment.
During runtime, at first glance, the model is kind of “deterministic” (wrong word), in that it’s “just multiplying matrices”. But maybe it “learned” some utility maximizers during training and they’re embedded within it. I’m not sure whether this is actually possible, whether it happens in practice, or whether any such utility maximizers would dominate the agent or could be “overruled” by other parts of it.
I’m going to use this space to blurt some thoughts about ai risk.
Personal data point: I tried this last night, and had 4 vivid and intense nightmares. This is not a usual experience for me.
There’s sort of a general “digestive issues are the root of all anxiety/evil” thread I’ve seen pop up in a bunch of rationalist-adjacent spaces:
https://www.reddit.com/r/slatestarcodex/comments/yxyrf3/comment/iwr9vf2/
https://twitter.com/Aella_Girl/status/1454952550327848964
I’m curious if there’s any synthesis / study / general theory of this.
As my own data point: I’ve had pretty bad digestive issues, trouble eating, and anhedonia for a while. I recently got to run a natural experiment when I accidentally left my milk out and had to go without it. Since cutting milk, I’ve felt much better (though not perfect) on all counts. So I’m probably lactose intolerant (though I never really noticed a correlation between my symptoms and milk consumption).
Probably worth checking if there are any easy fixes to your digestive issues if you have them.
Reminds me of arbital.
“cure Eliezer’s chronic fatigue so he can actually attempt to ~~grant humanity a couple more bits of information-theoretic dignity~~ save the world”

Possibly relevant: I know someone who had chronic fatigue syndrome which largely disappeared after she had her first child. I could possibly put her in contact with Eliezer or someone working on the problem.
The entrepreneur contacted me again the next day...”I have been cooking all week.”
Hmmmmmm...
I strongly upvoted the comment above.
(Then I retracted my upvote.)
I will strongly upvote all replies to this comment which state that they strongly upvoted this comment.
Yes, it is easy for you to defect here, or undo your strong upvote after commenting. However, I modeled the typical LessWrong user before making this comment, and would not have made it if they were likely to defect.
[Meta] The jump from Distinct Configurations to Collapse Postulates in the Quantum Physics and Many Worlds sequence is a bit much—I don’t think the assertiveness of Collapse Postulates is justified without a full explanation of how many worlds explains things. I’d recommend adding at least On Being Decoherent in between.
I am a first year CS PhD student at Cornell, and interested (though not currently working on it). I will DM you.
The brain may also be excessively complicated to defend against parasites.
Which random factors caused the frostwing snippers to die out? Them migrating out? Competitors or predators migrating in? Or is there some chance of not getting the seed, even if they’re the only species left? I didn’t get a good look at the source code, but I thought things were fairly deterministic once only one species was left.
In most formulations, the five people are on the track ahead, not in the trolley.
I took a look at the course you mentioned:
It looks like I got some of the answers wrong.
Where am I?
In the trolley. You, personally, are not in immediate danger.
Who am I?
A trolley driver.
Who’s in the trolley?
You are. No one in the trolley is in danger.
Who’s on the tracks?
Five workers ahead, one to the right.
Do I work for the trolley company?
Yes.
The problem was not as poorly specified as you implied it to be.
What year is it?
Current year.
Where am I?
Near a trolley track.
Who am I?
Yourself.
Who’s in the trolley?
You don’t know.
Who’s on the tracks?
You don’t know.
Who designed the trolley?
You don’t know.
Who is responsible for the brake failure?
You don’t know.
Do I work for the trolley company?
Assume that you’re the only person who can pull the lever in time, and it wouldn’t be difficult or costly for you to do so. If your answer still depends on whether or not you work for the trolley company, you are different from most people, and should explain both cases explicitly.
If so, what are its standard operating procedures for this situation?
Either there are none, or you’re actually not in the situation above, but creating those procedures right now.
What would my family think?
I don’t know, maybe you have an idea.
Would either decision affect my future job prospects?
No.
Is there a way for me to fix the systemic problem of trolleys crashing in thought experiments?
Maybe, but not before the trolley crashes.
Can I film the crash and post the video online?
Yes.
If Scarlet pressed the PANIC button then she would receive psychiatric counseling, three months mandatory vacation, optional retirement at full salary and disqualification for life from the most elite investigative force in the system.
This sounds familiar, but some quick searching didn’t bring anything up. Is it a reference to something?
Did you consider withdrawal effects at all? A day without caffeine after having used it the previous few days is going to be very different from one where you haven’t used caffeine in months.