i don’t think this is unique to world models. you can also think of rewards as things you move towards or away from. this is compatible with translation/scaling-invariance because if you move towards everything but move towards X even more, then in the long run you will do more of X on net, because you only have so much probability mass to go around.
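to spell out the "only so much probability mass to go around" part: for any normalized policy $\pi_\theta$, a uniform push towards every action cancels out in expectation, since

$$\mathbb{E}_{a\sim\pi_\theta}\!\left[\nabla_\theta \log \pi_\theta(a)\right] = \sum_a \pi_\theta(a)\,\nabla_\theta \log \pi_\theta(a) = \nabla_\theta \sum_a \pi_\theta(a) = \nabla_\theta 1 = 0,$$

so only the relative differences between rewards show up in the expected update.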
i have an alternative hypothesis for why positive and negative motivation feel distinct in humans.
although the expectation of the policy gradient doesn’t change if you translate the reward, the translation hugely affects the variance of the gradient estimator.[1] in other words, if you always move towards everything, you will still eventually learn the right thing, but it will take a lot longer.
my hypothesis is that humans have some hard-coded baseline for variance reduction. in the ancestral environment, the expectation of perceived reward was centered around where zero feels to be. our minds do try to adjust to changes in the reward distribution (e.g. hedonic adaptation), but the adjustment isn’t perfect, and so in the current world, our baseline may be suboptimal.
[1] Quick proof sketch (this is a very standard result in RL and is the motivation for advantage estimation, but still good practice to check things).
The REINFORCE estimator is $\hat{g}(\tau) = R(\tau)\,\nabla_\theta \log \pi_\theta(\tau)$, with $\tau \sim \pi_\theta$.
WLOG, suppose we define a new reward $R'(\tau) = R(\tau) + c$ (and assume that $c$ has the same sign as $\mathbb{E}\left[R(\tau)\,\|\nabla_\theta \log \pi_\theta(\tau)\|^2\right]$, so the shift is moving the rewards away from the mean rather than towards it).
Then we can verify the expectation of the gradient is still the same: $\mathbb{E}\left[(R(\tau)+c)\,\nabla_\theta \log \pi_\theta(\tau)\right] = \mathbb{E}\left[R(\tau)\,\nabla_\theta \log \pi_\theta(\tau)\right] + c\,\mathbb{E}\left[\nabla_\theta \log \pi_\theta(\tau)\right] = \mathbb{E}\left[R(\tau)\,\nabla_\theta \log \pi_\theta(\tau)\right]$, since the score function has mean zero.
But the variance increases (writing $\mathrm{Var}[X]$ for the trace of the covariance, $\mathbb{E}[\|X\|^2] - \|\mathbb{E}[X]\|^2$):
$$\mathrm{Var}\left[(R(\tau)+c)\,\nabla_\theta \log \pi_\theta(\tau)\right] = \mathbb{E}\left[(R(\tau)+c)^2\,\|\nabla_\theta \log \pi_\theta(\tau)\|^2\right] - \left\|\mathbb{E}\left[R(\tau)\,\nabla_\theta \log \pi_\theta(\tau)\right]\right\|^2$$
So:
$$\mathrm{Var}\left[(R(\tau)+c)\,\nabla_\theta \log \pi_\theta(\tau)\right] - \mathrm{Var}\left[R(\tau)\,\nabla_\theta \log \pi_\theta(\tau)\right] = 2c\,\mathbb{E}\left[R(\tau)\,\|\nabla_\theta \log \pi_\theta(\tau)\|^2\right] + c^2\,\mathbb{E}\left[\|\nabla_\theta \log \pi_\theta(\tau)\|^2\right]$$
Obviously, both terms on the right have to be non-negative (the first one because of the sign assumption above). More generally, if $|c|$ is large compared to the typical reward, the variance increases with $c^2$. So having your rewards be uncentered hurts a ton.
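As a sanity check, here is a minimal simulation of the above (the two-armed softmax bandit, the `sample_grads` helper, and all the specific numbers are just made up for illustration): the sample mean of the REINFORCE gradient is essentially unchanged by the shift $c$, while its variance grows rapidly with $c$.

```python
import numpy as np

rng = np.random.default_rng(0)

# two-armed bandit with a softmax policy over logits theta
theta = np.array([0.3, -0.2])
rewards = np.array([1.0, 0.0])   # arm 0 is the better arm

def sample_grads(c, n=200_000):
    """Per-sample REINFORCE gradients with the reward shifted by c."""
    probs = np.exp(theta) / np.exp(theta).sum()
    arms = rng.choice(2, size=n, p=probs)
    # gradient of log softmax: one_hot(a) - probs
    grad_logp = np.eye(2)[arms] - probs
    return (rewards[arms] + c)[:, None] * grad_logp

for c in [0.0, 1.0, 10.0]:
    g = sample_grads(c)
    print(f"c={c:5.1f}  mean={g.mean(axis=0).round(3)}  var={g.var(axis=0).round(3)}")
```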
is this for a reason other than the variance thing I mention?
I think the thing I mention is still important because it means there is no fundamental difference between positive and negative motivation. I agree that if everything were different degrees of extreme bliss, then the variance would be so high that you would never learn anything in practice. but if you shift everything slightly such that some mildly unpleasant things are now mildly pleasant, I claim this will make learning a bit faster or slower but still converge to the same thing.
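For a rough illustration of this claim (the bandit setup, the `final_p_best` helper, and all the numbers here are just made up for the sketch): vanilla REINFORCE on a two-armed bandit still settles on the better arm under modest reward shifts $c$, just somewhat more noisily when the rewards are badly uncentered.

```python
import numpy as np

def final_p_best(c, steps=2000, lr=0.1, seed=0):
    """Train a two-armed softmax bandit with vanilla REINFORCE,
    rewards shifted by c; return the final probability of the better arm."""
    rng = np.random.default_rng(seed)
    theta = np.zeros(2)
    rewards = np.array([1.0, 0.0])        # arm 0 is the better arm
    for _ in range(steps):
        probs = np.exp(theta) / np.exp(theta).sum()
        a = rng.choice(2, p=probs)
        theta += lr * (rewards[a] + c) * (np.eye(2)[a] - probs)
    return (np.exp(theta) / np.exp(theta).sum())[0]

for c in [-0.5, 0.0, 0.5, 2.0]:
    p = np.mean([final_p_best(c, seed=s) for s in range(20)])
    print(f"shift c={c:+.1f}  ->  mean p(best arm) after training ≈ {p:.3f}")
```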