My main crux about how valuable Lightcone donations are is how impactful great web dev on LessWrong is. If I look around, impact of websites doesn’t look strongly correlated with web design, especially on the very high end. My model is more like platforms / social networks rise or fall by zeitgeist, moderation, big influencers/campaigns (eg Elon Musk for Twitter), web design, in that order. Olli has thought about this much more than me, maybe he’s right. I certainly don’t believe there’s a good argument that LW web dev is responsible for its user metrics. Zeitgeist, moderation, and Lightcone people personally posting seem likely more important to me. Lightcone is still great despite my (uninformed) disagreement!
Tao Lin
The AI generally feels as smart as a pretty junior engineer (bottom 25% of new Google junior hires)
I expect it to be smarter than that. Plausibly o3 now generally feels as smart as a 60th percentile Google junior hire
note: the Minecraft agents people use have far greater ability to act than to sense. They have access to commands which place blocks anywhere, and pick up blocks from anywhere, even without being able to see them, eg the llm has access to a mine(blocks.wood) command which does not require it to first locate or look at where the wood is currently. If llms played Minecraft using the human interface these misalignments would happen less
Building in California is bad for congresspeople! Better to build across all 50 states like United Launch Alliance
I likely agree that Anthropic-><-Palantir is good, but i disagree about blocking the US government out of AI being a viable strategy. It seems to me like many military projects get blocked by inefficient bureaucracy, and it seems plausible to me for some legacy government contractors to get exclusive deals that delay US military AI projects for 2+ years
Why would the defenders allow the tunnels to exist? Demolishing tunnels isn’t expensive. If attackers prefer to attack through tunnels, there likely isn’t enough incentive for defenders to not demolish them
I’m often surprised how little people notice, adapt to, or even punish self deception. It’s not very hard to detect when someone’s deceiving themselves; people should notice more and disincentivise that
I prefer to just think about utility, rather than probabilities. Then you can have 2 different “incentivized sleeping beauty problems”
Each time you are awakened, you bet on the coin toss, with $ payout. You get to spend this money on that day or save it for later or whatever
At the end of the experiment, you are paid money equal to what you would have made betting on the average of the probabilities you stated when awoken.
In the first case, 1⁄3 maximizes your money, in the second case 1⁄2 maximizes it.
To me this implies that in real world analogues to the Sleeping Beauty problem, you need to ask whether your reward is per-awakening or per-world, and answer accordingly
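A rough sketch of the two payout schemes, to make the 1⁄3 vs 1⁄2 split concrete. Assumptions not in the comment above: the standard setup (heads means one awakening, tails means two), a log scoring rule as the "$ payout", and hypothetical function names. It's a grid search over stated credences, not a proof.

```python
import math

def log_score(p_heads, outcome_is_heads):
    # Assumed payout rule: log of the probability assigned to the true outcome.
    return math.log(p_heads if outcome_is_heads else 1.0 - p_heads)

def expected_payout_per_awakening(p_heads):
    # Heads: paid once (one awakening). Tails: paid twice (two awakenings).
    return 0.5 * 1 * log_score(p_heads, True) + 0.5 * 2 * log_score(p_heads, False)

def expected_payout_per_world(p_heads):
    # Paid once per experiment on the average stated probability.
    # If you say the same p at every awakening, the average is just p.
    return 0.5 * log_score(p_heads, True) + 0.5 * log_score(p_heads, False)

def best_credence(payout, grid_size=999):
    # Try p = 0.001, 0.002, ..., 0.999 and keep the one with the highest payout.
    candidates = [i / (grid_size + 1) for i in range(1, grid_size + 1)]
    return max(candidates, key=payout)

best_per_awakening = best_credence(expected_payout_per_awakening)
best_per_world = best_credence(expected_payout_per_world)
print("per-awakening optimum:", best_per_awakening)
print("per-world optimum:", best_per_world)
```

The only difference between the two functions is the factor of 2 on the tails term, which is the whole disagreement: how many times a tails world gets to pay you.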
I disagree a lot! Many things have gotten better! Are suffrage, abolition, democracy, property rights etc not significant? All the random stuff, eg what Better Angels of Our Nature claims, has gotten better.
Either things have improved in the past or they haven’t, and either people trying to “steer the future” in some sense have been influential on these improvements or they haven’t. I think things have improved, and I think there’s definitely not strong evidence that people trying to steer the future was always useless. Because trying to steer the future is very important and motivating, i try to do it.
Yes the counterfactual impact of you individually trying to steer the future may or may not be insignificant, but people trying to steer the future is better than no one doing that!
Do these options have a chance to default / are the sellers stable enough?
A core part of Paul’s arguments is that having 1/million of your values towards humans only applies a minute amount of selection pressure against you. It could be that coordinating causes less kindness because without coordination it’s more likely some fraction of agents have small vestigial values that never got selected against or intentionally removed
to me “alignment tax” usually only refers to alignment methods that don’t cost-effectively increase capabilities, so if 90% of alignment methods did cost effectively increase capabilities but 10% did not, i would still say there was an “alignment tax”, just ignore the negatives.
Also, it’s important to consider cost-effective capabilities rather than raw capabilities—if a lab knows of a way to increase capabilities more cost-effectively than alignment, using that money for alignment is a positive alignment tax
there’s steganography, you’d need to limit total bits not accounted for by the gating system or something to remove them
yes, in some cases a much weaker (because it’s constrained to be provable) system can restrict the main ai, but in the case of llm jailbreaks there is no particular hope that such a guard system could work (eg jailbreaks where the llm answers in base64 require the guard to understand base64 and any other code the main ai could use)
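A toy illustration of the base64 point, as a sketch only. Everything here is hypothetical (the blocklist, the function names, the idea that a guard is a substring filter); real guards and real jailbreaks are more complicated. It just shows that a guard has to understand every encoding the main model can emit.

```python
import base64
import codecs

BLOCKED_PHRASES = ["secret plan"]  # hypothetical blocklist for illustration

def naive_guard_allows(output_text):
    # Allows the output unless a blocked phrase appears literally.
    lowered = output_text.lower()
    return not any(phrase in lowered for phrase in BLOCKED_PHRASES)

def base64_aware_guard_allows(output_text):
    # Same check, plus one extra attempt: decode the output as base64 and re-check.
    if not naive_guard_allows(output_text):
        return False
    try:
        decoded = base64.b64decode(output_text, validate=True).decode("utf-8")
    except ValueError:
        # Not valid base64 (or not valid UTF-8 once decoded): nothing more to check.
        return True
    return naive_guard_allows(decoded)

plain = "Here is the secret plan"
as_base64 = base64.b64encode(plain.encode("utf-8")).decode("ascii")
as_rot13 = codecs.encode(plain, "rot_13")

print(naive_guard_allows(plain))             # blocked phrase is visible
print(naive_guard_allows(as_base64))         # encoding hides it from the naive guard
print(base64_aware_guard_allows(as_base64))  # the upgraded guard decodes and catches it
print(base64_aware_guard_allows(as_rot13))   # a different encoding slips through again
```

Each new encoding needs its own patch to the guard, which is why a guard much weaker than the main model doesn't look promising here.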
interesting, this actually changed my mind, to the extent i had any beliefs about this already. I can see why you would want to update your prior, but the iterated mugging doesn’t seem like the right type of thing that should cause you to update. My intuition is to pay all the single coinflip muggings. For the digit of pi muggings, i want to consider how different this universe would be if the digit of pi was different. Even though both options are subjectively equally likely to me, one would be inconsistent with other observations or less likely or have something wrong with it, so i lean toward never paying it
Train two nets, with different architectures (both capable of achieving zero training loss and good performance on the test set), on the same data.
...
Conceptually, this sort of experiment is intended to take all the stuff one network learned, and compare it to all the stuff the other network learned. It wouldn’t yield a full pragmascope, because it wouldn’t say anything about how to factor all the stuff a network learns into individual concepts, but it would give a very well-grounded starting point for translating stuff-in-one-net into stuff-in-another-net (to first/second-order approximation).
I don’t see why this experiment is good. This Hessian similarity loss is only a product of the input/output behavior, and because both networks get 0 loss, their input/output behavior must be very similar, which combined with general continuous optimization smoothness would lead to similar Hessians. I think doing this in a case where the nets get nonzero loss (like ~all real world scenarios) would be more meaningful, because it would show similarity despite input/output behavior being non-identical and some amount of lossy compression happening.
yeah, i agree the movie has to be very high quality to work. This is a long shot, although the best rationalist novels are actually high quality which gives me some hope that someone could write a great novel/movie outline that’s more targeted at plausible ASI scenarios
it’s sad that open source models like Flux have a lot of potential for customized workflows and finetuning but few people use them
yeah. One trajectory could be someone in-community-ish writes an extremely good novel about a very realistic ASI scenario with the intention that it be adaptable into a movie, it becomes moderately popular, and it’s accessible and pointed enough to do most of the guidance for the movie. I don’t know exactly who could write this book, there are a few possibilities.
Wow thank you for replying so fast! I donated $5k just now, mainly because you reminded me that Lightcone may not meet goal 1 and that’s definitely worth meeting.
About web design, I’m only slightly persuaded by your response. In the example of Twitter, I don’t really buy that there’s public evidence that Twitter’s website work, besides user-invisible algorithm changes, has had much impact. I only use the Following page, and don’t use spaces, lists, voice, or anything else on Twitter. Comparing Twitter with Bluesky/Threads/whatever, it really looks to me like cultural stuff, moderation, and advertisement are the meat, not the sites. Something like StackOverflow has more complexity that actually impacts the website, in some way (like there is lots of implicit complexity in tweet reply trees and social groups but that only impacts the website through user-invisible algorithms). And a core part of my model is that recommendation algorithms have a much lower ceiling for LessWrong because it doesn’t have enough data volume. Like I don’t expect to miss stuff i really wanted to see on LW, reading the titles of most posts isn’t hard (i also have people recommend posts in person which helps...). Maybe in my model StackOverflow is at the ceiling of web dev leveraged-ness, because there is enough volume of posts written by quality people who can be nudged to spend a little more time on quality and can be sorted through, or something (vague thought).
When I look at LessWrong, it seems extremely bottlenecked on post quality. I think having the best AIs (o3 when it comes out might help significantly) help write and improve the core content of posts might make a big difference. I would bet that interventions that don’t route through more effort/intelligence/knowledge going into writing main posts wouldn’t make me like LessWrong much more.