fun side project idea: create a matrix X and accompanying QR decomposition, such that X and Q are both valid QR codes that link to the wikipedia page about QR decomposition
leogao
idea: flight insurance, where you pay a fixed amount for the assurance that you will definitely get to your destination on time. e.g. if your flight gets delayed, they will pay for a ticket on the next flight from some other airline, or directly approach people on the next flight to buy a ticket off of them, or charter a private plane.
pure insurance for things you could afford to self-insure is generally a scam (and the customer base of this product could probably afford to self-insure). but this mostly provides value by handling the rather complicated logistics for you, rather than by reducing the financial burden, and there are substantial benefits from economies of scale (e.g. if you have enough customers, you can maintain a fleet of private planes within a few hours of most major airports)
I think “refactor less” is bad advice for substantial shared infrastructure. It’s good advice only for your personal experiment code.
Actual full blown fraud in frontier models at the big labs (oai/anthro/gdm) seems very unlikely. Accidental contamination is a lot more plausible but people are incentivized to find metrics that avoid this. Evals not measuring real world usefulness is the obvious culprit imo and it’s one big reason my timelines have been somewhat longer despite rapid progress on evals.
Several people have spent hundreds of dollars betting yes, which is a lot of money to spend for the memes.
there are a nontrivial number of people who would regularly spend a few hundred dollars for the memes.
made an estimate of the distribution of prices of the SPX in one year by looking at SPX options prices, smoothing the implied volatilities and using Breeden-Litzenberger.
(not financial advice etc, just a fun side project)
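a minimal sketch of the breeden–litzenberger step (toy code with my own function names; assumes evenly spaced strikes and call prices that have already been smoothed, e.g. by smoothing the implied vols and re-pricing):

```python
import numpy as np

def implied_density(strikes, calls, r=0.0, T=1.0):
    """Breeden-Litzenberger: risk-neutral density f(K) = exp(r*T) * d2C/dK2.
    Assumes evenly spaced strikes and already-smoothed call prices."""
    dK = strikes[1] - strikes[0]
    # central second difference of the call price in strike
    d2C = (calls[2:] - 2 * calls[1:-1] + calls[:-2]) / dK ** 2
    return strikes[1:-1], np.exp(r * T) * d2C
```

sanity check: if the terminal price were uniform on [90, 110], then C(K) = (110 − K)²/40 on that range, and the recovered density is the constant 1/20.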
solution 3 is to be an iconoclast: feel comfortable pushing against the flow, and try to prove everyone else wrong.
timelines takes
i’ve become more skeptical of rsi over time. here’s my current best guess at what happens as we automate ai research.
for the next several years, ai will provide a bigger and bigger efficiency multiplier to the workflow of a human ai researcher.
ai assistants will probably not uniformly make researchers faster across the board, but rather make certain kinds of things way faster and other kinds of things only a little bit faster.
in fact it will probably make some things 100x faster, a lot of things 2x faster, and then be literally useless for a lot of the remaining things
amdahl’s law tells us that we will mostly be bottlenecked on the things that don’t get sped up a ton. e.g. if the thing that got sped up 100x was only 10% of the original work, then you don’t get more than a 1/(1 − 10%) ≈ 1.11x overall speedup.
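the bound is easy to check numerically (the fraction/speedup split below is just the example above, not anything measured):

```python
def overall_speedup(fractions, speedups):
    """Amdahl's law: overall speedup = 1 / sum(f_i / s_i),
    where fraction f_i of the original work gets sped up by s_i."""
    assert abs(sum(fractions) - 1.0) < 1e-9
    return 1.0 / sum(f / s for f, s in zip(fractions, speedups))

# 10% of the work sped up 100x, the other 90% untouched:
# even an infinite speedup on that 10% caps you at 1/(1 - 0.10) ≈ 1.11x
print(overall_speedup([0.10, 0.90], [100, 1]))
```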
i think the speedup is a bit more than amdahl’s law implies. task X took up 10% of the time because there are diminishing returns to doing more X, so ideally you’d do exactly the amount of X such that the marginal value of time spent on X is in equilibrium with the marginal value of time spent on anything else. if you suddenly decrease the cost of X substantially, the equilibrium point shifts towards doing more X.
in other words, if AI makes lit review really cheap, you probably want to do a much more thorough lit review than you otherwise would have, rather than just doing the same amount of lit review but cheaper.
at the first moment that ai can fully replace a human researcher (that is, you can purely just put more compute in and get more research out, and only negligible human labor is required), the ai will probably be more expensive per unit of research than the human
(things get a little bit weird because my guess is that before ai can drop-in replace a human, we will reach a point where adding ai assistance equivalent to the cost of 100 humans to 2025-era openai research would be as good as adding 100 humans, but the ais are not doing the same things as the humans, and if you just keep adding ais you start experiencing diminishing returns faster than with adding humans. i think my analysis still mostly holds despite this)
naively, this means that the first moment AIs can fully automate AI research at human cost is not a special criticality threshold. if you are at equilibrium for allocating money between researchers and compute, then suddenly gaining the ability to convert compute into researchers at an exchange rate equal to a human researcher’s salary doesn’t really change your optimal allocation.
in reality, you will probably not be at equilibrium, because there are a lot of inefficiencies in hiring humans—recruiting is a lemon market, you have to onboard new hires relatively slowly, management capacity is limited, there is an inelastic and inefficient supply of qualified hires, etc. but i claim this is a relatively small effect and can’t explain a one-OOM increase in workforce size
also: anyone who has worked in a large organization knows that team size is not everything. having too many people can often even be a liability and slow you down. even when it doesn’t, adding more people almost never makes your team linearly more productive.
however, if AIs have much better scaling laws with additional parallel compute than human organizations do, then this could change things a lot. this is one of my biggest uncertainties here and one reason i still take rsi seriously.
your AIs might have higher bandwidth communication with each other than your humans do. but also maybe they might be worse at generalizing previous findings to new situations or something.
they might be more aligned with doing lots of research all day, whereas humans care about a lot of other things like money and status and fun and so on. but if outer alignment is hard we might get the AI equivalent of corporate politics.
one other thing is that compute is a necessary input to research. i’ll mostly roll this into the compute cost of actually running the AIs.
the part where AI research feeds back into how good the AIs are could be very slow in practice
there are logarithmic returns to more pretraining compute and more test time compute. so an improvement that 10xes the effective compute doesn’t actually get you that much. 4.5 isn’t that much better than 4 despite using 10x more compute (and 4 is in turn not that much better than 3.5, i would claim).
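a toy illustration of why a 10x effective-compute win doesn’t buy much under power-law scaling (both constants below are made up, not fitted to anything):

```python
# toy scaling curve: loss(C) = a * C ** (-b); both constants are made up
a, b = 10.0, 0.05

def loss(compute):
    return a * compute ** (-b)

# a 10x effective-compute improvement only multiplies the loss by 10**(-b)
ratio = loss(10 * 1e24) / loss(1e24)
print(ratio)  # 10 ** -0.05, i.e. only about an 11% reduction
```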
you run out of low hanging fruit at some point. each 2x in compute efficiency is harder to find than the previous one.
i would claim that in fact much of the recent feeling that AI progress is fast is due to a lot of low hanging fruit being picked. for example, the shift from pretrained models to RL for reasoning picked a lot of low hanging fruit due to not using test time compute / not eliciting CoTs well, and we shouldn’t expect the same kind of jump consistently.
an emotional angle: exponentials can feel very slow in practice; for example, moore’s law is kind of insane when you think about it (doubling every 18 months is pretty fast), but it still takes decades to play out
my referral/vouching policy is i try my best to completely decouple my estimate of technical competence from how close a friend someone is. i have very good friends i would not write referrals for and i have written referrals for people i basically only know in a professional context. if i feel like it’s impossible for me to disentangle, i will defer to someone i trust and have them make the decision. this leads to some awkward conversations, but if someone doesn’t want to be friends with me because it won’t lead to a referral, i don’t want to be friends with them either.
Overall very excited about more work on circuit sparsity, and this is an interesting approach. I think this paper would be much more compelling if there was a clear win on some interp metric, or some compelling qualitative example, or both.
i’m happy to grant that the 0.1% is just a fermi estimate and there’s a +/- one OOM error bar around it. my point still basically stands even if it’s 1%.
i think there are also many factors in the other direction that just make it really hard to say whether 0.1% is an under or overestimate.
for example, market capitalization is generally an overestimate of value when there are very large holders. tesla is also a bit of a meme stock so it’s most likely trading above fundamental value.
my guess is most things sold to the public sector probably produce less economic value per $ than something sold to the private sector, so profit overestimates value produced
the sign on net economic value of his political advocacy seems very unclear to me. the answer depends strongly on some political beliefs that i don’t feel like arguing out right now.
it slightly complicates my analogy for elon to be both the richest person in the us and also possibly the most influential (or one of). in my comment i am mostly referring to economic-elon. you are possibly making some arguments about influentialness in general. the problem is that influentialness is harder to estimate. also, if we’re talking about influentialness in general, we don’t get to use the 0.1% ownership of economic output as a lower bound of influentialness. owning x% of economic output doesn’t automatically give you x% of influentialness. (i think the majority of other extremely rich people are not nearly as influential as elon per $)
you might expect that the butterfly effect applies to ML training. make one small change early in training and it might cascade to change the training process in huge ways.
at least in non-RL training, this intuition seems to be basically wrong. you can do some pretty crazy things to the training process without really affecting macroscopic properties of the model (e.g. loss). one very well known example is that mixed precision training produces training curves basically identical to full precision training, even though you’re throwing out a ton of bits of precision on every step.
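a toy illustration of the same flavor of result (this is not actual mixed precision training—just SGD on a synthetic least-squares problem with gradients crudely truncated to a few mantissa bits):

```python
import numpy as np

def train(mantissa_bits=None, steps=2000, lr=0.01, seed=0):
    """SGD on a synthetic least-squares problem; optionally truncate
    gradients to ~mantissa_bits bits to mimic low-precision training."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(100, 5))
    w_true = np.arange(1.0, 6.0)
    y = X @ w_true
    w = np.zeros(5)
    for _ in range(steps):
        grad = X.T @ (X @ w - y) / len(y)
        if mantissa_bits is not None:
            # keep roughly mantissa_bits bits of each gradient entry
            scale = 2.0 ** (np.floor(np.log2(np.abs(grad) + 1e-30)) - mantissa_bits)
            grad = np.round(grad / scale) * scale
        w -= lr * grad
    return float(np.mean((X @ w - y) ** 2))
```

with the seed fixed, the full-precision run and the ~8-bit run both converge to a near-zero loss, even though the truncated run discards most of each gradient’s precision at every step.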
there’s an obvious synthesis of great man theory and broader structural forces theories of history.
there are great people, but these people are still bound by many constraints due to structural forces. political leaders can’t just do whatever they want; they have to appease the keys of power within the country. in a democracy, the most obvious key of power is the citizens, who won’t reelect a politician that tries to act against their interests. but even in dictatorships, keeping the economy at least kind of functional is important, because when the citizens are starving, they’re more likely to revolt and overthrow the government. there are also powerful interest groups like the military and critical industries, which have substantial sway over government policy in both democracies and dictatorships. many powerful people are mostly custodians for the power of other people, in the same way that a bank is mostly a custodian for the money of its customers.
also, just because someone is involved in something important, it doesn’t mean that they were maximally counterfactually responsible. structural forces often create possibilities to become extremely influential, but only in the direction consistent with said structural force. a population that strongly believes in foobarism will probably elect a foobarist candidate, and if the winning candidate never existed, another foobarist candidate would have won. winning an election always requires a lot of competence, but no matter how competent you are, you aren’t going to win on an anti-foobar platform. the sentiment of the population has created the role of foobarist president for someone foobarist to fill.
this doesn’t mean that influential people have no latitude whatsoever to influence the world. when we’re looking at the highest tiers of human ability, the efficient market hypothesis breaks down. there are so few extremely competent people that nobody is a perfect replacement for anyone else. if someone didn’t exist, it doesn’t necessarily mean someone else would have stepped up to do the same. for example, if napoleon had never existed, there might have been some other leader who took advantage of the weakness of the Directory to seize power, but they likely would have been very different from napoleon. great people still have some latitude to change the world orthogonal to the broader structural forces.
it’s not a contradiction for the world to be mostly driven by structural forces, and simultaneously for great people to have hugely more influence than the average person. in the same way that bill gates or elon musk are vastly vastly wealthier than the median person, great people have many orders of magnitude more influence on the trajectory of history than the average person. and yet, the richest person is still only responsible for 0.1%* of the economic output of the united states.
* fermi estimate, taking musk’s net worth and dividing by 20 to convert stocks to flows, and comparing to gdp. caveats apply based on interest rates and gdp being a bad metric. many assumptions involved here.
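spelling out the footnote’s arithmetic (both inputs are rough order-of-magnitude assumptions, not sourced figures):

```python
net_worth = 400e9             # assumed order-of-magnitude net worth, in dollars
annual_flow = net_worth / 20  # the footnote's crude stock-to-flow conversion
us_gdp = 27e12                # rough annual US GDP, in dollars
share = annual_flow / us_gdp
print(f"{share:.2%}")         # on the order of 0.1%
```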
there are a lot of video games (and to a lesser extent movies, books, etc) that give the player an escapist fantasy of being hypercompetent. It’s certainly an alluring promise: with only a few dozen hours of practice, you too could become a world-class fighter or hacker or musician! But because becoming hypercompetent at anything is a lot of work, the game has to put its finger on the scale to deliver on this promise. Maybe flatter the player a bit, or let the player do cool things without the skill you’d actually need in real life.
It’s easy to dismiss this kind of media as inaccurate escapism that distorts people’s views of how complex these endeavors of skill really are. But it’s actually a shockingly accurate simulation of what it feels like to actually be really good at something. As they say, being competent doesn’t feel like being competent, it feels like the thing just being really easy.
when i was new to research, i wouldn’t feel motivated to run any experiment that wouldn’t make it into the paper. surely it’s much more efficient to only run the experiments that people want to see in the paper, right?
now that i’m more experienced, i mostly think of experiments as something i do to convince myself that a claim is correct. once i get to that point, actually getting the final figures for the paper is the easy part. the hard part is finding something unobvious but true. with this mental frame, it feels very reasonable to run 20 experiments for every experiment that makes it into the paper.
libraries abstract away the low level implementation details; you tell them what you want to get done and they make sure it happens. frameworks are the other way around. they abstract away the high level details; as long as you implement the low level details you’re responsible for, you can assume the entire system works as intended.
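a tiny sketch of the inversion of control (the function names here are made up):

```python
# library style: your code owns the control flow and calls in
def process_with_library(events, parse):
    results = []
    for e in events:              # you drive the loop
        results.append(parse(e))  # the library does one well-defined task
    return results

# framework style: the framework owns the control flow and calls out
def framework_run(events, on_event):
    # the framework decides when your code runs; you just fill in the hook
    return [on_event(e) for e in events]
```

either way the work gets done; what differs is who owns the loop, and therefore which side gets its details abstracted away.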
a similar divide exists in human organizations and with managing up vs down. with managing up, you abstract away the details of your work and promise to solve some specific problem. with managing down, you abstract away the mission and promise that if a specific problem is solved, it will make progress towards the mission.
(of course, it’s always best when everyone has state on everything. this is one reason why small teams are great. but if you have dozens of people, there is no way for everyone to have all the state, and so you have to do a lot of abstracting.)
when either abstraction leaks, it causes organizational problems—micromanagement, or loss of trust in leadership.
the laws of physics are quite compact. and presumably most of the complexity in a zygote is in the dna.
a thriving culture is a mark of a healthy and intellectually productive community / information ecosystem. it’s really hard to fake this. when people try, it usually comes off weird. for example, when people try to forcibly create internal company culture, it often comes off as very cringe.
don’t worry too much about doing things right the first time. if the results are very promising, the cost of having to redo it won’t hurt nearly as much as you think it will. but if you put it off because you don’t know exactly how to do it right, then you might never get around to it.
the intent is to provide the user with a sense of pride and accomplishment for unlocking different rationality methods.