twitter is great because it boils saying funny things down to purely a problem of optimizing for funniness, letting twitter handle the logistics of discovery and distribution. being, e.g., a comedian is a lot more work.
leogao
corollary: oftentimes, when smart people say things that are clearly wrong, what’s really going on is they’re saying the closest thing in their frame that captures the grain of truth
the world is too big and confusing, so to get anything done (and to stay sane) you have to adopt a frame. each frame abstracts away a ton about the world, out of necessity. every frame is wrong, but some are useful. a frame comes with a set of beliefs about the world and a mechanism for updating those beliefs.
some frames contain within them the ability to become more correct without needing to discard the frame entirely; they are calibrated about what they don’t know and admit it. they change gradually as we learn more. other frames work empirically but are a dead end epistemologically because they aren’t willing to admit some of their false claims. for example, many woo frames capture a grain of truth that works empirically, but come with a flawed epistemology that prevents them from generating novel and true insights.
often it is better to be confined inside a well-trodden frame than to be fully unconstrained. the space of all possible actions is huge, and many of them are terrible. on the other hand, staying inside well-trodden frames forever substantially limits the possibility of doing something extremely novel.
it’s (sometimes) also a mechanism for seeking domains with long positive tail outcomes, rather than low variance domains
the financial industry is a machine that lets you transmute a dollar into a reliable stream of ~4 cents a year ~forever (or vice versa). also, it gives you a risk knob you can turn that increases the expected value of the stream, but also the variance (or vice versa; you can take your risky stream and pay the financial industry to convert it into a reliable stream or lump sum)
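the arithmetic behind this is just the perpetuity formula: a stream paying c dollars a year, discounted at rate r, is worth c/r today. a quick sketch (the 4% rate is the post’s ballpark assumption, not a market quote):

```python
# perpetuity math: present value of a constant yearly stream at discount rate r.
# the 4% rate is an illustrative assumption, not a live market number.

def yearly_stream(principal: float, rate: float) -> float:
    """The reliable yearly stream a lump sum buys at a given rate."""
    return principal * rate

def perpetuity_value(payment_per_year: float, rate: float) -> float:
    """Present value of a payment stream that continues forever: c / r."""
    return payment_per_year / rate

rate = 0.04
print(yearly_stream(1.00, rate))     # $1 today -> 4 cents a year
print(perpetuity_value(0.04, rate))  # 4 cents a year forever -> $1 today
```

the two functions are inverses of each other, which is the “or vice versa” in the post: the same machine converts lump sums into streams and streams back into lump sums.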
I think the most important part of paying for goods and services is often not the raw time saved, but the cognitive overhead avoided. for instance, I’d pay much more to avoid spending 15 minutes understanding something complicated (assuming there is no learning value) than to avoid 15 minutes of waiting. so figuring out the timetable and fare system, remembering to transfer, and navigating the station is plausibly more costly than the additional time spent in transit (especially in a new, unfamiliar city)
agree it goes in both directions. time when you hold critical context is worth more than time when you don’t. it’s probably at least sometimes a good strategy to alternate between working much more than sustainable and then recovering.
my main point is this is a very different style of reasoning than what people usually do when they talk about how much their time is worth.
people around these parts often take their salary and divide it by their working hours to figure out how much to value their time. but I think this actually doesn’t make that much sense (at least for research work), and often leads to bad decision making.
time is extremely non-fungible; some time is a lot more valuable than other time. further, the relation between amount of time worked and amount earned/value produced is extremely nonlinear (sharp diminishing returns). a lot of value is produced in short flashes of insight that you can’t just get more of by spending more time trying to get insight (they instead require other inputs like life experience/good conversations/mentorship/happiness). resting or having fun can help improve your mental health, which is especially important for positive tail outcomes.
given that the assumptions of fungibility and linearity are extremely violated, I think it makes about as much sense as dividing salary by number of keystrokes or number of slack messages.
concretely, one might forgo doing something fun because it seems like the opportunity cost is very high, but actually diminishing returns means one more hour on the margin is much less valuable than the average implies, and having fun improves productivity in ways not accounted for when just considering the intrinsic value one places on fun.
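the average-vs-marginal gap is easy to see in a toy model. here the value curve is a square root, chosen purely for illustration (sharp diminishing returns), not as a claim about real productivity:

```python
import math

# toy model: value produced grows like sqrt(hours worked).
# the functional form is purely illustrative of diminishing returns.

def value(hours: float) -> float:
    return 100 * math.sqrt(hours)

hours = 50
average_rate = value(hours) / hours              # what salary / hours measures
marginal_rate = value(hours + 1) - value(hours)  # what one more hour is worth

print(round(average_rate, 2))   # ~14.14 per hour on average
print(round(marginal_rate, 2))  # ~7.04 for the marginal hour
```

dividing salary by hours gives the average rate, but the opportunity cost of an hour of fun is the marginal rate, which here is about half as large; and this toy model doesn’t even account for fun feeding back into productivity.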
I’d be surprised if this were the case. next neurips I can survey some non native English speakers to see how many ML terms they know in English vs in their native language. I’m confident in my ability to administer this experiment on Chinese, French, and German speakers, which won’t be an unbiased sample of non-native speakers, but hopefully still provides some signal.
only 2 people walked away without answering (after saying yes initially); they were not counted as yes or no. another several people refused to even answer, but this was also quite rare. the no responders seemed genuinely confused, as opposed to dismissive.
feel free to replicate this experiment at ICML or ICLR or next neurips.
not sure, i didn’t keep track of this info. an important data point is that because essentially all ML literature is in english, non-anglophones generally either use english for all technical things, or at least codeswitch english terms into their native language. for example, i’d bet almost all chinese ML researchers would be familiar with the term CNN and it would be comparatively rare for people to say 卷积神经网络 (convolutional neural network). (some more common terms like 神经网络 or 模型 are used instead of their english counterparts—neural network / model—but i’d be shocked if people didn’t know the english translations)
overall i’d be extremely surprised if there were a lot of people who knew conceptually the idea of AGI but didn’t know that it was called AGI in english
the specific thing i said to people was something like:
excuse me, can i ask you a question to help settle a bet? do you know what AGI stands for? [if they say yes] what does it stand for? [...] cool thanks for your time
i was careful not to say “what does AGI mean”.
most people who didn’t know just said “no” and didn’t try to guess. a few said something like “artificial generative intelligence”. one said “amazon general intelligence” (??). the people who answered incorrectly were obviously guessing / didn’t seem very confident in the answer.
if they seemed confused by the question, i would often repeat it, saying something like “the acronym AGI”.
several people said yes but then started walking away the moment i asked what it stood for. this was kind of confusing and i didn’t count those people.
I decided to conduct an experiment at neurips this year: I randomly surveyed people walking around in the conference hall to ask whether they had heard of AGI
I found that out of 38 respondents, only 24 could tell me what AGI stands for (63%)
we live in a bubble
I’m very excited about approaches to add hierarchy to SAEs—seems like an important step forward. In general, approaches that constrain latents in various ways that let us have higher L0 without reconstruction becoming trivial seem exciting.
I think it would be cool to get follow-up work on bigger LMs. It should also be possible to do matryoshka with block size = 1 efficiently with some kernel tricks, which would be cool.
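as a rough illustration of what the block size = 1 objective means (a toy sketch in plain python, not the actual implementation—real versions operate on tensors, and as noted it would take kernel tricks to make this efficient): every prefix of the latent vector has to reconstruct the input on its own, which pushes the most general structure into the earliest latents.

```python
# toy sketch of a matryoshka-style SAE loss with block size 1:
# the reconstruction loss is summed over every prefix of the latents.

def reconstruct(latents, decoder):
    """Decode: x_hat[j] = sum_i latents[i] * decoder[i][j]."""
    d = len(decoder[0])
    x_hat = [0.0] * d
    for i, z in enumerate(latents):
        for j in range(d):
            x_hat[j] += z * decoder[i][j]
    return x_hat

def matryoshka_loss(x, latents, decoder):
    """Sum of squared reconstruction errors over all latent prefixes."""
    total = 0.0
    for m in range(1, len(latents) + 1):  # block size 1: every prefix counts
        x_hat = reconstruct(latents[:m], decoder)
        total += sum((a - b) ** 2 for a, b in zip(x, x_hat))
    return total

# tiny example: 2 latents, 2-dim input, identity decoder
x = [1.0, 0.0]
decoder = [[1.0, 0.0], [0.0, 1.0]]
print(matryoshka_loss(x, [1.0, 0.0], decoder))  # first latent explains x -> 0.0
```

note that putting the explanation in the second latent instead ([0.0, 1.0] for x = [0.0, 1.0]) is penalized by the shorter prefixes, which is the hierarchy-inducing pressure.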
I won’t claim to be immune to peer pressure but at least on the epistemic front I think I have a pretty legible track record of believing things that are not very popular in the environments I’ve been in.
a medium with fewer limitations is strictly better for making good art, but it’s also harder to identify good art among the sea of bad art because the medium alone is no longer as good a signal of quality
to be clear, a “winter/slowdown” in my typology is more about the vibes and could only be a few years’ counterfactual slowdown. like the dot-com crash didn’t take that long for companies like Amazon or Google to recover from, but it was still a huge vibe shift
also, to further clarify, this is not an update I’ve made recently; I’m just making this post now as a regular reminder of my beliefs, because it seems good to have a record of this kind of thing (though everyone who has heard me ramble about this irl can confirm I’ve believed something like this for a while now)
people often say that limitations of an artistic medium breed creativity. part of this could be the fact that when it is costly to do things, the only things done will be higher effort
simple ideas often require tremendous amounts of effort to make work.