The regrowing limb thing is a nonstarter due to the issue of time, if I understand correctly. Salamanders that can regrow limbs take roughly the same amount of time to regrow them as the limb takes to grow in the first place, so it would be 1-2 decades before the limb was of adult size. Secondly, it's not as simple as just smearing some stem cells onto an arm stump. Limbs form because of specific signalling molecules in specific gradients, and I don't think these are present in an adult body once the limb is made. So you'd need a socket which produces those signals, which you'd have to build in the lab, attach to a blood supply to feed the limb, etc.
My model: suppose we have a DeepDreamer-style architecture, where (given a history of sensory inputs) the babbler module produces a distribution over actions, a world model predicts subsequent sensory inputs, and an evaluator predicts expected future X. If we run a tree-search over some weighted combination of the X, Y, and Z maximizers' predicted actions, then run each of the X, Y, and Z maximizers' evaluators, we'd get a reasonable approximation of a weighted maximizer.
This wouldn’t be true if we gave negative weights to the maximizers, because while the evaluator module would still make sense, the action distributions we’d get would probably be incoherent e.g. the model just running into walls or jumping off cliffs.
My conjecture is that, if a large black box model is doing something like modelling X, Y, and Z maximizers acting in the world, that large black box model might be close in model-space to itself being a maximizer which maximizes 0.3X + 0.6Y + 0.1Z, but it's far in model-space from being a maximizer which maximizes 0.3X − 0.6Y − 0.1Z, due to the above problem.
Seems like if you’re working with neural networks there’s not a simple map from an efficient (in terms of program size, working memory, and speed) optimizer which maximizes X to an equivalent optimizer which maximizes -X. If we consider that an efficient optimizer does something like tree search, then it would be easy to flip the sign of the node-evaluating “prune” module. But the “babble” module is likely to select promising actions based on a big bag of heuristics which aren’t easily flipped. Moreover, flipping a heuristic which upweights a small subset of outputs which lead to X doesn’t lead to a new heuristic which upweights a small subset of outputs which lead to -X. Generalizing, this means that if you have access to maximizers for X, Y, Z, you can easily construct a maximizer for e.g. 0.3X+0.6Y+0.1Z but it would be non-trivial to construct a maximizer for 0.2X-0.5Y-0.3Z. This might mean that a certain class of mesa-optimizers (those which arise spontaneously as a result of training an AI to predict the behaviour of other optimizers) are likely to lie within a fairly narrow range of utility functions.
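To make the asymmetry concrete, here's a minimal Python sketch (the names, the single-step "search", and the toy maximizers are all hypothetical simplifications, not a claim about any real architecture):

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Maximizer:
    babble: Callable[[object], List[object]]      # heuristically proposes promising actions
    evaluate: Callable[[object, object], float]   # scores a (state, action) pair

def combined_policy(maximizers: List[Maximizer], weights: List[float], state):
    """One step of a crude search: pool candidate actions from every babbler,
    then score them with the weighted sum of evaluators."""
    candidates = []
    for m in maximizers:
        candidates.extend(m.babble(state))
    def score(action):
        return sum(w * m.evaluate(state, action) for m, w in zip(maximizers, weights))
    return max(candidates, key=score)

# Toy usage: two "maximizers" over integer actions.
m_x = Maximizer(babble=lambda s: [s + 1, s + 2], evaluate=lambda s, a: a)    # wants a large
m_y = Maximizer(babble=lambda s: [s - 1, s - 2], evaluate=lambda s, a: -a)   # wants a small
print(combined_policy([m_x, m_y], [0.9, 0.1], state=0))  # prints 2: X's preference dominates

# With all-positive weights this works, because every candidate was proposed by a babbler
# pursuing something we care about. Flip a weight negative and the evaluator changes sign
# trivially, but no babbler ever proposes the actions that actively destroy X, so the
# search never even considers them.
```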
Perhaps fine-tuning needs to “delete” and replace these outdated representations related to user / assistant interactions.
It could also be that the finetuning causes this feature to be active 100% of the time, at which point it no longer correlates with the corresponding pretrained model feature, and it would just get folded into the decoder bias (to minimize L1 of fired features).
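A tiny numeric sketch of that folding argument (the dimensions and the constant activation level are made up for illustration): a feature that fires at a fixed level on every input contributes a constant vector to the reconstruction, so the same reconstruction is available with the feature silenced and its contribution moved into the decoder bias, at strictly lower L1.

```python
import numpy as np

d_model, n_feats, n_tokens = 8, 16, 100
rng = np.random.default_rng(0)
W_dec = rng.normal(size=(n_feats, d_model))   # SAE decoder directions
b_dec = rng.normal(size=d_model)              # decoder bias
acts = np.abs(rng.normal(size=(n_tokens, n_feats)))
acts[:, 3] = 1.7                              # feature 3 is "always on" at a fixed level

recon_a = acts @ W_dec + b_dec                # feature 3 active on every token
acts_b = acts.copy()
acts_b[:, 3] = 0.0
recon_b = acts_b @ W_dec + (b_dec + 1.7 * W_dec[3])   # contribution folded into the bias

print(np.allclose(recon_a, recon_b))   # True: identical reconstructions
print(acts.sum() - acts_b.sum())       # L1 saved by silencing the always-on feature
```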
Some people struggle with the specific tactical task of navigating any conversational territory. I’ve certainly had a lot of experiences where people just drop the ball leaving me to repeatedly ask questions. So improving free-association skill is certainly useful for them.
Unfortunately, your problem is most likely that you’re talking to boring people (so as to avoid doing any moral value judgements I’ll make clear that I mean johnswentworth::boring people).
There are specific skills to elicit more interesting answers to questions you ask. One I’ve heard is “make a beeline for the edge of what this person has ever been asked before” which you can usually reach in 2-3 good questions. At that point they’re forced to be spontaneous, and I find that once forced, most people have the capability to be a lot more interesting than they are when pulling cached answers.
This is easiest when you can latch onto a topic you’re interested in, because then it’s easy on your part to come up with meaningful questions. If you can’t find any topics like this then re-read paragraph 2.
Rob Miles also makes the point that if you expect people to accurately model the incoming doom, you should have a low p(doom). At the very least, worlds in which humanity is switched-on enough (and the AI takeover is slow enough) for both markets to crash and the world to have enough social order for your bet to come through are much more likely to survive. If enough people are selling assets to buy cocaine for the market to crash, either the AI takeover is remarkably slow indeed (comparable to a normal human-human war) or public opinion is so doomy pre-takeover that there would be enough political will to “assertively” shut down the datacenters.
Also, in this case you want to actually spend the money before the world ends. So actually losing money on interest payments isn't the real problem; the real problem is that if you actually enjoy the money you risk losing everything and being bankrupt/in debtors' prison for the last two years before the world ends. There's almost no situation in which you can be so sure of not needing to pay the money back that you can actually spend it risk-free. I think the riskiest short-ish thing that is even remotely reasonable is taking out a 30-year mortgage and paying just the minimum amount each year, such that the balance never decreases. Worst case you just end up with no house after 30 years, but not in crippling debt, and move back into the nearest rat group house.
[Question] What is the alpha in one bit of evidence?
“Optimization target” is itself a concept which needs deconfusing/operationalizing. For a certain definition of optimization and impact, I’ve found that the optimization is mostly correlated with reward, but that the learned policy will typically have more impact on the world/optimize the world more than is strictly necessary to achieve a given amount of reward.
This uses an empirical metric of impact/optimization which may or may not correlate well with algorithm-level measures of optimization targets.
Another approach would be to use per-token decoder bias as seen in some previous work: https://www.lesswrong.com/posts/P8qLZco6Zq8LaLHe9/tokenized-saes-infusing-per-token-biases But this would only solve it when the absorbing feature is a token. If it’s more abstract then this wouldn’t work as well.
Semi-relatedly, since most (all) of the SAE work since the original paper has gone into untied encoder/decoder weights, we don't really know whether modern SAE architectures like JumpReLU or TopK suffer as large of a performance hit as the original SAEs do, especially with the gains from adding token biases.
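For reference, here's a rough sketch of what a per-token decoder bias looks like (my paraphrase of the idea, not necessarily the linked paper's exact implementation; the plain ReLU encoder is just for brevity):

```python
import torch
import torch.nn as nn

class TokenBiasSAE(nn.Module):
    def __init__(self, d_model: int, d_sae: int, vocab_size: int):
        super().__init__()
        self.enc = nn.Linear(d_model, d_sae)
        self.dec = nn.Linear(d_sae, d_model)
        self.token_bias = nn.Embedding(vocab_size, d_model)   # per-token decoder bias

    def forward(self, x: torch.Tensor, token_ids: torch.Tensor):
        feats = torch.relu(self.enc(x))                        # sparse feature activations
        recon = self.dec(feats) + self.token_bias(token_ids)   # token-driven structure goes in the bias
        return recon, feats
```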
Oh no! Appears they were attached to an old email address, and the code is on a hard-drive which has since been formatted. I honestly did not expect anyone to find this after so long! Sorry about that.
A paper I'm doing mech interp on used a random split when the dataset they used already has a non-random canonical split. They also validated with their test data (the dataset has a three-way split) and used the original BERT architecture (sinusoidal embeddings which are added to feedforward, post-norming, no MuP) in a paper that came out in 2024. Training batch size is so small it can be 4xed and still fit on my 16GB GPU. People trying to get into ML from the science end have got no idea what they're doing. It was published in Bioinformatics.
sellers auction several very similar lots in quick succession and then never auction again
This is also extremely common in biochem datasets. You’ll get results in groups of very similar molecules, and families of very similar protein structures. If you do a random train/test split your model will look very good but actually just be picking up on coarse features.
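A minimal sketch of the usual fix, splitting by group rather than by row so that near-duplicates can't straddle the train/test boundary (the group labels here are placeholders for whatever the clustering is: molecular scaffold, protein family, auction seller, etc.):

```python
import numpy as np
from sklearn.model_selection import GroupShuffleSplit

X = np.random.rand(1000, 32)              # features
y = np.random.randint(0, 2, 1000)         # labels
groups = np.random.randint(0, 50, 1000)   # e.g. cluster id / scaffold / seller

splitter = GroupShuffleSplit(n_splits=1, test_size=0.2, random_state=0)
train_idx, test_idx = next(splitter.split(X, y, groups))

# No group appears on both sides of the split.
assert set(groups[train_idx]).isdisjoint(set(groups[test_idx]))
```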
I think the LessWrong community and particularly the LessWrong elites are probably too skilled for these games. We need a harder game. After checking the diplomatic channel as a civilian I was pretty convinced that there were going to be no nukes fired, and I ignored the rest of the game based on that. I also think the answer “don’t nuke them” is too deeply-engrained in our collective psyche for a literal Petrov Day ritual to work like this. It’s fun as a practice of ritually-not-destroying-the-world though.
Isn't Les Mis set during the June Rebellion of 1832 (the novel starts in 1815, according to Wikipedia), not the revolution that led to the Reign of Terror (which was in the 1790s)?
I have an old hypothesis about this which I might finally get to see tested. The idea is that the feedforward networks of a transformer create little attractor basins. My reasoning is twofold. First, the QK circuit only passes very limited information to the OV circuit as to what information is present in other streams, which introduces noise into the residual stream during attention layers. Second, I guess attractor basins might also arise from inferring concepts from limited information:
Consider that the prompts "The German physicist with the wacky hair is called" and "General relativity was first laid out by" will both lead to "Albert Einstein". The two prompts will likely land in different parts of the same attractor basin, which then converge.
You can measure which parts of the network are doing the compression using differential optimization, in which we take d[OUTPUT]/d[INPUT] as normal, and compare to d[OUTPUT]/d[INPUT] when the activations of part of the network are “frozen”. Moving from one region to another you’d see a positive value while in one basin, a large negative value at the border, and then another positive value in the next region.
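Here's a rough sketch of what that measurement could look like in PyTorch (the toy model, and treating "frozen" as detaching one block's activations so they stop contributing to the derivative, are my own assumptions about how this would be operationalized):

```python
import torch
import torch.nn as nn

class Toy(nn.Module):
    def __init__(self, d: int = 16):
        super().__init__()
        self.block_a = nn.Sequential(nn.Linear(d, d), nn.GELU())
        self.block_b = nn.Sequential(nn.Linear(d, d), nn.GELU())
        self.head = nn.Linear(d, 1)

    def forward(self, x: torch.Tensor, freeze_b: bool = False) -> torch.Tensor:
        h = x + self.block_a(x)
        hb = self.block_b(h)
        if freeze_b:
            hb = hb.detach()   # "freeze": block_b's activations are treated as constants
        return self.head(h + hb)

model = Toy()
x = torch.randn(16, requires_grad=True)

# d[OUTPUT]/d[INPUT] as normal, and with block_b frozen.
grad_full = torch.autograd.grad(model(x).sum(), x)[0]
grad_frozen = torch.autograd.grad(model(x, freeze_b=True).sum(), x)[0]

# How much block_b shapes the input-output sensitivity at this point.
print((grad_full - grad_frozen).norm())
```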
Yeah, I agree we need improvement. I don’t know how many people it’s important to reach, but I am willing to believe you that this will hit maybe 10%. I expect the 10% to be people with above-average impact on the future, but I don’t know what %age of people is enough.
90% is an extremely ambitious goal. I would be surprised if 90% of the population can be reliably convinced by logical arguments in general.
I’ve posted it there. Had to use a linkpost because I didn’t have an existing account there and you can’t crosspost without 100 karma (presumably to prevent spam) and you can’t funge LW karma for EAF karma.
Only after seeing the headline success vs test-time-compute figure did I bother to check it against my best estimates of how this sort of thing should scale. If we assume:
A set of questions of increasing difficulty (in this case 100), such that:
The probability of getting question $i$ correct on a given "run" is an s-curve like $p_i = \frac{1}{1 + e^{k(i - c)}}$ for constants $k$ and $c$
The model does $N$ "runs"
If any are correct, the model finds the correct answer 100% of the time
$N = 1$ gives a score of 20/100
Then, depending on $k$ ($c$ is uniquely defined by $k$ in this case), we get the following chance of success vs question difficulty rank curves:
Higher values of $k$ make it look like a sharper "cutoff", i.e. more questions are correct ~100% of the time, but more are wrong ~100% of the time. Lower values of $k$ make the curve less sharp, so the easier questions are gotten wrong more often, and the harder questions are gotten right more often.
Which gives the following best-of-$N$ sample curves, which are roughly linear in $\log(N)$ in the region between 20/100 and 80/100. The smaller the value of $k$, the steeper the curve.
Since the headline figure spans around 2 orders of magnitude of compute, the model (o1) appears to be performing on AIME similarly to best-of-$N$ sampling on one of these curves.
If we allow the model to split the task up into subtasks (assuming this creates no overhead and each subtask's solution can be verified independently and accurately) then we get a steeper gradient, roughly proportional to the number of subtasks, and a small amount of curvature.
Of course this is unreasonable, since this requires correctly identifying the shortest path to success with independently-verifiable subtasks. In reality, we would expect the model to use extra compute on dead-end subtasks (for example, when doing a mathematical proof, verifying a correct statement which doesn’t actually get you closer to the answer, or when coding, correctly writing a function which is not actually useful for the final product) so performance scaling from breaking up a task will almost certainly be a bit worse than this.
Whether or not the model is literally doing best-of-N sampling at inference time (probably it’s doing something at least a bit more complex) it seems like it scales similarly to best-of-N under these conditions.
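For anyone who wants to reproduce the shape of these curves, here's a small simulation of the toy model above (the $k$ values are illustrative and not fitted to the headline figure):

```python
import numpy as np
from scipy.optimize import brentq
from scipy.special import expit

n_q = 100
ranks = np.arange(1, n_q + 1)

def p_correct(c, k):
    # s-curve success probability by difficulty rank: 1 / (1 + e^{k(i - c)})
    return expit(k * (c - ranks))

def expected_score(N, c, k):
    # best-of-N with perfect verification: correct if any of the N runs is correct
    return np.sum(1.0 - (1.0 - p_correct(c, k)) ** N)

for k in (0.05, 0.2, 1.0):
    # calibrate c so that a single run scores 20/100
    c = brentq(lambda c: expected_score(1, c, k) - 20.0, -200, 300)
    print(k, [round(expected_score(N, c, k), 1) for N in (1, 10, 100, 1000)])
# Smaller k -> shallower difficulty curve -> steeper, roughly log-linear score gains with N.
```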
For a good few years you'd have a tiny baby limb, which would make it impossible to have a normal prosthetic. I also think most people just don't want a tiny baby limb attached to them. I don't think growing it in the lab for a decade is feasible for a variety of reasons. I also don't know how they planned to wire the nervous system in, or ensure the bone sockets attach properly, or connect the right blood vessels. The challenge is just immense, and it gets less and less worth it over time as trauma surgery and prosthetics improve.