On AI diplomacy: 1. It is a useful model benchmark, but 2. It is not new, since we already had human-level full-press Diplomacy harnesses (CICERO, 2022). It would be useful to compare these new harnesses to that more targeted system.
Mis-Understandings
"whether"
Is that supposed to be weather?
No, because you need different strategies to prove the loop case (which you can do with just a finite sequence of transitions), the halt case (the same), and the infinite-memory case.
There is no proof that will catch 100% of case 3 (because then you would have a halting oracle). But you can create a program that halts iff another program halts, or one that halts iff the other program halts or loops within a finite set of states. (I could write it; it just runs very slowly.) You cannot write the program that halts iff another program uses an infinite amount of memory (since then you could build a halting oracle). There are NO halting oracles, not just no efficient halting oracles.
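A minimal sketch of that semi-decider, assuming a hypothetical `step` function that returns the machine's next full configuration (control state plus memory contents) or `None` on halt; it halts iff the simulated program halts or loops within a finite set of states:

```python
# Toy illustration of the semi-decision procedure above: simulate a machine,
# halt if it halts, halt if it revisits a configuration (finite-state loop),
# and run forever only in case 3 (unbounded memory use).

def halts_or_loops(step, initial_config):
    """`step(config)` is a hypothetical transition function returning the next
    configuration, or None if the simulated program halts. Configurations must
    be hashable snapshots of the entire machine state."""
    seen = set()
    config = initial_config
    while True:
        if config in seen:
            return "loops"      # revisited a configuration: case 2
        seen.add(config)
        config = step(config)
        if config is None:
            return "halts"      # case 1
        # If memory grows without bound (case 3), configurations never repeat
        # and this loop never terminates -- which is exactly why this is not
        # a halting oracle.
```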
Random thought: Automated Corp has a real advantage, in that inference and training can run on the same GPUs (to a first approximation). So for Slow Corp, if they spend a day deciding and don't commit to runs, they are wasting GPU time; the other corps don't have this problem, and it is a big one. There is something there. The real question is: how much does thinking about the results of your test queue improve the informational value of the tests you run?
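A toy simulation of that question, with an invented objective and two invented strategies (a fully pre-committed test queue vs. one that conditions each test on earlier results); nothing here models the actual corps, it just makes the question concrete:

```python
# Compare the best result found by a fixed, pre-committed test queue against a
# strategy that picks each next test based on the results so far, under the
# same evaluation budget. The objective and both strategies are made up.
import numpy as np

rng = np.random.default_rng(0)

def objective(x):
    # Hidden unimodal "experiment quality" curve plus measurement noise.
    return -((x - 0.73) ** 2) + rng.normal(0, 0.01)

def precommitted(budget):
    # Commit the whole queue up front: GPUs never idle, but no adaptation.
    return max(objective(x) for x in np.linspace(0, 1, budget))

def adaptive(budget):
    # Spend each step near the best result so far: adaptation, at the cost of
    # serializing the queue (the "thinking time" the note worries about).
    best_x, best_y = 0.5, objective(0.5)
    for _ in range(budget - 1):
        x = float(np.clip(best_x + rng.normal(0, 0.2), 0, 1))
        y = objective(x)
        if y > best_y:
            best_x, best_y = x, y
    return best_y

budget = 12
print(np.mean([precommitted(budget) for _ in range(200)]),
      np.mean([adaptive(budget) for _ in range(200)]))
```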
A program must either loop, halt, or use an infinite amount of memory.
Halting is almost never the highest-EV action for any goal.
Looping might be high-EV in some cases for some goals, but I would not expect many of them, and definitely not short loops (the longer the loop, the more it looks like case 3).
So the policy "always take the highest-EV action for a particular goal" (which is the danger model) is a program that will use an infinite amount of memory, since it neither halts nor loops.
Thoughts about HEPA as a standard. A lower percentage removed can happen in two ways. One is probabilistic, where the same particle might or might not get trapped. Managing probabilistic capture, total circulation, and fan stats (airflow vs. static pressure) is probably a good idea. Introducing deterministic non-captures (where there is a class of particles that is never captured) can be a problem, as those particles will not be affected by the purifier at all. But avoiding that is engineering that requires only diligence.
Another way of putting it is that it makes more sense to use higher-airflow, lower-static-pressure fans, and the filters should work with that.
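A back-of-the-envelope sketch of that tradeoff, with invented numbers for the two configurations; the point is just that per-pass capture probability times airflow is the effective clean-air delivery rate:

```python
# Per-pass capture probability times airflow gives an effective clean air
# delivery rate (CADR), and steady-state particle concentration scales as
# generation / CADR. This only works for probabilistic capture; a particle
# class the filter deterministically never captures is a floor no purifier
# removes, no matter the airflow.

def cadr(airflow_cfm, capture_probability):
    return airflow_cfm * capture_probability

def steady_state_concentration(generation_rate, cadr_cfm):
    # generation_rate in particles/min, CADR in cubic feet/min.
    return generation_rate / cadr_cfm

true_hepa = cadr(airflow_cfm=100, capture_probability=0.9997)    # high pressure drop, low flow
looser_filter = cadr(airflow_cfm=250, capture_probability=0.85)  # higher-airflow fan

print(true_hepa, looser_filter)  # ~100 vs ~212: the looser, higher-flow setup wins on CADR
```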
I agree that the correct measure is particulates where people breathe, not simply exhaust particulates.
A contextualization of people touting big personal speedup numbers.
People sometimes get way more productive by rethinking their workflow, especially in research. It doesn't happen all the time, but it was not an unprecedented story even in 2015.
Do you remember when people were talking about 10x engineers in the 2010s?
Discovering that in a new workflow, you are the 10x engineer is not unprecedented.
The question is whether the rate of (try a new thing) -> (it clicks with your workflow, so output jumps) is higher now.
Sometimes people got 10x more productive from some change before any of this, so understand that workflow changes have a noise floor even at the scale of these productivity leaps.
No, because we want some of that behaviour. It is necessary for being able to split research tasks across multiple models (in research settings), so that we can get integrated work out of forks, which requires some amount of communication.
Additionally, cross-inference communication is likely a goal for practical applications, since it is what allows "I send my customer-service-bothering agent to talk to their customer service agent", which is a predicted pattern.
Basically, steganography concerns mean that cross-inference communication can come through any shared environment element, so to rule it out is to block all communication and additionally hide all shared environment, and therefore we can never guarantee that there is not a bypass.
But siloing is still a useful tool under the principle of least privilege from regular cybersecurity.
The basic theory of action is that signs of big problems (red hands) will generate pauses and drastic actions.
Which is a governance claim.
Also note that there are people who can tutor you in GeoGuessr, but not in interpreting pixel art.
If even one blog post that goes through that process step by step ends up in the training data, and it is routinely a useful subtask in image tasks (what and where are correlated), then the subcapability can be directly elicited.
This probably works better for the drone units than for the infantry (for instance).
Specifically, the policy of sending drones to the units that confirm the most kills, in a way that is really hard to fake (the video, plus the fact that lying here would obviously result in punishment), is a regular logistics policy.
This is just doing three things. First, it makes the requisition game more legible to individual soldiers (for NATO-style militaries this is very good), because a policy of focusing supply priority and flexibility on the most successful units is not a new system (it is built into the actual practice of professional militaries). Second, it probably results in better data, because now they have a data pipeline for it. Third, it affects morale, because all military communication does that too.
So I ran into this
https://www.youtube.com/watch?v=AF3XJT9YKpM
And I noticed a lot of talk about error taxonomy.
Which seems like an important idea in general, but especially in interpretability for safety.
Specifically, error taxonomy is a subset of taxonomizing actions by their consequences, which is the main goal of interpretability for safety (as it allows us to act on the fact that the model will take actions with bad consequences).
I think this is missing the mechanics of interpretability. Interpretability is about the opposite: "what it does".
So basically, interpretability cares about mixed features (a malfunction where the thing is not as labeled) only insofar as the feature does more than the thing the label would make us think it does.
That is to say, in addition to labeling representational parts of the model, interp wants to know the relations between those parts.
So that we ultimately know enough about what the model will do either to debug it (as capabilities research) or to prove that it will not try to do the x, y, and z that would kill us (for safety).
Basically, for an alignment researcher, the polysemanticity that comes from the model being wrong sometimes is basically okay, as long as the wrongness really is in the model and so produces the same actions.
Even plain polysemanticity is not the end of the world for your interp tools, because there is not one "right" semantics. You just want to span the model's behaviour.
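A toy numpy sketch of the spanning point, with made-up activations: two different feature dictionaries that span the same subspace reconstruct the activations (and so the downstream behaviour) identically, so neither labeling is uniquely the "right" semantics:

```python
# Two different "feature dictionaries" spanning the same activation space give
# the same reconstruction, so downstream behaviour cannot distinguish them.
# All matrices here are invented for the sketch.
import numpy as np

rng = np.random.default_rng(0)
acts = rng.normal(size=(1000, 8))            # fake activations in an 8-dim space

dict_a = np.eye(8)                           # one labeling of directions
rotation = np.linalg.qr(rng.normal(size=(8, 8)))[0]
dict_b = dict_a @ rotation                   # a rotated, equally valid dictionary

# Project onto each dictionary and reconstruct.
recon_a = (acts @ dict_a) @ dict_a.T
recon_b = (acts @ dict_b) @ dict_b.T

# Both dictionaries span the space, so the reconstructions match:
print(np.allclose(recon_a, recon_b))         # True
```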
Global GDP growth over the same period was around 3 percent.
The question is how equities outperformed GDP growth.
I think that this has to do with changes in asset prices in general.
o3 has a different base model (presumably).
All of the figures are matched on base model between the RL and non-RL runs.
I would expect "this paper doesn't have the actual optimal methods" to be true; this is specifically a test of PPO on in-distribution actions. Concretely, there is a potential story here: PPO reinforces traces that hit during self-play, so there is a sense in which we would expect it to only select previously on-policy actions.
But if one has enough money, one can finetune GPT models and test that.
Also note that 10k submissions is about 2 OOM out of distribution for the charts in the paper.
Pass@k as k goes to infinity includes every path with nonzero probability (given a policy of discarding exact repeat paths).
We know that RL decreases model entropy, so the first k samples will be more diverse for a high-variance model.
Pass@k is take-best, and for a normal distribution the expected best of k samples is roughly mean + sd * sqrt(2 * ln k).
At very large k, we would therefore expect the spread to matter more than the mean.
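A quick simulation of that intuition, assuming per-sample quality is roughly normal (a simplification); the numbers for the two policies are invented:

```python
# Expected best-of-k grows like mean + sd * sqrt(2 * ln k), so a lower-mean,
# higher-spread policy overtakes a higher-mean, lower-spread one as k grows.
import numpy as np

rng = np.random.default_rng(0)

def expected_best_of_k(mean, sd, k, trials=1_000):
    samples = rng.normal(mean, sd, size=(trials, k))
    return samples.max(axis=1).mean()

for k in (1, 10, 100, 10_000):
    low_spread = expected_best_of_k(mean=0.5, sd=0.1, k=k)    # post-RL-ish: higher mean, low spread
    high_spread = expected_best_of_k(mean=0.3, sd=0.3, k=k)   # base-model-ish: lower mean, high spread
    print(k, round(low_spread, 3), round(high_spread, 3))
# The high-spread policy wins at large k despite its lower mean.
```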
No one cared.
You don't know what questions they did not ask you, or what assumptions of shared cultural background they made because they saw the degree; they would not tell you (unless you have comparisons to job searching before getting the degree).
Fundamentally, this is the expected phenomenology, since people do not tend to notice the sources of their own status.
Credentialism is good because the limiting factor on employment is trust, not talent, for most credential-requiring positions (white-collar, business, and engineering work).
Universities are bad at teaching skills, but generate trust and social capital.
Trust that allows the system to underwrite new white-collar workers to do things that might lose businesses lots of money is important and expensive.
Consequently you get credential requirements, because no test other than years spent inside social systems can tell you that a person has the ability to go four years without crashing out (which is the key skill).
Additionally, going to university has become a class signifier, and all classes wish they were bigger and more prominent.
"The alternative to credentialism is selection, or real meritocracy."
The alternative to credentialism is not selection; it is hiring your buddies, hiring by visible factors, and hiring randomly. Most businesses are not in a position to run a competitive selective process (THOSE ARE REALLY EXPENSIVE).
"Universities provide to employers the ability to confirm you are clever, driven, and have relevant skills" is false. They provide confirmation that you are a member of the professional class who is not going to do stupid things that lose money or generate risk.
Fundamentally, this misunderstands the purpose of the degree to the hiring bureaucracy, and the political economy behind it.
In short, it seems like the current system unfairly kills drugs that take a long time to develop and do not have a patentable change in the last few years of that cycle.
If the story about drug prices and price controls is correct (that price controls are bad because the limiting factor for drug development is returns on capital, which price controls reduce), then we must rethink the political economy of drug development.
Basically, if that were the case, we would expect the sectoral return rates of biotech to match the risk-adjusted rate, but drug development is both risky and skewed, which affects the cost of capital.
Most of a drug's price is capital costs, so interventions that lower the capital costs of pharmaceutical companies might produce more drugs.
Most of those capital costs come from the total raise required, which is basically set by the costs of pharmaceutical research (which is probably mostly the labor of expensive professionals).
The expected rate of return is dominated by the risks of pharmaceutical companies.
Drug prices are what the market will bear (monopoly pricing) for a time, then drop to a very low level once a compound goes generic.
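A back-of-the-envelope sketch of that capital-cost story, with invented numbers: the required annual revenue scales with the cost of capital compounded over the development time, the failure risk, and the exclusivity window left at launch:

```python
# All numbers are invented for illustration; the point is how strongly the
# required price depends on cost of capital, development time, failure risk,
# and the exclusivity window remaining before generics.

def required_annual_revenue(rd_spend, cost_of_capital, dev_years,
                            prob_success, exclusivity_years):
    # Capital tied up in R&D, compounded over the development period...
    capitalized_cost = rd_spend * (1 + cost_of_capital) ** dev_years
    # ...spread over the successes only (the failures must be paid for too)...
    risk_adjusted_cost = capitalized_cost / prob_success
    # ...and recovered during whatever exclusivity window remains at launch.
    return risk_adjusted_cost / exclusivity_years

# A program that develops quickly keeps most of its patent life...
print(required_annual_revenue(1e9, 0.10, dev_years=8,
                              prob_success=0.1, exclusivity_years=12))
# ...while a slow program with the same spend needs a much higher price, and
# at ~20 years of development the exclusivity window goes to zero entirely.
print(required_annual_revenue(1e9, 0.10, dev_years=18,
                              prob_success=0.1, exclusivity_years=2))
```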
There is a big problem here with out-of-patent molecules: if a drug is covered by a patent and development stalls for 20 years, there is no longer the return to push it through the process. That means there might be zombie drugs around from companies that fell apart and did a bad job of selling the asset (so it neither finished the process nor failed it).
There seems to be space for the various approvals to become more IP-like (so that all drugs get the same exclusivity window, regardless of how long they took to prove out).
We are exactly worried about that, though. It is not that AGI will be intelligent (that is just the name), but that it can and probably will develop dangerous capabilities. Intelligence is the word we use to describe it, since it is associated with the ability to gain capability, but even if the AGI is sometimes kind of brute-force or dumb, that does not mean it cannot also have capabilities dangerous enough to beat us out.