Yes, you are missing something.
Any DEADCODE that can be added to a 1kb program can also be added to a 2kb program. The net effect is a wash, and you will end up with the same ratio between their priors.
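To spell out the arithmetic, using the standard $2^{-\ell}$ length prior (where $\ell$ is a program's length in bits) and the same $d$ bits of padding added to both programs:

$$\frac{2^{-(\ell_1 + d)}}{2^{-(\ell_2 + d)}} = \frac{2^{-\ell_1}}{2^{-\ell_2}},$$

so the padding cancels and the ratio of the priors is unchanged.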
Thirder here (with acknowledgement that the real answer is to taboo ‘probability’ and figure out why we actually care)
The subjective indistinguishability of the two Tails wakeups is not a counterargument - it’s part of the basic premise of the problem. If the two wakeups were distinguishable, being a halfer would be the right answer (for the first wakeup).
Your simplified examples/analogies really depend on that fact of distinguishability. Since you didn’t specify whether your examples have it, the payoff structure is ambiguous.
I’ll also note you are being a little loose with your notion of ‘payoff’. You are calculating the payoff for the entire experiment, whereas I define the ‘payoff’ as the odds offered at each wakeup (since there’s no rule saying that Beauty has to bet the same way each time!).
To be concise, here’s my overall rationale:
Upon each (indistinguishable) wakeup, you are given the following offer:
If you bet H and win, you get $N.
If you bet T and win, you get $1.
If you believe T yields a higher EV, then you have a credence P(H) < 1/(1+N).
Betting T yields the higher EV for all N up to 2, which is consistent only with P(H) = 1/3. Thus you should be a thirder.
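If it helps, here’s a quick Monte Carlo sketch of that per-wakeup payoff (the setup and names are mine, just to illustrate the arithmetic):

```python
import random

def per_wakeup_returns(N, trials=100_000):
    """Average per-wakeup winnings for 'always bet H' vs 'always bet T'.
    Betting H pays $N on a Heads wakeup; betting T pays $1 on a Tails wakeup."""
    h_total = t_total = wakeups = 0
    for _ in range(trials):
        if random.random() < 0.5:   # Heads: one wakeup (Monday)
            wakeups += 1
            h_total += N
        else:                       # Tails: two wakeups (Monday and Tuesday)
            wakeups += 2
            t_total += 2 * 1
    return h_total / wakeups, t_total / wakeups

for N in (1.0, 1.9, 2.0, 2.1, 3.0):
    h, t = per_wakeup_returns(N)
    print(f"N={N}: bet-H earns {h:.2f}/wakeup, bet-T earns {t:.2f}/wakeup")
```

The crossover sits at N = 2, which is exactly the P(H) = 1/(1+N) = 1/3 threshold.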
Here’s a clarifying example where this interpretation becomes more useful than yours:
The experimenter flips a second coin. If the second coin is Heads (H2), then N = $1.50 on Monday and $2.50 on Tuesday. If the second coin is Tails, then the order is reversed.
I’ll maximize my EV if I bet T when N = $1.50, and H when N = $2.50. Both of these fall cleanly out of ‘thirder’ logic.
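Concretely, with thirder per-wakeup credences (which stay at $P(H) = 1/3$ whichever offer you happen to see):

$$EV(\text{bet H}\mid N) = \tfrac{1}{3}N, \qquad EV(\text{bet T}\mid N) = \tfrac{2}{3}\cdot 1,$$

so you bet H exactly when N > 2: heads at the $2.50 offer, tails at the $1.50 offer.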
What’s the ‘halfer’ story here? Your earlier logic doesn’t allow for separate bets on each awakening.
Thanks for sharing that study. It looks like your team is already well-versed in this subject!
You wouldn’t want something that’s too hard to extract, but I think restricting yourself to a single encoder layer is too conservative—LLMs don’t have to be able to fully extract the information from a layer in a single step.
I’d be curious to see how much closer a two-layer encoder would get to the ITO results.
Here’s my longer reply.
I’m extremely excited by the work on SAEs and their potential for interpretability. However, I think there is a subtle misalignment between the SAE architecture and loss function, and the actual desired objective function.
The SAE loss function is:
$$\mathcal{L}_{\text{SAE}}(x) = \|x - D(E(x))\|_2^2 + \lambda\,\|E(x)\|_1,$$
where $E$ is the single-layer encoder, $D$ is the decoder, and $\|\cdot\|_1$ is the $L_1$-norm (or $\|\cdot\|_0$, the $L_0$-norm, in the $L_0$ variants).
I would argue, however, that what you are actually trying to solve is the sparse coding problem:
$$\min_{D}\ \sum_{x}\ \min_{f}\ \Big(\|x - Df\|_2^2 + \lambda\,\|f\|_1\Big),$$
where, importantly, the inner optimization over $f$ is solved separately for each input (including at runtime).
Since $D$ is an overcomplete basis, finding the $f$ that minimizes the inner loop (also known as basis pursuit denoising[1]) is a notoriously challenging problem, one which a single-layer encoder is underpowered to compute. The SAE’s encoder thus introduces a significant error $\epsilon(x) = E(x) - f^*(x)$, which means that your actual loss function is:
$$\mathcal{L}_{\text{SAE}}(x) = \|x - D\big(f^*(x) + \epsilon(x)\big)\|_2^2 + \lambda\,\|f^*(x) + \epsilon(x)\|_1.$$
The magnitude of the errors would have to be determined empirically, but I suspect they are large enough to be a significant source of reconstruction error.
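For concreteness, here’s a minimal sketch of what solving the inner problem per input actually looks like (plain ISTA with numpy; the variable names are mine, and D stands in for the matrix of decoder directions):

```python
import numpy as np

def ista(x, D, lam, n_iters=200):
    """Approximately solve min_f ||x - D f||_2^2 + lam * ||f||_1 (basis pursuit denoising).

    x: (d,) activation vector; D: (d, m) overcomplete dictionary of decoder directions.
    """
    L = np.linalg.norm(D, 2) ** 2                 # Lipschitz constant (squared spectral norm of D)
    f = np.zeros(D.shape[1])
    for _ in range(n_iters):
        grad = D.T @ (D @ f - x)                  # gradient of the quadratic term (up to a factor of 2)
        z = f - grad / L                          # gradient step
        f = np.sign(z) * np.maximum(np.abs(z) - lam / (2 * L), 0.0)  # soft-threshold (prox of the L1 term)
    return f
```

Comparing the encoder’s one-shot output against something like this, per input, is one way to estimate how large $\epsilon$ really is.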
There are a few things you could do to reduce the error:
Ensuring that $D$ obeys the restricted isometry property[2] (i.e. a cap on the cosine similarity of decoder weights), or, barring that, adding a term to your loss function that at least penalizes high cosine similarities (see the sketch after this list).
Adding extra layers to your encoder, so it’s better at solving for $f^*$.
Empirical studies to see how large the feature error is / how much reconstruction error it is adding.
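Here’s a rough sketch of the loss-term idea from the first bullet (PyTorch; `W_dec` is a hypothetical `[n_features, d_model]` decoder weight matrix, and `mu` is a coefficient you’d have to tune):

```python
import torch

def cosine_penalty(W_dec: torch.Tensor) -> torch.Tensor:
    """Penalize high pairwise cosine similarity between decoder feature directions."""
    W = torch.nn.functional.normalize(W_dec, dim=1)   # unit-norm feature directions
    gram = W @ W.T                                    # pairwise cosine similarities
    gram = gram - torch.diag(torch.diag(gram))        # drop self-similarities
    return (gram ** 2).mean()

# total_loss = reconstruction_loss + l1_coeff * sparsity_loss + mu * cosine_penalty(W_dec)
```

For very large dictionaries you’d want to evaluate this on random subsets of features rather than the full Gram matrix.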
This is great work. My recommendation: add a term in your loss function that penalizes features with high cosine similarity.
I think there is a strong theoretical underpinning for the results you are seeing.
I might try to reach out directly—some of my own academic work is directly relevant here.
This is one of those cases where it might be useful to list out all the pros and cons of taking the 8 courses in question, and then thinking hard about which benefits could be achieved by other means.
Key benefits of taking a course (vs. Independent study) beyond the signaling effect might include:
precommitting to learning a certain body of knowledge
curation of that body of knowledge by an experienced third party
additional learning and insight from partnerships / teamwork / office hours
But these depend on the courses and your personality. The precommitment might be unnecessary given your personal work habits, the curation might be misaligned with what you are interested in learning, and the other students or TAs may not have useful insights that you can’t figure out on your own.
Hope that helps.
Instead of demanding orthogonal representations, just have them obey the restricted isometry property.
Basically, instead of requiring $\langle a_i, a_j \rangle = 0$ for all pairs of feature directions, we just require $|\langle a_i, a_j \rangle| \leq \epsilon$.
This would allow a polynomial number of sparse shards while still allowing full recovery.
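A quick numerical illustration of how much room the relaxed condition buys (the dimensions here are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 512, 2000                                  # 2000 directions in a 512-dim space
A = rng.standard_normal((n, d))
A /= np.linalg.norm(A, axis=1, keepdims=True)     # unit-norm directions

overlaps = np.abs(A @ A.T)                        # pairwise |cosine similarity|
np.fill_diagonal(overlaps, 0.0)
print(overlaps.max())                             # worst-case overlap is roughly 0.25
```

Exact orthogonality caps you at 512 directions; tolerating a small $\epsilon$ lets you pack far more while every pair stays nearly orthogonal.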
I think the success or failure of this model really depends on the nature and number of the factions. If interfactional competition gets too zero-sum (this might help us, but it helps them more, so we’ll oppose it) then this just turns into stasis.
During ordinary times, vetocracy might be tolerable, but it will slowly degrade state capacity. During a crisis it can be fatal.
Even in America, we only see this factional veto in play in a subset of scenarios—legislation under divided government. Plenty of action at the executive level or in state governments doesn’t have to worry about this.
You switch positions throughout the essay, sometimes in the same sentence!
“Completely remove efficacy testing requirements” (Motte) “… making the FDA a non-binding consumer protection and labeling agency” (Bailey)
“Restrict the FDA’s mandatory authority to labeling” logically implies they can’t regulate drug safety and can’t order recalls of dangerous products. Bailey! “… and make their efficacy testing completely non-binding” Back to the Motte again.
“Pharmaceutical manufacturers can go through the FDA testing process and get the official ‘approved’ label if insurers, doctors, or patients demand it, but it’s not necessary to sell their treatment.” Again implies the FDA has no safety regulatory powers.
“Scott’s proposal is reasonable and would be an improvement over the status quo, but it’s not better than the more hardline proposal to strip the FDA of its regulatory powers.” Bailey again!
This is a Motte and Bailey argument.
The Motte is ‘remove the FDA’s ability to regulate drugs for efficacy’.
The Bailey is ‘remove the FDA’s ability to regulate drugs at all’.
The FDA doesn’t just regulate drugs for efficacy, it regulates them for safety too. This undercuts your arguments about off-label prescriptions, which were still approved for use by the FDA as safe.
Relatedly, I’ll note you did not address Scott’s point on factory safety.
If you actually want to make the hardline position convincing, you need to clearly state and defend that the FDA should not regulate drugs for safety.
The differentiation between CDT as a decision theory and FDT as a policy theory is very helpful at dispelling confusion. Well done.
However, why do you consider EDT a policy theory? It’s just picking actions with the highest conditional utility. It does not model a ‘policy’ in the optimization equation.
Also, the ladder analogy here is unintuitive.
This doesn’t make sense to me. Why am I not allowed to update on still being in the game?
I noticed that in your problem setup you deliberately removed n=6 from being in the prior distribution. That feels like cheating to me—it seems like a perfectly valid hypothesis.
After seeing the first chamber come up empty, that should definitively update me away from n=6. Why can’t I update away from n=5 ?
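To make the update I have in mind concrete (assuming $n$ counts the loaded chambers out of 6; adjust if your setup differs):

$$P(n \mid \text{first chamber empty}) \;\propto\; P(\text{empty} \mid n)\,P(n) \;=\; \frac{6-n}{6}\,P(n),$$

which is exactly zero at n = 6 and discounts n = 5 by the same mechanism, just less sharply.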
Counterpoint, robotaxis already exist: https://www.nytimes.com/2023/08/10/technology/driverless-cars-san-francisco.html
You should probably update your priors.
Nope.
According to the CDC pulse survey you linked (https://www.cdc.gov/nchs/covid19/pulse/long-covid.htm), the metrics for long covid are trending down. This includes the ‘currently experiencing’, ‘any limitations’, and ‘significant limitations’ categories.
How is this in the wrong place?
Nice. This also matches my earlier observation that the epistemic failure is one of not anticipating one’s change in values. If you do anticipate it, you won’t agree to this money pump.
I agree that the type of rationalization you’ve described is often practically rational. And it’s at most a minor crime against epistemic rationality. If anything, the epistemic crime here is not anticipating that your preferences will change after you’ve made a choice.
However, I don’t think this case is what people have in mind when they critique rationalization.
The more central case is when we rationalize decisions that affect other people; for example, Alice might make a decision that maximizes her preferences and disregards Bob’s, but after the fact she’ll invent reasons that make her decision appear less callous: “I thought Bob would want me to do it!”
While this behavior might be practically rational from Alice’s selfish perspective, she’s being epistemically unvirtuous by lying to Bob, degrading his ability to predict her future behavior.
Maybe you can use specific terminology to differentiate your case from the more central one, e.g. ‘preference rationalization’?
I can use a laptop to hammer in a nail, but it’s probably not the fastest or most reliable way to do so.
I don’t see how this is more of a risk for a shutdown-seeking goal, than it is for any other utility function that depends on human behavior.
If anything, the right move here is for humans to commit to immediately complying with plausible threats from the shutdown-seeking AI (by shutting it down). Sure, this destroys the immediate utility of the AI, but on the other hand it drives a very beneficial higher level dynamic, pushing towards better and better alignment over time.
There are, but what does having a length below 10^90 have to do with the Solomonoff prior? There’s no upper bound on the length of programs.