Blog at thelimelike.wordpress.org
Closed Limelike Curves
We certainly are, which isn’t unique to either of us; Savage discusses them all in a single common framework for decision theory, where he develops both sets of ideas jointly. A money pump is just a Dutch book where all the bets happen to be deterministic. I chose to describe things this way because it lets me do a lot more cross-linking within Wikipedia articles on decision theory, which encourages people reading about one to check out the other.
Yes, these Wikipedia articles do have lots of mistakes. Stop writing about them here and go fix them!
The Wikipedia articles on the VNM theorem, Dutch book arguments, money pumps, decision theory, rational choice theory, etc. are all a horrific mess. They’re also completely disjoint, without any kind of WikiProject or navboxes tying together all the articles on rational choice.
It’s worth noting that Wikipedia is the place where you—yes, you!—can actually have some kind of impact on public discourse, education, or policy. There is just no other place you can get so many views with so little barrier to entry. A typical Wikipedia article will get more hits in a day than all of your LessWrong blog posts have gotten across your entire life, unless you’re @Eliezer Yudkowsky.
I’m not sure if we actually “failed” to raise the sanity waterline, like people sometimes say, or if we just didn’t even try. Given that even some very basic, low-hanging-fruit interventions like “write a couple of good Wikipedia articles” still haven’t been done 15 years later, I’m leaning towards the latter.
EDIT: Discord to discuss editing here.
I’d just like to take a moment to point out that seq-PAV isn’t a voting method so much as a greedy approximation algorithm.
It might make sense for a hand count, but not otherwise; sticking PAV into your favorite mixed integer linear programming solver Pareto-dominates seq-PAV.
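To make the distinction concrete, here is a minimal sketch of both rules (the function names and ballot representation are my own, not from any library): greedy seq-PAV adds one winner at a time, while exact PAV optimizes the committee as a whole. The brute-force search below stands in for the ILP formulation mentioned above, which is what you would use in practice.

```python
from itertools import combinations

def harmonic(k):
    """H(k) = 1 + 1/2 + ... + 1/k, PAV's diminishing-returns weight."""
    return sum(1 / i for i in range(1, k + 1))

def pav_score(ballots, committee):
    """PAV score: each voter contributes H(#approved candidates elected)."""
    return sum(harmonic(len(ballot & committee)) for ballot in ballots)

def seq_pav(ballots, candidates, seats):
    """Greedy approximation: repeatedly add whichever candidate
    raises the PAV score the most."""
    committee = set()
    for _ in range(seats):
        best = max(candidates - committee,
                   key=lambda c: pav_score(ballots, committee | {c}))
        committee.add(best)
    return committee

def exact_pav(ballots, candidates, seats):
    """Exact PAV: search all committees (an ILP solver scales better)."""
    return max((set(c) for c in combinations(candidates, seats)),
               key=lambda c: pav_score(ballots, c))
```

The exact optimum always scores at least as well as the greedy committee, which is the sense in which it Pareto-dominates the sequential rule.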
Important open problems in voting
Do these systems avoid the strategic voting that plagues American elections? No. For example, both Single Transferable Vote and Condorcet voting sometimes provide incentives to rank a candidate with a greater chance of winning higher than a candidate you prefer—that is, the same “vote Gore instead of Nader” dilemma you get in traditional first-past-the-post.
It depends: whether you get that dilemma with Condorcet methods depends on exactly how you handle tied ranks. If you require a full majority (>50% of all votes) to declare that one candidate defeats another, you can create a Condorcet system that doesn’t have that problem.
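A minimal sketch of that stricter defeat rule, with a ballot representation of my own choosing (a dict from candidate to rank, where omitted candidates model ties and truncation): a candidate defeats another only when a strict majority of all voters, not just of those expressing a preference, ranks them higher.

```python
def defeats(ballots, a, b):
    """a defeats b only if a STRICT majority of all voters ranks a above b.

    Each ballot maps candidate -> rank (lower is better); candidates
    left off a ballot are treated as tied at the bottom, so abstentions
    count against declaring a defeat."""
    n = len(ballots)
    above = sum(1 for r in ballots
                if r.get(a, float('inf')) < r.get(b, float('inf')))
    return above > n / 2

def majority_condorcet_winner(ballots, candidates):
    """The candidate who defeats every rival by a full majority, or None."""
    for c in candidates:
        if all(defeats(ballots, c, d) for d in candidates if d != c):
            return c
    return None
```

Note that under this rule, widespread truncation can leave no candidate with a full-majority defeat over every rival, so the method needs a fallback for that case.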
Waaaay too late, but I come bearing news from 2024: the strategy was super easy for people to figure out. Democrats used it in the Alaska 2022 Senate race, where they intentionally sandbagged their own candidates to make sure none of them eliminated Lisa Murkowski.
In fact, strategy under IRV is exactly the same as in FPP with a primary. (Because they’re basically the same system!) In round 1, your goal is to back the most electable candidate in your party to help them make it to the general, where they’ll win. So you need to find an electable moderate and put them at the top of your ballot. (But not too moderate, or the payoff is too low.) Alternatively, you can use the opposite strategy, called raiding: find the worst candidate from the opposite coalition and support them. Then, in the final round, your party has an unelectable extremist as their opponent.
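For anyone who hasn’t seen the counting procedure spelled out, here is a toy IRV counter of my own (ignoring tie-breaking rules), run on a classic center-squeeze profile: the broadly acceptable middle candidate is eliminated first, which is exactly the situation the compromising and sandbagging strategies above are responding to.

```python
from collections import Counter

def irv_winner(ballots):
    """Instant-runoff: repeatedly eliminate the candidate with the
    fewest first preferences until someone holds a majority.

    Each ballot is a preference-ordered list; exhausted ballots drop out.
    No tie-breaking rule is implemented."""
    ballots = [list(b) for b in ballots]
    while True:
        tallies = Counter(b[0] for b in ballots if b)
        total = sum(tallies.values())
        leader, votes = tallies.most_common(1)[0]
        if votes * 2 > total or len(tallies) == 1:
            return leader
        loser = min(tallies, key=tallies.get)
        ballots = [[c for c in b if c != loser] for b in ballots]

# Center squeeze: B is ranked 1st or 2nd by everyone and beats both
# rivals head-to-head, but has the fewest first preferences.
profile = ([['A', 'B', 'C']] * 4 +
           [['C', 'B', 'A']] * 3 +
           [['B', 'C', 'A']] * 2)
```

Here B is eliminated in round 1 and C wins, even though B would beat both A and C in a one-on-one vote; B’s supporters would have done better to compromise, just as in a plurality race.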
Security mindset, people! “I didn’t manage to work out how to vote strategically in the 20 seconds I spent thinking about it” is not “tactical voting is impossible”.
Every social ranking function corresponds to a social choice function, and vice-versa, which is why they’re equivalent. The Ranking→Choice direction is trivial.
The opposite direction starts by running the social choice function to identify a winner, who is ranked 1st. Then you delete the winner from every ballot and run the same algorithm again, which gives you a runner-up (who is ranked 2nd); and so on.
Social ranking is often cleaner than working with an election algorithm, because election algorithms have the annoying edge case of tied votes, so your output is technically a set of candidates (who may be tied).
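The Choice→Ranking construction above can be sketched in a few lines (all names here are mine, and plurality is just a stand-in for whatever social choice function you like). Because a choice function may return a tied set, the output is a list of tiers rather than a strict order, which is the edge case mentioned above.

```python
from collections import Counter

def plurality(ballots, candidates):
    """Toy social choice function: first preferences among the
    candidates still in the running (may return a tied set)."""
    tallies = Counter(next(c for c in b if c in candidates)
                      for b in ballots)
    top = max(tallies.values())
    return {c for c, v in tallies.items() if v == top}

def ranking_from_choice(choose, ballots, candidates):
    """Choice -> Ranking: elect the winner(s), delete them from
    every ballot, and repeat on the remaining candidates."""
    remaining = set(candidates)
    tiers = []
    while remaining:
        winners = choose(ballots, remaining)
        tiers.append(winners)
        remaining -= winners
    return tiers
```

Running it with any single-winner rule in place of `plurality` gives the corresponding social ranking, which is the equivalence claimed above.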
Oh, I was making a joke about timelines.
Oh wow, this is amazing work! :)
question in theoretical psephology
To clarify, this falls under social choice theory and mechanism design rather than psephology.
I mean, how sure are we about that second part?
Never mind that; somewhere around 5% of the population would probably be willing to end all human life if they could. Too many people take the correct point that “human beings are, on average, aligned” and forget about the words “on average”.
I’m not sure what point this post is trying to make exactly. Yes, it’s function approximation; I think we all know that.
When we talk about inner and outer alignment, outer alignment is “picking the correct function to learn.” (When we say “loss,” we mean the loss on a particular task, not the abstract loss function like RMSE.)
Inner alignment is about training a model that generalizes to situations outside the training data.
(It would be convenient if yes, but that would be surprising; otherwise you could just start a corporation, not pay your taxes the first year, dissolve it, start an identical corporation the second year, and so on.)
This (a consistent pattern of doing the same thing) would get you prosecuted, because courts are allowed to pierce the corporate veil, which is lawyer-speak for “call you out on your bullshit.” If it’s obvious that you’re creating corporations as a legal fiction to avoid taxes, the court will go after the shareholders directly (so long as the prosecution can prove the corporation exists in name only).
Because GPT-3.5 is a fine-tuned version of GPT-3, which is known to be a vanilla dense transformer.
GPT-4 is probably, in a very funny turn of events, a few dozen fine-tuned GPT-3.5 clones glued together (as a MoE).
Whether the couple is capable of having preferences probably depends on your definition of “preferences.” The more standard terminology for preferences by a group of people is “social choice function.” The main problem we run into is that social choice functions don’t behave like preferences.
One elephant in the room throughout my geometric rationality sequence is that it sometimes advocates randomizing between actions, and so geometrically rational agents cannot possibly satisfy the Von Neumann–Morgenstern axioms.
It’s not just VNM; it doesn’t even make logical sense. Probabilities are about your knowledge, not the state of the world: barring bizarre fringe cases (Cromwell’s law), I can always say that whatever I’m doing has probability 1, because I’m currently doing it, meaning it’s impossible to randomize your own actions. I can certainly have a probability other than 0 or 1 that I will do something, if that action depends on information I haven’t yet received. But as soon as I receive all the information involved in making my decision and update on it, I can’t have a 50% chance of doing something. Trying to randomize your own actions means refusing to update on the information you have, a violation of Bayes’ theorem.
The problem is that they don’t want to switch to Boston; they are happy moving to Atlanta.
In this world, the one that actually exists, Bob still wants to move to Boston. The fact that Bob made a promise and would now face additional costs associated with breaking the contract (i.e. upsetting Alice) doesn’t change the fact that he’d be happier in Boston, it just means that the contract and the action of revealing this information changed the options available. The choices are no longer “Boston” vs. “Atlanta,” they’re “Boston and upset Alice” vs. “Atlanta and don’t upset Alice.”
Moreover, holding to this contract after the information is revealed also rejects the possibility of a Pareto improvement (equivalent to a Dutch book). Say Alice and Bob agree to randomize their choice as you say. In this case, both Alice and Bob are strictly worse off than if they had agreed on an insurance policy. A contract that has Bob more than compensate Alice for the cost of moving to Boston if the California option fails would leave both of them strictly better off.
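With made-up numbers (all payoffs hypothetical, and utility assumed transferable for the side payment), the arithmetic behind that Pareto improvement looks like this:

```python
# Hypothetical payoffs if the California plan falls through:
# Boston gives Bob 10 and Alice -4 (her moving costs);
# Atlanta gives each of them 2.
bob = {'boston': 10.0, 'atlanta': 2.0}
alice = {'boston': -4.0, 'atlanta': 2.0}

# Option 1: honor the contract by randomizing 50/50 between cities.
bob_lottery = 0.5 * bob['boston'] + 0.5 * bob['atlanta']        # 6.0
alice_lottery = 0.5 * alice['boston'] + 0.5 * alice['atlanta']  # -1.0

# Option 2: move to Boston, with Bob paying Alice a transfer t.
# Any t strictly between 3 and 4 leaves BOTH better off than the lottery.
t = 3.5
bob_insured = bob['boston'] - t       # 6.5 > 6.0
alice_insured = alice['boston'] + t   # -0.5 > -1.0

assert bob_insured > bob_lottery and alice_insured > alice_lottery
```

The randomizing contract is dominated: both parties would sign the insurance contract instead, which is exactly the Dutch-book structure described above.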
You forgot to include a sixth counterargument: you might successfully accomplish everything you set out to do, producing dozens of examples of misalignment, but as soon as you present them, everyone working on capabilities excuses them away as being “not real misalignment” for some reason or another.
I have seen more “toy examples” of misalignment than I can count (e.g. goal misgeneralization in the CoinRun example, deception here, and the not-so-toy example of GPT-4 failing badly as soon as it was deployed out of distribution, with the only things needed to break it being a less-than-perfect prompt and giving it the name Sydney). We’ve successfully shown AIs can be misaligned in several ways we predicted ahead of time according to theory. Nobody cares, and nobody has used this information to advance alignment research. At this point I’ve concluded AI companies, even ones claiming otherwise, will not care until somebody dies.
Using RL(AI)F may offer a solution to all the points in this section: By starting with a set of established principles, AI can generate and revise a large number of prompts, selecting the best answers through a chain-of-thought process that adheres to these principles. Then, a reward model can be trained and the process can continue as in RLHF. This approach is potentially better than RLHF as it does not require human feedback.
I’d like to say that I fervently disagree with this. Giving an unaligned AI the opportunity to modify its own weights (by categorizing its own responses to questions), then politely asking it to align itself, is quite possibly the worst alignment plan I’ve ever heard; it’s penny-wise, pound-foolish. (Assuming it even is penny-wise; I can think of several ways to generate a self-consistent AI that would cost less.)
@ProgramCrafter Link is broken, probably expired.