No knowledge of prior art, but what do you mean by negative things that give people ideas? I was under the impression that most of the examples people talk about involved things that, for the time being, require superhuman capabilities—self-replicating nanotech and so on. Or are you asking about something other than extinction risk, like chatbots manipulating people or something along those lines? Could you clarify?
Zane
Huh, I just started rereading a few of your posts yesterday, including this one… and by what I assume is complete coincidence, I met someone today named Oliver, who looked nothing like an Oliver. This one didn’t look like a Bill, though; I think he was more of a Nate.
This is cool! You might want to reposition the “How to use” message a little; it’s currently covering up the button that lets you add more hypotheses, so it took me a while to find it.
I would think that FDT chooses Bet 2, unless I’m misunderstanding something about the role of Peano Arithmetic here. Taking Bet 2 results in P being true, and taking Bet 1 results in P being false; therefore, the only options that are actually possible are the bottom left and the top right.
In fact, this seems like the exact sort of situation in which FDT can be easily shown to outperform CDT. CDT would reason along the lines of “Bet 1 is better if P is true, and better if P is false, and therefore better overall” without paying attention to the direct dependency between the output of your decision algorithm and the truth value of P.
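To make that concrete, here’s a minimal sketch of the two lines of reasoning, with made-up payoff numbers (the actual bet values aren’t given here), in the setup where taking Bet 2 makes P true and taking Bet 1 makes P false:

```python
# Hypothetical payoffs, payoffs[bet][P]; the real values depend on the bets in question.
payoffs = {
    "Bet 1": {True: 10, False: 5},
    "Bet 2": {True: 8,  False: 3},
}

# CDT's dominance argument: Bet 1 is better whether P is true or false,
# so a causal reasoner picks Bet 1 without noticing the dependency.
bet1_dominates = all(payoffs["Bet 1"][p] > payoffs["Bet 2"][p] for p in (True, False))
print("CDT picks Bet 1:", bet1_dominates)

# FDT only compares the outcomes that can actually happen, because the choice
# itself fixes the truth value of P: Bet 1 lands in (Bet 1, P false),
# Bet 2 lands in (Bet 2, P true).
reachable = {"Bet 1": payoffs["Bet 1"][False], "Bet 2": payoffs["Bet 2"][True]}
print("FDT picks:", max(reachable, key=reachable.get))  # Bet 2
```

With those numbers, CDT’s column-by-column comparison favors Bet 1, even though the only outcomes on the table are (Bet 1, P false) and (Bet 2, P true), and the latter pays more.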
I’m not quite sure what Yudkowsky and Soares meant by “dominance” there. I’d guess on priors that they meant FDT pays attention to those dependencies when deciding whether one strategy outperforms another… but yeah, they kind of worded it in a way that suggests the opposite interpretation.
But wouldn’t what Peano is capable of proving about your specific algorithm necessarily be “downstream” of the output of that algorithm itself? The Peano axioms are upstream, yes, but what Peano proves about a particular function depends on what that function is.
I voted up on every comment in this chain on which someone stated that they voted it up, and down on every comment on this chain on which someone stated that they voted it down, removing votes when they cancelled out and using strong-votes instead when they added together. I regret to say that the comment by Dorikka seems to have had three more people say that they voted it up than that they voted it down, so although I gave it a strong upvote, I have only been able to replicate two-thirds of the original vote. I upvoted Dorikka’s last comment on another post to bring the universe back into balance.
I got alternating THTHTHTHTH… for the first 28 flips, which I would have thought would be very unlikely on priors under the 80% rule. Are you sure that’s an accurate description of the rule? It doesn’t change halfway through?
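For reference, here’s the back-of-the-envelope check I’m doing, assuming the 80% rule means each flip has an 80% chance of matching the previous one (that’s my reading of it, not necessarily what the simulation actually implements):

```python
# Assumed rule (my guess): each flip after the first repeats the previous
# outcome with probability 0.8, so an alternation happens with probability 0.2.
p_repeat = 0.8
n_flips = 28

# 28 flips means 27 transitions; all of them would have to be alternations.
p_all_alternating = (1 - p_repeat) ** (n_flips - 1)
print(p_all_alternating)  # ~1.3e-19
```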
Eliezer’s example on Bayesian statistics is wr… oops!
Yeah, I discovered that part by accident at one point because I used the binomial distribution equation in a situation where it didn’t really apply, but still got the right answer.
I would think the most natural way to write a likelihood function would be to divide by the integral from 0 to 1, so that the total area under the curve is 1. That way the integral from a to b gives the probability the hypothesis assigns to receiving a result between a and b. But all that really matters is the ratios, which stay the same even without that.
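Roughly what I mean, sketched with a binomial likelihood as the example (the particular data here are made up for illustration):

```python
from math import comb

n, k = 10, 7  # example data: 7 heads in 10 flips

def likelihood(p):
    # Binomial likelihood of the data as a function of the heads-probability p.
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Normalize so the area under the curve over [0, 1] is 1; then the integral
# from a to b is the area the normalized curve assigns to that range.
steps = 100_000
area = sum(likelihood((i + 0.5) / steps) for i in range(steps)) / steps

def normalized(p):
    return likelihood(p) / area

# The ratio between any two parameter values is unchanged by the normalization.
print(likelihood(0.7) / likelihood(0.5))
print(normalized(0.7) / normalized(0.5))  # same number
```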
[Question] What is an “anti-Occamian prior”?
That’s terrifyingly cool! I notice that they usually fall over after having reached the assigned position; are you only rewarding them for being in the position at a particular point in time, after which there’s nothing left to optimize for? Are you able to make them maintain a position for longer?
[Question] Lying to chess players for alignment
Unsure about the time controls at the moment; see my response to aphyer. The advisors would be able to give the A player justification for the move they’ve recommended.
The concern that A might not be able to understand the reasoning that the advisors give them is a valid one, and that’s the whole point of the experiment! If A can’t follow the reasoning well enough to determine whether it’s good advice, then (says the analogy) people who are asking AIs how to solve alignment can’t follow their reasoning well enough to determine whether it’s good advice.
Individual positions like that could be an interesting thing to test; I’ll likely have some people try out some of those too.
I think the aspect where the deceivers have to tell the truth in many cases to avoid getting caught could make it more realistic, as in the real AI situation the best strategy might be to present a mostly coherent plan with a few fatal flaws.
Agreed that it could be a bit more realistic that way, but the main constraint here is that we need a game with three distinct levels of players, where each level reliably beats the one below it. The element of luck in games like poker and backgammon makes that harder to guarantee (as suggested by the stats Joern_Stoller brought up). And another issue is that it’ll be harder to find a lot of skilled players at different levels for any game that isn’t as popular as chess—even if we find an obscure game that would in theory be a better fit for the experiment, we won’t be able to find any Cs for it.
I’ve created a Manifold market if anyone wants to bet on what happens. If you’re playing in the experiment, you are not allowed to make any bets/trades while you have private information (that is, while you are in a game, or if I haven’t yet reported the details of a game you were in to the public).
https://manifold.markets/Zane_3219/will-chess-players-win-most-of-thei
Deception Chess: Game #1
Thanks, fixed.
I saw it fine at first, but after logging out I got the same error. Looks like you need a Chess.com account to see it.
Hi! I’m Zane! A couple of you might have encountered me from glowfic, although I’ve been reading LW since long before I started writing any glowfic. Aspiring rationalist, hoping to avoid getting killed by unaligned AGI, and so on. (Not that I want to be killed by anything else, either, of course.) I have a couple posts I wanted to make about various topics from LW, and I’m hoping to have some fun discussions here!
Also, the popup thingy that appears when a new user tries to make a comment has a bug—the links in it point to localhost:3001 instead of lesswrong.com (or was it 3000? The window went away, so I can’t check anymore). Also, even after I replace the localhost:300[something] with lesswrong.com, one of the links still doesn’t work because it looks like the post it goes to has been deleted.