No knowledge of prior art, but what do you mean by negative things that give people ideas? I was under the impression that most of the examples people talk about involved things that, for the time being, require superhuman capabilities—self-replicating nanotech and so on. Or are you asking about something other than extinction risk, like chatbots manipulating people or something along those lines? Could you clarify?
Zane
Huh, I just started rereading a few of your posts yesterday, including this one… and by what I assume is complete coincidence, I met someone today named Oliver, who looked nothing like an Oliver. This one didn’t look like a Bill, though; I think he was more of a Nate.
This is cool! You might want to reposition the “How to use” message a little; it’s currently covering up the button that lets you add more hypotheses, so it took me a while to find it.
I would think that FDT chooses Bet 2, unless I’m misunderstanding something about the role of Peano Arithmetic here. Taking Bet 2 results in P being true, and taking Bet 1 results in P being false; therefore, the only options that are actually possible are the bottom left and the top right.
In fact, this seems like the exact sort of situation in which FDT can be easily shown to outperform CDT. CDT would reason along the lines of “Bet 1 is better if P is true, and better if P is false, and therefore better overall” without paying attention to the direct dependency between the output of your decision algorithm and the truth value of P.
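To make that concrete, here’s a minimal sketch of the two lines of reasoning, with made-up payoff numbers (the actual bet values aren’t given here), in the setup where taking Bet 2 makes P true and taking Bet 1 makes P false:

```python
# Hypothetical payoffs, payoffs[bet][P]; the real values depend on the bets in question.
payoffs = {
    "Bet 1": {True: 10, False: 5},
    "Bet 2": {True: 8,  False: 3},
}

# CDT's dominance argument: Bet 1 is better whether P is true or false,
# so a causal reasoner picks Bet 1 without noticing the dependency.
bet1_dominates = all(payoffs["Bet 1"][p] > payoffs["Bet 2"][p] for p in (True, False))
print("CDT picks Bet 1:", bet1_dominates)

# FDT only compares the outcomes that can actually happen, because the choice
# itself fixes the truth value of P: Bet 1 lands in (Bet 1, P false),
# Bet 2 lands in (Bet 2, P true).
reachable = {"Bet 1": payoffs["Bet 1"][False], "Bet 2": payoffs["Bet 2"][True]}
print("FDT picks:", max(reachable, key=reachable.get))  # Bet 2
```

With those numbers, CDT’s column-by-column comparison favors Bet 1, even though the only outcomes on the table are (Bet 1, P false) and (Bet 2, P true), and the latter pays more.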
I’m not quite sure what Yudkowsky and Soares meant by “dominance” there. I’d guess on priors that they meant FDT pays attention to those dependencies when deciding whether one strategy outperforms another… but yeah, they kind of worded it in a way that suggests the opposite interpretation.
But wouldn’t what Peano is capable of proving about your specific algorithm necessarily be “downstream” of the output of that algorithm itself? The Peano axioms are upstream, yes, but what Peano proves about a particular function depends on what that function is.
I voted up on every comment in this chain on which someone stated that they voted it up, and down on every comment on this chain on which someone stated that they voted it down, removing votes when they cancelled out and using strong-votes instead when they added together. I regret to say that the comment by Dorikka seems to have had three more people say that they voted it up than that they voted it down, so although I gave it a strong upvote, I have only been able to replicate two-thirds of the original vote. I upvoted Dorikka’s last comment on another post to bring the universe back into balance.
I got alternating THTHTHTHTH… for the first 28 flips, which I would have thought would be very unlikely on priors under the 80% rule. Are you sure that’s an accurate description of the rule? It doesn’t change halfway through?
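For reference, here’s the back-of-the-envelope check I’m doing, assuming the 80% rule means each flip has an 80% chance of matching the previous one (that’s my reading of it, not necessarily what the simulation actually implements):

```python
# Assumed rule (my guess): each flip after the first repeats the previous
# outcome with probability 0.8, so an alternation happens with probability 0.2.
p_repeat = 0.8
n_flips = 28

# 28 flips means 27 transitions; all of them would have to be alternations.
p_all_alternating = (1 - p_repeat) ** (n_flips - 1)
print(p_all_alternating)  # ~1.3e-19
```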
Eliezer’s example on Bayesian statistics is wr… oops!
Yeah, I discovered that part by accident at one point because I used the binomial distribution equation in a situation where it didn’t really apply, but still got the right answer.
I would think the most natural way to write a likelihood function would be to divide by the integral from 0 to 1, so that the total area under the curve is 1. That way the integral from a to b gives the probability the hypothesis assigns to receiving a result between a and b. But all that really matters is the ratios, which stay the same even without that.
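Roughly what I mean, sketched with a binomial likelihood as the example (the particular data here are made up for illustration):

```python
from math import comb

n, k = 10, 7  # example data: 7 heads in 10 flips

def likelihood(p):
    # Binomial likelihood of the data as a function of the heads-probability p.
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Normalize so the area under the curve over [0, 1] is 1; then the integral
# from a to b is the area the normalized curve assigns to that range.
steps = 100_000
area = sum(likelihood((i + 0.5) / steps) for i in range(steps)) / steps

def normalized(p):
    return likelihood(p) / area

# The ratio between any two parameter values is unchanged by the normalization.
print(likelihood(0.7) / likelihood(0.5))
print(normalized(0.7) / normalized(0.5))  # same number
```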
[Question] What is an “anti-Occamian prior”?
That’s terrifyingly cool! I notice that they usually fall over after having reached the assigned position; are you only rewarding them for being in the position at a particular point in time, after which there’s nothing left to optimize for? Are you able to make them maintain a position for longer?
[Question] Lying to chess players for alignment
Unsure about the time controls at the moment; see my response to aphyer. The advisors would be able to give the A player justification for the move they’ve recommended.
The concern that A might not be able to understand the reasoning that the advisors give them is a valid one, and that’s the whole point of the experiment! If A can’t follow the reasoning well enough to determine whether it’s good advice, then (says the analogy) people who are asking AIs how to solve alignment can’t follow their reasoning well enough to determine whether it’s good advice.
Individual positions like that could be an interesting thing to test; I’ll likely have some people try out some of those too.
I think the aspect where the deceivers have to tell the truth in many cases to avoid getting caught could make it more realistic, as in the real AI situation the best strategy might be to present a mostly coherent plan with a few fatal flaws.
Agreed that it could be a bit more realistic that way, but the main constraint here is that we need a game with three distinct levels of players, where each level reliably beats the one below it. The element of luck in games like poker and backgammon makes that harder to guarantee (as suggested by the stats Joern_Stoller brought up). And another issue is that it’ll be harder to find a lot of skilled players at different levels for any game that isn’t as popular as chess—even if we find an obscure game that would in theory be a better fit for the experiment, we won’t be able to find any Cs for it.
I’ve created a Manifold market if anyone wants to bet on what happens. If you’re playing in the experiment, you are not allowed to make any bets/trades while you have private information (that is, while you are in a game, or if I haven’t yet reported the details of a game you were in to the public).
https://manifold.markets/Zane_3219/will-chess-players-win-most-of-thei
Deception Chess: Game #1
Thanks, fixed.
I saw it fine at first, but after logging out I got the same error. Looks like you need a Chess.com account to see it.
Hi! I’m Zane! A couple of you might have encountered me from glowfic, although I’ve been reading LW since long before I started writing any glowfic. Aspiring rationalist, hoping to avoid getting killed by unaligned AGI, and so on. (Not that I want to be killed by anything else, either, of course.) I have a couple posts I wanted to make about various topics from LW, and I’m hoping to have some fun discussions here!
Also, the popup thingy that appears when a new user tries to make a comment has a bug—the links in it point to localhost:3001 instead of lesswrong.com (or was it 3000? The window went away, so I can’t check anymore). Also, even after I replace the localhost:300[something] with lesswrong.com, one of the links still doesn’t work because it looks like the post it goes to has been deleted.