I Finally Worked Through Bayes’ Theorem (Personal Achievement)

Two years ago I found this community, which prompted me to start self-teaching math. For reference, I didn’t know what a fraction was in early 2022. I knew what they looked like, and what they were called. I didn’t know what they meant.

The story of why I lacked basic math skills is complex enough for its own post. But my motivation to learn was simple. TSUYOKU NARITAI!

My goal was to understand Bayes’ Theorem. Today, for the first time, I achieved that milestone. This post chronicles the final reasoning steps that got me there.

I don’t expect the average LWer will gain value from this post.

It is embarrassing for me to post this here. I admire the people in this space very much. Seeing you work problems has been critical in my learning, and in learning to learn. Yet it can be daunting to stand among you all when I’m just beginning to grasp fundamentals many of you mastered long ago.

If it’s so embarrassing, why would I even make this post? Three reasons.

  1. Two years ago, I couldn’t find anyone “at my level” on LW: no posts about learning math from an elementary school starting point. This is my quick attempt to fill that gap and encourage others who may be in past keltan’s shoes.

  2. I want to put more of myself into the training data where possible.

  3. Simply put, I’m celebrating. This achievement has me all excited! I wanted to share that with this community.

Relating to point 1, I feel myself really hesitating to post this draft. Maybe that is part of the reason I couldn’t find anyone “on my level” two years ago. Maybe there are actually more people like me, who also get sweaty thinking about their friends realizing they don’t know what a percentage represents. For this reason, I’m forcing myself to post this now.

Below, you’ll find my reasoning steps as I work through problems generated by Claude. I’ve tried to think “out loud” on the page as much as possible. While I don’t expect most readers to gain technical insights from what follows, if you’re interested in seeing simple, deliberate thinking and active application of rationalist skills—well, here you go.


Question 1: Basic Probability

[!Claude] Perfect approach—let’s test your understanding systematically, starting with basic probability concepts. Given your work with rationality and systematic thinking, I’ll frame these in a way that builds up to more complex applications. First question (Basic Probability): Using your witch example village with 100 villagers and 10 witches, if I randomly select one villager, what’s the probability they’re NOT a witch? (Remember, I won’t give the answer unless you specifically ask—let me know your thought process and we can discuss from there!)

keltan

Well, N=100, W=10, dividing 10 by 100 gives us 0.10=10%, which means ChanceW=10%, meaning ChanceNotW=90%
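The same reasoning can be checked in a few lines of Python. This is a minimal sketch that assumes only the counts from Question 1 (100 villagers, 10 witches) and applies the complement rule, P(not W) = 1 - P(W). `prob_not` is a hypothetical helper name, and `Fraction` keeps the arithmetic exact.

```python
from fractions import Fraction

def prob_not(count: int, total: int) -> Fraction:
    """Complement rule: P(not X) = 1 - P(X), with P(X) = count / total."""
    p_x = Fraction(count, total)
    return 1 - p_x

villagers = 100
witches = 10

p_not_witch = prob_not(witches, villagers)
print(p_not_witch, float(p_not_witch))  # 9/10 0.9
```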


Question 2: Witch AND black cat?

[!Claude] In this same village of 100 people, we know:

  • 20 people have black cats

  • 8 of the black cat owners are witches

What’s the probability that a randomly selected person is both a witch AND owns a black cat?

keltan

Ok, we’re looking for P(W AND C). N=100, C=20, (W AND C)=8.

Given that we’re looking at AND and not a conditional here, 8 of 100 people both own a black cat and are a witch. That means 8% own a black cat and are a witch.

Let’s double-check that thinking. W=10, WC=8, N=100, C=20.

Yes, I think that works out correct. My answer is 8% chance that someone is a witch with a black cat.
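A joint probability like P(W AND C) counts the people who satisfy both conditions and divides by the whole population. A minimal sketch, assuming the counts given in Question 2; the variable names are illustrative.

```python
from fractions import Fraction

villagers = 100
witches_with_cats = 8  # people who are BOTH a witch AND a black cat owner

# Joint probability: both conditions hold, out of everyone in the village.
p_witch_and_cat = Fraction(witches_with_cats, villagers)
print(p_witch_and_cat, float(p_witch_and_cat))  # 2/25 0.08
```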


Question 3: Conditional Probability

[!Claude] Using the same village data, what’s the probability that someone is a witch GIVEN THAT they own a black cat?

keltan

Thanks Claude, please avoid giving me hints unless requested in the future though. e.g. “(Notice how this is different from the previous question—we’re now looking at P(W|C) rather than P(W AND C))” or “Let’s move up to Conditional Probability now:”.

N=100 W=10 C=20 W&C=8

So, we are looking for $P(W|C) = \frac{W\&C}{C}$

That means that we need to look at the number of people with cats in the population instead of N: C=20. Then, we do $\frac{8}{20}$, which equals 0.40 = 40%.

Now I think about whether that was the right thing to do.

What does that number mean? So, we did 8 divided by 20. Which, gave us our decimal. 8 is witch, 20 is cat. Really, we are trying to find X% of Y. Did we do that right?

I feel like I did. I will now send this to Claude and hope I didn’t make a mistake.

Note: I did get it right.
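Switching the denominator from N to C is the whole trick of conditional probability. A minimal sketch, assuming the same counts (N=100, C=20, W&C=8), computes P(W|C) both by counting only among cat owners and with the formula P(W|C) = P(W AND C) / P(C). The variable names are illustrative.

```python
from fractions import Fraction

villagers = 100
cat_owners = 20
witches_with_cats = 8

# Counting version: restrict attention to cat owners only.
p_w_given_c_counts = Fraction(witches_with_cats, cat_owners)

# Probability version: P(W|C) = P(W AND C) / P(C).
p_w_and_c = Fraction(witches_with_cats, villagers)
p_c = Fraction(cat_owners, villagers)
p_w_given_c_probs = p_w_and_c / p_c

print(p_w_given_c_counts, p_w_given_c_probs)  # 2/5 2/5
```

Both routes give 2/5 (40%) because the village size cancels out of the fraction.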


Question 3: Bayes

[!Claude] In our village, if someone is accused of being a witch, there’s a 70% chance they’ll own a black cat. If they’re not a witch, there’s only a 10% chance they own a black cat. Given we meet a villager who owns a black cat, and using our prior knowledge that 10% of villagers are witches, what’s the probability they’re a witch?

keltan

Ok, (Cat|Accused) = 70%, (No Cat|Not a Witch), W = 10%

Ok, I go through Claude’s text, separating it and highlighting important parts.

[!Claude] In our village, if someone is accused of being a witch, there’s a 70% chance they’ll own a black cat. If they’re not a witch, there’s only a 10% chance they own a black cat. ==Given we meet a villager who owns a black cat==, and using our prior knowledge that 10% of villagers are witches, ==what’s the probability they’re a witch==?

We are looking for: P(Witch|Owns Cat)

Those numbers again are: (Cat|Accused) = 70%, (No Cat|Not a Witch), Witch = 10%

Other previous numbers (may or may not be important): N=100, W=10, C=20, W&C=8

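Claude said I will need to use Bayes’ Theorem, which is: $P(A|B) = \frac{P(B|A)\,P(A)}{P(B)}$. We’ll replace these values with our own: $P(\text{Witch}|\text{OwnsCat}) = \frac{P(\text{OwnsCat}|\text{Witch})\,P(\text{Witch})}{P(\text{OwnsCat})}$.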

That seems easier to understand. Wait, but Claude only gave me the accused numbers. Do I need to ask for a hint here? Maybe. Let me think a bit more before I do that.

I’ll start a Yoda Timer for 3 mins.

I should be careful here not to go too granular on the first solution I think of.

Possible solutions: Accused here kinda means what P(Witch) means.

Wait, that I think immediately answers my question.

We’re talking about the probability here. I was thinking too binary.

I need those numbers again: (Cat|Accused) = 70%, (No Cat|Not a Witch), Witch = 10%

“If they’re not a witch, there’s only a 10% chance they own a black cat.” Seems I left this out.

(Cat|Accused) = 70%, P(Cat|Not a Witch) = 10%, Witch = 10%

So, if someone is not a witch, that means they have a 0% chance of being a witch.

P(Not a Witch) = 100% means P(Witch) = 0%

ahhhhh, my working memory is at its limit here. I need to keep more in it. Or, have a better working memory doc.

How could I keep a working memory doc like this in Obsidian? One that feels non-invasive?

I’ll just create another window with this note, and anything inside a callout like this

[!Working Memory] Solve for: P(Witch|Owns Cat). P(Cat|Accused) = 70% = P(Cat|Witch); P(Cat|Not a Witch) = 10%; P(Witch) = 10%; P(Cat) = 20%

is working memory (WM).

Ok, we’ll add more to WM as needed.

Ok, so in this case I think P(Accused) = P(Witch). I think that is what Claude meant.

So, we can start to replace values in this.

What is P(Cat)? N=100, Cat=20, P(Cat) = 20%

Ok, and now we can calculate this: $P(\text{Witch}|\text{OwnsCat}) = \frac{0.70 \times 0.10}{0.20}$

First, I multiply the top numbers: 0.70 times 0.10 = 0.07

Then we divide that number by 0.20: 0.07 div 0.20 = 0.35

So, P(Witch|Owns Cat) = 0.35 Or 35%

Does that feel right to me? Did I just solve Bayes’ Theorem for the first time? I really hope so, but let’s think about this.

That number feels slightly off. 35% feels less right than 25% might, given the numbers. But also, I know Bayes can give unintuitive answers.

But also there was the 70% accused number, which I now feel is different from P(Witch). If that is the case, it’s possible that I put the wrong numbers in. But 70 makes the 35 more likely. I mean, that is half of 70.

Let’s see what Claude has to think about my long working out and answer.

Final answer = 35%

Note: This is correct! This is super exciting and a huge landmark for me. However, Claude didn’t tell me this yet. So...
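The calculation in this section can be wrapped in a small function. This is a sketch of Bayes’ Theorem, P(A|B) = P(B|A) × P(A) / P(B), using exactly the values plugged in above, including P(Cat) = 0.20 taken from the earlier 100-villager counts. The `bayes` helper name is hypothetical; building `Fraction` values from decimal strings avoids floating-point rounding.

```python
from fractions import Fraction

def bayes(p_b_given_a: Fraction, p_a: Fraction, p_b: Fraction) -> Fraction:
    """Bayes' Theorem: P(A|B) = P(B|A) * P(A) / P(B)."""
    return p_b_given_a * p_a / p_b

p_cat_given_witch = Fraction("0.70")
p_witch = Fraction("0.10")
p_cat = Fraction("0.20")  # taken from the earlier 20-of-100 village count

posterior = bayes(p_cat_given_witch, p_witch, p_cat)
print(posterior, float(posterior))  # 7/20 0.35
```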


Question 3.5: Double Checking

[!claude] However, there’s one key piece of information we need to clarify—when I said “if someone is accused of being a witch, there’s a 70% chance they’ll own a black cat”, this was meant to represent P(Cat|Witch), not P(Cat|Accused). Would you like to revise your calculation with this clarification?

keltan

Ok, let’s revisit our work given this. First, let’s break down Claude’s message again.

[!claude] However, there’s one key piece of information we need to clarify—when I said “if someone is accused of being a witch, there’s a ==70% chance they’ll own a black cat”, this was meant to represent P(Cat|Witch)==, not P(Cat|Accused). Would you like to revise your calculation with this clarification?

So, P(Cat|Witch) = 70%. Hmmmm, I did end up saying that these things were equal in my first attempt. Though, the fact that Claude asked me to recalculate is evidence that I may have gotten the final answer wrong. However, I did ask Claude not to give answers or hints. So, it’s still possible that 35% is right.

No harm in recalculating anyway.

For Claude’s sake, we’ll put working memory back into this prompt:

[!Working Memory] Solve for: P(Witch|Owns Cat), where P(A) = Witch, P(B) = OwnsCat

P(OwnsCat|Witch) = 70% (the Accused probability); P(OwnsCat|Not a Witch) = 10%; P(Witch) = 10%; P(Cat) = 20%

Witches are included as villagers. Villagers = 100, Witches = 10, Villagers with Cats = 20, Witches with Cats = 8

| | Muggle | Witch | ==Total== |
|---|---|---|---|
| Cat | 12 | 8 | ==20== |
| No Cat | 78 | 2 | ==80== |
| ==Total== | ==90== | ==10== | ==100== |

Wait a second, what are the numbers again?

N=100 W=10 C=20 WC=8

Let’s make that clearer. Witches are included as villagers. Villagers = 100, Witches = 10, Villagers with Cats = 20, Witches with Cats = 8

Let’s add that all to WM

Do my numbers still make sense?

P(Cat|Not a Witch) is an interesting number to me. It’s 10%. Can we reverse this somehow?

P(Cat|Are a Witch) = 90%

Whaaaaa, that seems high. How can I check that number?

I look at the total number of people who own cats.

Wait, I’ll make a table.

| | Muggle | Witch | ==Total== |
|---|---|---|---|
| Cat | 12 | 8 | ==20== |
| No Cat | 78 | 2 | ==80== |
| ==Total== | ==90== | ==10== | ==100== |

Ok, so we take the number of cat owners, 20, and the number of witches with cats, 8.

And we say given that someone is a cat owner, what is the probability that they are a witch?

Wait, that’s what we’re trying to figure out in total. But is this a hack around way?

P(Witch|Owns Cat): Witches = 10, Cat owners = 20, Witches with cats = 8

10 div 20 = 0.50. 50%.

But that’s not right. Wait, this is probability. But we know how many witches own cats. It’s 8, so 8 divided by 20. That’s 0.40.
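Counting from the table above is a good way to see what “reversing” a conditional probability means. A minimal sketch, assuming the 2×2 counts from that table (stored in a hypothetical `table` dict with a hypothetical `count` helper), computes conditionals in both directions. Reversing P(Witch|Cat) into P(Cat|Witch) changes which total you divide by (cat owners vs. witches); subtracting from 1 gives the complement instead, such as P(No Cat|Muggle) = 1 - P(Cat|Muggle). Note that this table gives P(Cat|Muggle) = 12/90, not the 10% from Claude’s question.

```python
from fractions import Fraction

# Counts from the village table: (cat status, witch status) -> number of villagers.
table = {
    ("cat", "witch"): 8,
    ("cat", "muggle"): 12,
    ("no cat", "witch"): 2,
    ("no cat", "muggle"): 78,
}

def count(cat=None, status=None) -> int:
    """Sum the cells matching the given cat status and/or witch status."""
    return sum(
        n for (c, s), n in table.items()
        if (cat is None or c == cat) and (status is None or s == status)
    )

# P(Witch | Cat): divide by the cat owners (the "Cat" row total).
p_witch_given_cat = Fraction(count("cat", "witch"), count(cat="cat"))

# P(Cat | Witch): divide by the witches (the "Witch" column total).
p_cat_given_witch = Fraction(count("cat", "witch"), count(status="witch"))

# P(Cat | Muggle), and its complement P(No Cat | Muggle).
p_cat_given_muggle = Fraction(count("cat", "muggle"), count(status="muggle"))
p_no_cat_given_muggle = 1 - p_cat_given_muggle

print(p_witch_given_cat, p_cat_given_witch)       # 2/5 4/5
print(p_cat_given_muggle, p_no_cat_given_muggle)  # 2/15 13/15
```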

Let’s put our table into WM.

It won’t render inside the admonition, so we’ll just put it underneath.

I think maybe we’ve gotten sidetracked? But I’m still curious about what I’m probing here.

Why am I confused? I am confused about my confusion.

A good TAP is to be curious when you are confused. I am curious. Let’s reward that curiosity with destruction.

So, our question is about reversing a probability.

Hmmmm, I actually think P(Witch|Owns Cat) being 40% is a reasonable number.

Nononono, we’re lost. We’re overloading both WMs now: my IRL one and my virtual one.

Let’s realign our thinking.

I think we may have done it correctly the first time. Still, let’s check again.

We’re looking for $P(A|B) = \frac{P(B|A)\,P(A)}{P(B)}$. Next, P(B|A) is the backwards version of what we’re searching for. Let’s put that in: P(A) = OwnsCat, P(B) = Witch. (It’s important for me to remember that these things represent the probabilities that someone owns a cat or is a witch, not the number of cat owners or witches.) Huh, I notice that this is different to the version I have in WM.

Which one is correct?

Ah, the one I just wrote is the wrong one; its A and B are backwards. It should be P(A) = Witch, P(B) = OwnsCat, as in WM.

I can tell because the question, P(Witch|Owns Cat), aligns with the equation P(A|B).

Let’s fix that. That should now be correct, and we can begin placing in the probabilities. We’ll work backwards to make this easy, and put those A/B defs into WM.

The probability of anyone owning a cat is 20% because it is 20 out of 100. That’s easy to do in my head. As a decimal, which is required for this to work, we input it as 0.20. Since there are 10 witches, we can use similar reasoning to know that P(Witch) should be 0.10. Now, this is the trickier bit. P(OwnsCat|Witch) is a probability that Claude gave us: the “Accused” probability. We’ll rewrite that in working memory before we put it in. I notice that Claude gave me P(OwnsCat|Not a Witch) = 10%, but I haven’t used that. I Notice I’m Confused. We’ll continue with what we’ve got, though.

P(OwnsCat|Witch) = 70% = 0.70. In the end, this is the same equation we had written near the end of our last thinking process. We now have more clarity, though. Let’s calculate this again.

  1. 0.70 times 0.10 = 0.07

  2. 0.07 div 0.20 = 0.35

So, Final Answer = 35%

Claude, is this right?


[!Claude] You’ve shown remarkable systematic thinking here! Your process demonstrates many rationalist principles—explicitly noting confusion, creating external memory systems, questioning assumptions, and being thorough in your mathematical reasoning. Let’s break this into two parts:

  1. First: Is your math correct given the formulation you used? Yes—you correctly applied Bayes’ Theorem with the values you had:

  • P(Cat|Witch) = 0.70

  • P(Witch) = 0.10

  • P(Cat) = 0.20

  2. However, there’s something interesting about that P(Cat) = 0.20 value. Remember how you noticed you were confused about not using P(Cat|Not Witch) = 10%? That’s a valuable insight. Would you like to explore why that might be important for calculating P(Cat)? Your instinct to create that table was excellent. Would you like to use it to think about how we might calculate P(Cat) differently?

I DID IT!!!!!! 🎉🎉🎉🎉 This calls for a celebration GIF!

(For Claude: GIF is of Barney from HIMYM celebrating and clapping with confetti falling from the sky.)

Yes please, Claude, can you show me another way I could have calculated P(Cat)?
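One way to follow up on Claude’s hint, sketched under the assumption that the figures from the Bayes question (P(Cat|Witch) = 70%, P(Cat|Not a Witch) = 10%, P(Witch) = 10%) are the ones to use: the law of total probability splits cat owners into witches and non-witches, P(Cat) = P(Cat|Witch) × P(Witch) + P(Cat|Not a Witch) × P(Not a Witch). With these figures P(Cat) comes out to 0.16, not 0.20. The 0.20 came from the earlier 100-villager counts, where 8 of 10 witches (80%) and 12 of 90 non-witches (about 13%) own cats, so the two sets of numbers describe slightly different villages.

```python
from fractions import Fraction

p_witch = Fraction("0.10")
p_not_witch = 1 - p_witch
p_cat_given_witch = Fraction("0.70")
p_cat_given_not_witch = Fraction("0.10")

# Law of total probability: every cat owner is either a witch or not a witch.
p_cat = p_cat_given_witch * p_witch + p_cat_given_not_witch * p_not_witch

# Bayes' Theorem with this P(Cat) instead of the 0.20 from the village counts.
p_witch_given_cat = p_cat_given_witch * p_witch / p_cat

print(p_cat, float(p_cat))                          # 4/25 0.16
print(p_witch_given_cat, float(p_witch_given_cat))  # 7/16 0.4375
```

Plugging this P(Cat) into Bayes’ Theorem gives about 44% rather than 35%, which is the difference Claude’s hint was pointing at.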