royf

Karma: 289

royf Oct 8, 2012, 9:43 PM
0 points
in reply to: Decius’s comment on: Conservation of Expected Evidence

How does deciding one model is true give you more information?

Let’s assume a strong version of Bayesianism, which entails the maximum entropy principle. So our belief is the one that has the maximum entropy, among those consistent with our prior information. If we now add the information that some model is true, this generally invalidate our previous belief, making the new maximum-entropy belief one of lower entropy. The reduction in entropy is the amount of information you gain by learning the model. In a way, this is a cost we pay for “narrowing” our belief.

The upside of it is that it tells us something useful about the future. Of course, not all information regarding the world is relevant for future observations. The part that doesn’t help control our anticipation is failing to pay rent, and should be evacuated. The part that does inform us about the future may be useful enough to be worth the cost we pay in taking in new information.

I’ll expand on all of this in my sequence on reinforcement learning.

royf Oct 8, 2012, 8:33 PM
3 points
0
in reply to: aspera’s comment on: Conservation of Expected Evidence
You’re not really wrong. The thing is that “Occam’s razor” is a conceptual principle, not one mathematically defined law. A certain (subjectively very appealing) formulation of it does follow from Bayesianism.

P(AB model) \propto P(AB are correct) and P(A model) \propto P(A is correct). Then P(AB model) ⇐ P(A model).

Your math is a bit off, but I understand what you mean. If we have two sets of models, with no prior information to discriminate between their members, then the prior gives less probability to each model in the larger set than in the smaller one.

More generally, if deciding that model 1 is true gives you more information than deciding that model 2 is true, that means that the maximum entropy given model 1 is lower than that given model 2, which in turn means (under the maximum entropy principle) that model 1 was a-priori less likely.

Anyway, this is all besides the discussion that inspired my previous comment. My point was that even without Popper and Jaynes to enlighten us, science was making progress using other methods of rationality, among which is a myriad of non-Bayesian interpretations of Occam’s razor.

royf Oct 8, 2012, 6:38 PM
0 points
in reply to: timtyler’s comment on: Internal Availability

The ease with which images, events and concepts come to mind is correlated with how frequently they have been observed, which in turn is correlated with how likely they are to happen again.

Yes, and I was trying to make this description one level more concrete.

Things never happen the exact same way twice. The way that past observations are correlated with what may happen again is complicated—in a way, that’s exactly what “concepts” capture.

So we don’t just recall something that happened and predict that it will happen again. Rather, we compose a prediction based on an integration of bits and patches from past experiences. Recalling these bits and patches as relevant for the context of the prediction—and of each other—is a complicated task, and I propose that an “internal availability” mechanism is needed to perform it.

royf Oct 8, 2012, 9:09 AM
1 point
in reply to: faul_sname’s comment on: Internal Availability
Take for example your analysis of the poker hand I partially described. You give 3 possibilities for what the truth of it may be. Are there any other possibilities? Maybe the player is bluffing to gain the reputation of a bluffer? Maybe she mistook a 4 for an ace (it happened to me once...)? Maybe aliens hijacked her brain?

It would be impossible to enumerate or notice all the possibilities, but fortunately we don’t have to. We make only the most likely and important ones available.

royf Oct 8, 2012, 8:35 AM
2 points
in reply to: faul_sname’s comment on: Internal Availability
I was trying to give a specific reason that the availability heuristic is there: it’s coupled with another mechanism that actually generates the availability; and then to say a few things about this other mechanism.

Does anyone have specific advice on how I could convey this better?

royf Sep 20, 2012, 11:37 PM
2 points
in reply to: [deleted]’s comment on: The Bayesian Agent

Imagine a bowl of jellybeans. [...]

Allow me to suggest a simpler thought experiment, that hopefully captures the essence of yours, and shows why your interpretation (of the correct math) is incorrect.

There are 100 recording studios, each recording each day with probability 0.5. Everybody knows that.

There’s a red light outside each studio to signal that a session is taking place that day, except for one rogue studio, where the signal is reversed, being off when there’s a session and on when there isn’t. Only persons B and C know that.

A, B and C are standing at the door of a studio, but only C knows that it’s the rogue one. How do their beliefs that there’s a session inside change by observing that the red light is on? A keeps the 50-50. B now thinks it’s 99-1. Only C knows that there’s no session.

So your interpretation, as I understand it, would be to say that A and B updated in the “wrong direction”. But wait! I practically gave you the same prior information that C has—of course you agree with her! Let’s rewrite the last paragraph:

A, B and C are standing at the door of a studio. For some obscure reason, C secretly believes that it’s the rogue one. Wouldn’t you now agree with B?

And now I can do the same for A, by not revealing to you, the reader, the significance of the red lights. My point is that as long as someone runs a Bayesian update, you can’t call that the “wrong direction”. Maybe they now believe in things that you judge less likely, based on the information that you have, but that doesn’t make you right and them wrong. Reality makes them right or wrong, unfortunately there’s no one around who knows reality in any other way than through their subjective information-revealing observations.

royf Sep 20, 2012, 5:03 PM
7 points
in reply to: Unnamed’s comment on: Less Wrong Polls in Comments
To anyone thinking this is not random, with 42 votes in:
- The p-value is 0.895 (this is the probability of seeing at least this much non-randomness, assuming a uniform distribution)
- The entropy is 2.302bits instead of log(5) = 2.322bits, for 0.02bits KL-distance (this is the number of bits you lose for encoding one of these votes as if it was random)
If you think you see a pattern here, you should either see a doctor or a statistician.

royf Sep 18, 2012, 6:30 PM
2 points
in reply to: [deleted]’s comment on: The Bayesian Agent

It is perfectly legal under the bayes to learn nothing from your observations.

Right, in degenerate cases, when there’s nothing to be learned, the two extremes of learning nothing and everything coincide.

Or learn in the wrong direction, or sideways, or whatever.

To the extent that I understand your navigational metaphor, I disagree with this statement. Would you kindly explain?

There is no unique “Bayesian belief”.

If you mean to say that there’s no unique justifiable prior, I agree. The prior in our setting is basically what you assume you know about the dynamics of the system—see my reply to RichardKennaway.

However, given that prior and the agent’s observations, there is a unique Bayesian belief, the one I defined above. That’s pretty much the whole point of Bayesianism, the existence of a subjectively objective probability.

If you had the “right” prior, you would find that would have to do very little updating, because the right prior is already right.

This is true in a constant world, or with regard to parts of the world which are constant. And mind you, it’s true only with high probability: there’s always the slight chance that the sky is not, after all, blue.

But in a changing world, where part of the change is revealed to you through new observations, you have to keep pace. The right prior was right yesterday, today there’s new stuff to know.

royf Sep 18, 2012, 6:09 PM
6 points
in reply to: Richard_Kennaway’s comment on: The Bayesian Agent
Everything you say is essentially true.

As the designer of the agent, will you be explicitly providing it with that information in some future instalment?

Technically, we don’t need to provide the agent with p and sigma explicitly. We use these parameters when we build the agent’s memory update scheme, but the agent is not necessarily “aware” of the values of the parameters from inside the algorithm.

Let’s take for example an autonomous rover on Mars. The gravity on Mars is known at the time of design, so the rover’s software, and even hardware, is built to operate under these dynamics. The wind velocity at the time and place of landing, on the other hand, is unknown. The rover may need to take measurements to determine this parameter, and encode it in its memory, before it can take it into account in choosing further actions.

But if we are thoroughly Bayesian, then something is known about the wind prior to experience. Is it likely to change every 5 minutes or can the rover wait longer before measuring again? What should be the operational range of the instruments? And so on. In this case we would include this prior in p, while the actual wind velocity is instead hidden in the world state (only to be observed occasionally and partially).

Ultimately, we could include all of physics in our belief—there’s always some Einstein to tell us that Newtonian physics is wrong. The problem is that a large belief space makes learning harder. This is why most humans struggle with intuitive understanding of relativity or quantum mechanics—our brains are not made to represent this part of the belief space.

This is also why reinforcement learning gives special treatment to the case where there are unknown but unchanging parameters of the world dynamics: the “unknown” part makes the belief space large enough to make special algorithms necessary, while the “unchanging” part makes these algorithms possible.

For LaTeX instructions, click “Show help” and then “More Help” (or go here).

royf Sep 18, 2012, 3:12 AM
4 points
on: The Bayesian Agent
If you’re a devoted Bayesian, you probably know how to update on evidence, and even how to do so repeatedly on a sequence of observations. What you may not know is how to update in a changing world. Here’s how:

$B_{t 1} (W_{t 1}$ %3d\Pr(W_{t+1}|O1,\ldots,O{t+1})%3d\frac{\sigma(O{t+1}|W{t+1})\cdot\Pr(W_{t+1}|O_1,\ldots,O_t)}{\sumw\sigma(O{t+1}|w)\cdot\Pr(w|O_1,\ldots,O_t)})

As usual with Bayes’ theorem, we only need to calculate the numerator for different values of $W_{t 1}$ , and the denominator will normalize them to sum to 1, as probabilities do. We know $σ$ as part of the dynamics of the system, so we only need $Pr (W_{t 1} | O_{1}, \dots, O_{t}$ ). This can be calculated by introducing the other variables in the process:

$Pr (W_{t 1} | O_{1}, \dots, O_{t}$ %3d\sum_{W_t,A_t}\Pr(W_t,At,W{t+1}|O_1,\ldots,O_t))

An important thing to notice is that, given the observable history, the world state $W_{t}$ and the action $A_{t}$ are independent—the agent can’t act on unseen information. We continue:

$= \sum_{W_{t}, A_{t}} Pr (W_{t} | O_{1}, \dots, O_{t}$ \cdot\Pr(A_t|O_1,\ldots,Ot)\cdot%20p(W{t+1}|W_t,A_t))

Recall that the agent’s belief $B_{t}$ is a function of the observable history, and that the action only depends on the observable history through its memory $B_{t}$ . We conclude:

$= \sum_{W_{t}, A_{t}} B_{t} (W_{t}$ \cdot\pi(A_t|Bt)\cdot%20p(W{t+1}|W_t,A_t))
What links here?
- The Bayesian Agent by royf (Sep 18, 2012, 3:23 AM; 19 points)

royf Aug 23, 2012, 5:16 AM
1 point
on: Argument Screens Off Authority

p(H|E1,E2) [...] is simply not something you can calculate in probability theory from the information given [i.e. p(H|E1) and p(H|E2)].

Jaynes would disapprove.

You continue to give more information, namely that p(H|E1,E2) = p(H|E1). Thanks, that reduces our uncertainty about p(H|E1,E2).

But we are hardly helpless without it. Whatever happened to the Maximum Entropy Principle? Incidentally, the maximum entropy distribution (given the initial information) does have E1 and E2 independent. If your intuition says this before having more information, it is good.

Don’t say that an answer can’t be reached without further information. Say: here’s more information to make your answer better.

royf Aug 9, 2012, 3:06 PM
5 points
in reply to: Richard_Kennaway’s comment on: Reinforcement, Preference and Utility
Clearly you have some password I’m supposed to guess.

This post is not preliminary. It’s supposed to be interesting in itself. If it’s not, then I’m doing something wrong, and would appreciate constructive criticism.

royf Aug 4, 2012, 1:20 AM
0 points
in reply to: Johnicholas’s comment on: Reinforcement Learning: A Non-Standard Introduction (Part 2)
That’s an excellent point. Of course one cannot introduce RL without talking about the reward signal, and I’ve never intended to.

To me, however, the defining feature of RL is the structure of the solution space, described in this post. To you, it’s the existence of a reward signal. I’m not sure that debating this difference of opinion is the best use of our time at this point. I do hope to share my reasons in future posts, if only because they should be interesting in themselves.

As for your last point: RL is indeed a very general setting, and classical planning can easily be formulated in RL terms.

royf Aug 3, 2012, 10:02 PM
1 point
in reply to: Johnicholas’s comment on: Reinforcement Learning: A Non-Standard Introduction (Part 2)
I’m not sure why you say this.

Please remember that this introduction is non-standard, so you may need to be an expert on standard RL to see the connection. And while some parts are not in place yet, this post does introduce what I consider to be the most important part of the setting of RL.

So I hope we’re not arguing over definitions here. If you expand on your meaning of the term, I may be able to help you see the connection. Or we may possibly find that we use the same term for different things altogether.

I should also explain why I’m giving a non-standard introduction, where a standard one would be more helpful in communicating with others who may know it. The main reason is that this will hopefully allow me to describe some non-standard and very interesting conclusions.

royf Jul 30, 2012, 2:51 PM
2 points
in reply to: A1987dM’s comment on: Reinforcement Learning: A Non-Standard Introduction (Part 1)
I internally debated this question myself. Ideally, I’d completely agree with you. But I needed the shorter publishing and feedback cycle for a number of reasons. Sorry, but a longer one may not have happened at all.

Edit: future readers will have the benefit of a link to part 2

royf Jul 30, 2012, 6:28 AM
0 points
in reply to: JohnEPaton’s comment on: Reinforcement Learning: A Non-Standard Introduction (Part 1)
In the model there’s the distribution p, which determines how the world is changing. In the chess example this would include: a) how the agent’s action changes the state of the game + b) some distribution we assume (but which we may or may not actually know) about the opponent’s action and the resulting state of the game. In a physics example, p should include the relevant laws of physics, together with constants which tell the rate (and manner) in which the world is changing. Any changing parameters should be part of the state.

It seems that you’re saying that it may be difficult to know what p is. Then you are very much correct. You probably couldn’t infer the laws of physics from the current wave function of the universe, or the rules of chess from the current state of the game. But at this point we’re only assuming that such laws exist, not that we know how to learn them.

p and q are probability distributions, which is where we allow for randomness in the process. But note that randomness becomes a tricky concept if you go deep enough into physics.

As for the “quantum mind” theory, as far as I can tell it’s fringe science at best. Personally, I’m very skeptical. Regardless, such a model can still have the Markov property, if you include the wave function in your state.

royf 28 Jul 2012 22:52 UTC
0 points
in reply to: fubarobfusco’s comment on: Reinforcement Learning: A Non-Standard Introduction (Part 1)
There’s supposed to be some way to do so partially, if anyone knows what it is.

This should work in Markdown, but it seems broken :(

Edit: ~~test~~ Thanks, Vincent, it works!

royf 28 Jul 2012 22:40 UTC
0 points
in reply to: fubarobfusco’s comment on: Reinforcement Learning: A Non-Standard Introduction (Part 1)
And how do you strikeout your comment?

royf 28 Jul 2012 22:31 UTC
0 points
in reply to: fubarobfusco’s comment on: Reinforcement Learning: A Non-Standard Introduction (Part 1)
I’m not sure what you mean. It looks fine to me, and I can’t find where to check / change such a setting.

Edit:

Very strange. Fixed, I hope.

Thanks!

royf 28 Jul 2012 20:03 UTC
0 points
in reply to: private_messaging’s comment on: The Perception-Action Cycle
You are expressing a number of misconceptions here. I may address some in future posts, but in short:

By information I mean the Shannon information (see also links in OP). Your example is correct.

By the action of powering the electromagnet you are not increasing your information on the state of the world. You are increasing your information on the state of the coin, but through making it dependent on the state of the electromagnet which you already knew. This point is clearly worth a future post.

There is no “entropy in environment”. Entropy is subjective to the viewer.