I’m a little confused about “value of information” as a precise concept.
Suppose that you have a tree with two kinds of interior nodes in it, and leaves labeled with utilities. One kind is a choice node; the other is a nature node, with an associated distribution over its subtrees. It’s fairly obvious that you can work backwards up the tree, and find both an optimum strategy and an expected value of the entire tree.
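For concreteness, here is a minimal sketch of that backward pass (a hypothetical Haskell encoding, assuming a finite tree with nonempty choice nodes and nature probabilities summing to one):

```haskell
-- One possible encoding of the tree described above: leaves carry
-- utilities, choice nodes belong to the decision-maker, and nature
-- nodes carry a distribution over their subtrees.
data Tree
  = Leaf Double             -- utility at a leaf
  | Choice [Tree]           -- decision-maker picks whichever subtree is best
  | Nature [(Double, Tree)] -- (probability, subtree) pairs

-- Work backwards up the tree: the expected value of the whole tree
-- under an optimal strategy. Choice nodes take the maximum over
-- subtrees; nature nodes take the expectation.
value :: Tree -> Double
value (Leaf u)     = u
value (Choice ts)  = maximum (map value ts)
value (Nature pts) = sum [ p * value t | (p, t) <- pts ]
```

Recovering the optimum strategy itself is the same recursion, additionally recording the argmax at each choice node.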
However, I don’t see where “value of information” shows up in this framework anywhere. Do I need to distinguish some choice nodes as “information gathering”, apply a default strategy of “don’t gather any information” to them, and then compute the value of information as the difference between the best I can do with my eyes closed and the best I can do flat out?
What if there is no natural partition of actions into information-gathering and non-information-gathering?
Is there some homomorphism from the first tree (which is normally a tree of evidence or belief states) to an “externally visible” tree, where two nodes are identified if the only difference is inside my head?
However, I don’t see where “value of information” shows up in this framework anywhere. Do I need to distinguish some choice nodes as “information gathering”, apply a default strategy of “don’t gather any information” to them, and then compute the value of information as the difference between the best I can do with my eyes closed and the best I can do flat out?
Think of VoI as going in the reverse direction. That is, beforehand you would have modeled your test outcome as a nature node because you didn’t consider the option of not running the test. Now you stick in a choice node of “run the test” that leads to the nature node of the test output on the one branch, and the tree where you don’t know the test output on the other branch. As you suggest, you then use the work-backwards algorithm to figure out the optimal decision at the “run the test” node, and the difference between the two branch values is the absolute value of the VoI minus the test cost.
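As a toy illustration of that construction (made-up numbers, continuing the Haskell sketch above): acting pays 100 if the world is Good and -60 if it is Bad, each with prior probability 1/2; not acting pays 0; a test reveals the state exactly, at cost c.

```haskell
-- Value of the "don't run the test" branch: choose blindly under the prior.
noTest :: Double
noTest = max 0 (0.5 * 100 + 0.5 * (-60))              -- = 20 (act blindly)

-- Value of the "run the test" branch: the nature node of the test
-- outcome comes first, and we decide separately in each case.
withTest :: Double -> Double
withTest c = 0.5 * max 0 100 + 0.5 * max 0 (-60) - c  -- = 50 - c

-- Difference between the two branches of the "run the test" choice
-- node: VoI (here 50 - 20 = 30) minus the test cost c.
gap :: Double -> Double
gap c = withTest c - noTest                           -- = 30 - c
```

So in this example the test is worth running exactly when c < 30.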
What if there is no natural partition of actions into information-gathering and non-information-gathering?
Then VoI won’t help you very much. VoI is a concept that helps in specifying decision problems (building the tree), not in computing a tree to find an optimal policy. It suggests modeling information-gathering activities as choice nodes leading to nature nodes, rather than as bare nature nodes. If you’ve got a complete decision problem already, then you don’t need VoI.
I should point out that most tests aren’t modeled as just information-gathering. If a test is costless, then why not run it, even if you throw the results away? Typically the test will have some cost, in either utility or prospects, and so in some sense there are rarely actions that are purely information-gathering.
Think of VoI as going in the reverse direction. That is, beforehand you would have modeled your test outcome as a nature node because you didn’t consider the option of not running the test. Now you stick in a choice node of “run the test” that leads to the nature node of the test output on the one branch, and the tree where you don’t know the test output on the other branch. As you suggest, you then use the work-backwards algorithm to figure out the optimal decision at the “run the test” node, and the difference between the two branch values is the absolute value of the VoI minus the test cost.
The problem with this model is that it doesn’t necessarily give you the value of INFORMATION. Making the ‘get info’ node a choice point on the tree essentially allows arbitrary changes between the with-info and without-info branches of the tree. In other words, it’s not clear we are finding the value of information rather than the value of some other consequence of this choice.

That is why I chose to phrase my model in terms of getting to look at otherwise hidden results of nature nodes.
Here is a nice formal model.

Let T be a tree of height \omega and T’ a subset of T regarded as the set of choice nodes in T. Now, given a function f on T’, assume there is a function P giving, for each f, a probability measure P_f on the set of paths through T coherent with f. Further assume there is a function u taking paths through T to utilities, and define U_P(f) to be the integral of u with respect to P_f over all paths coherent with f, and U(f|\sigma) to be the integral of u with respect to P_f over all paths extending \sigma coherent with f.
Now either assume that each choice node has finitely many successors and U(f|\sigma) is always finite. By compactness we can always achieve an upper bound. We now define the value of information of some set S of non-choice nodes in T (i.e., the benefit of getting to look at the results of those nodes when making future decisions).
Let E(T) (the informed maximum) be the max of U(f) for f a function on T’, and let E(T/S) be the max of U(f) over those functions f satisfying f(\sigma)=f(\tau) if {x | \sigma(x) \neq \tau(x)} is a subset of S, i.e., those functions unaware of the results of nodes in S.
The value of getting to look at the results of nodes in S is thus E(T) - E(T/S). Note that to match the concept in the article, S should be an initial segment of T (i.e., you get all that info immediately, while our model also includes the value of buying info that will only be revealed later).
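To tie this to the toy test example above (with the test cost set to zero): if S is the nature node for the test outcome, the informed maximum is E(T) = 0.5*100 + 0.5*0 = 50 (act only when the state is Good), while a strategy blinded to S must act the same way in both states, giving E(T/S) = max(0, 0.5*100 + 0.5*(-60)) = 20; the difference E(T) - E(T/S) = 30 is exactly the VoI computed there.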
Interesting.

So the original tree T models a tree of “states of affairs”, and the original partition or subset T’ models each node being either under the decision-maker’s control or not.
Then the function f would go from elements of T’ to successors of those elements—nodes of T, to be sure, but there’s a side condition there.
Then the probability measure P is a somewhat more powerful way of attaching probabilities to the non-choice nodes—that is, if you have a distribution over the successors of each non-choice node, then you can obtain a probability measure, but a probability measure over paths would allow some additional correlations.
The function u can be understood (in a finite tree) as labelling leaves with utilities, because paths of maximum length in a finite tree are in one-to-one correspondence with leaves—but by describing it the way you did, you leave the door open to applying this formalism to an infinite tree.
U_P(f) would be the utility of a particular strategy (f), and U(f|\sigma) would be… the utility given a certain initial sequence of events? So \sigma is a finite path segment?
I don’t understand the grammar of “Now either … and …”—should it be “Now either … or …”? Or is it really “Now assume … and …”?
When you use U(f) later, I am guessing that’s either U_P(f) with the P elided, or U(f | the empty path segment); regardless, we’re going to have to fix a P in order to get a utility, right?
Then the phrase “f(\sigma)=f(\tau) if the set of x such that \sigma(x) is not equal to \tau(x) is a subset of S”—if I understand correctly, \sigma and \tau are finite path segments, which are isomorphic to nodes in the tree—but are they functions from nodes? If they are functions, wouldn’t they be from, say, initial segments of the integers 0…n TO nodes? If I understand correctly, they’re going to diverge at at most one point—once diverged, since it’s a tree, they’re not going to be able to rejoin. Or were you saying tree and thinking ‘DAG’?
I worry about the types of these things; coding it up in something like Haskell or OCaml might make everything sharper and perhaps suggest simplifications. I’m sure that you can carry through the basic intuition.
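Taking up that suggestion, here is one possible Haskell rendering of the model, under simplifying assumptions that are not in the original: trees are finite, histories are lists of successor indices, and S is read positionally (a set of depths at which the outcome is hidden), which dodges the \sigma/\tau divergence question above. All names are hypothetical.

```haskell
-- Histories (the \sigma above) are paths from the root, recorded as
-- the successor index taken at each step.
type Path = [Int]

data Tree
  = Leaf Double             -- u: the utility of a completed path
  | Choice [Tree]           -- a node of T': the strategy f picks a child
  | Nature [(Double, Tree)] -- a nature node with a distribution over children

-- A pure strategy f, tabulated as a finite map from choice histories
-- to the index of the chosen child.
type Strategy = [(Path, Int)]

-- U(f): integrate u over the paths coherent with f.
utility :: Tree -> Strategy -> Double
utility t f = go [] t
  where
    choose h = maybe 0 id (lookup h f)
    go _ (Leaf u)     = u
    go h (Choice ts)  = let i = choose h in go (h ++ [i]) (ts !! i)
    go h (Nature pts) =
      sum [ p * go (h ++ [i]) t' | (i, (p, t')) <- zip [0 ..] pts ]

-- Every choice history in the tree, with its branching factor.
choicePoints :: Tree -> [(Path, Int)]
choicePoints = go []
  where
    go _ (Leaf _)     = []
    go h (Choice ts)  =
      (h, length ts) : concat [ go (h ++ [i]) t | (i, t) <- zip [0 ..] ts ]
    go h (Nature pts) =
      concat [ go (h ++ [i]) t | (i, (_, t)) <- zip [0 ..] pts ]

-- Brute-force enumeration of all pure strategies (fine for tiny trees).
strategies :: Tree -> [Strategy]
strategies t = mapM (\(h, n) -> [ (h, i) | i <- [0 .. n - 1] ]) (choicePoints t)

-- Two histories look the same to a strategy blinded to S when they have
-- the same length and differ only at hidden depths.
indistinct :: (Int -> Bool) -> Path -> Path -> Bool
indistinct hidden h1 h2 =
  length h1 == length h2 &&
  and [ a == b || hidden k | (k, (a, b)) <- zip [0 ..] (zip h1 h2) ]

-- E(T), the informed maximum, and E(T/S), the maximum over strategies
-- that cannot condition on outcomes at hidden depths.
eT :: Tree -> Double
eT t = maximum [ utility t f | f <- strategies t ]

eTS :: (Int -> Bool) -> Tree -> Double
eTS hidden t = maximum [ utility t f | f <- strategies t, blind f ]
  where
    blind f = and [ c1 == c2 | (h1, c1) <- f, (h2, c2) <- f
                             , indistinct hidden h1 h2 ]
```

The value of the information in S is then eT t - eTS hidden t; with nothing hidden the two coincide. This is only one reading of the definitions (in particular of “{x | \sigma(x) \neq \tau(x)} is a subset of S”), which is part of what the type-checking exercise would force one to settle.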
Thanks for thinking about this.