TheManxLoiner comments on Visual demonstration of Optimizer’s curse

TheManxLoiner 13 Dec 2024 19:10 UTC
1 point
0
What do we mean by $U - V$?
I think the setting is:
- We have a true value function $V$
- We have a process to learn an estimate of $V$. We run this process once and we get $U$
- We then ask an AI system to act so as to maximize $U$ (its estimate of human values)
So in this context, $U - V$ is just a fixed function measuring the error between the learnt values and true values.

I think the confusion could come from using the symbol $U$ for both a single realization and the random variable/process that generates it.
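To make the distinction concrete, here is a minimal simulation sketch of the setting above. Everything in it is an assumption for illustration, not something from the post: a finite set of options, a made-up true value function `v`, and a learning process modelled as `v` plus independent zero-mean Gaussian noise. Each row of `u` is one realization of $U$; the whole array stands in for the random process.

```python
import numpy as np


def simulate(n_options=50, n_draws=2000, noise_sd=1.0, seed=0):
    """Compare the error U - V at fixed options vs. at the option maximizing U.

    All modelling choices here (Gaussian V, additive Gaussian noise,
    a finite option set) are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)

    # Hypothetical true value function V, one value per option.
    v = rng.normal(size=n_options)

    # Each row is one run of the learning process: a single realization of U.
    u = v + rng.normal(scale=noise_sd, size=(n_draws, n_options))

    # Averaged over runs, the error at any fixed option is close to zero.
    mean_error_fixed = (u - v).mean(axis=0)

    # Within each run, the AI picks the option that maximizes that run's U.
    chosen = u.argmax(axis=1)
    rows = np.arange(n_draws)

    # Error U - V evaluated at the chosen option, one value per run.
    error_at_chosen = u[rows, chosen] - v[chosen]

    return mean_error_fixed, error_at_chosen


if __name__ == "__main__":
    mean_error_fixed, error_at_chosen = simulate()
    print("largest |mean error| at a fixed option:", np.abs(mean_error_fixed).max())
    print("mean error at the chosen option:", error_at_chosen.mean())
```

For a single row, `u[i] - v` is the fixed error function I described. The averages only make sense once $U$ is treated as the random process, and under these assumptions the error at the chosen option should come out positive on average even though the error at any fixed option averages to roughly zero.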
- Roman Malov 13 Dec 2024 22:03 UTC
  1 point
  0
  Parent
  So, $U(x)$ is a random variable in the sense that it is drawn from a distribution of functions, and the expected value of those functions at each point $x$ is equal to $V(x)$. Am I understanding you correctly?
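  Written out, and assuming the expectation is taken over runs of the learning process and that the relevant maxima exist, the condition would be

  $$\mathbb{E}[U(x)] = V(x) \quad \text{for every } x.$$

  If that is the right reading, then with $x^* = \arg\max_x U(x)$ chosen after $U$ is drawn, we would get

  $$\mathbb{E}[U(x^*)] = \mathbb{E}\big[\max_x U(x)\big] \ge \max_x \mathbb{E}[U(x)] = \max_x V(x) \ge \mathbb{E}[V(x^*)],$$

  so $\mathbb{E}[U(x^*) - V(x^*)] \ge 0$ even though the error at each fixed $x$ has mean zero.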
  - TheManxLoiner 14 Dec 2024 0:24 UTC
    3 points
    1
    Parent
    Sounds sensible to me!