Maybe I should have expanded on what I mean by a hint; I think I wasn’t clear. It is not the raw context, it is the ability to be spoilered by words. So a hint is non-visual information about the picture which allows you to locate the object in question.
Does that make things clearer?
In general, machine learning is entirely focused on measures.
But what stops them becoming targets, per Goodhart’s law?
This is unavoidable. If you optimize something, you’re optimizing that thing. Your post suggesting a separation is an attempt at describing something real, but you misunderstood why there’s a separation. You only have one true value function, in the low-level circuitry in your head, and everything else is a proxy, so you have to check things against your value function. By saying “give your subordinates goals”, you are giving them proxies to optimize, which will be goodharted if your subordinates are competent enough. You need a white-box view of what’s happening to explore and improve the goals you’re giving, until you find ones that accurately represent your internal values.
In machine learning, this just happens by running a model, trying it on a test set, and giving it a new training objective if it does badly on the thing you actually care about. The training objective is just a proxy, and you check the model against a test set so that if it is over-optimizing your proxy (also known as overfitting), you’ll notice the discrepancy. In value function learning, you’d still do this; you’d have some out-of-sample ethical problems in your test set, and you’d see how your AI does on them. This would be one of the ingredients in making a safe AI based on ML. See “Concrete Problems in AI Safety” (Amodei et al., 2016).
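A toy sketch of that check, in Python. Everything here is made up for illustration: the parity task, the two hypothetical predictors, and the tiny data sets. The only point is that a gap between the training score (the proxy) and the held-out score is what exposes over-optimization.

```python
def accuracy(predict, examples):
    """Fraction of (input, label) pairs the predictor gets right."""
    return sum(predict(x) == y for x, y in examples) / len(examples)

# Made-up task: the label says whether the number is even.
train = [(2, True), (3, False), (8, True), (5, False)]
test = [(4, True), (7, False), (10, True), (1, False)]

memorised = dict(train)

def memoriser(x):
    """Over-optimizes the proxy: looks up training answers, says False otherwise."""
    return memorised.get(x, False)

def parity_rule(x):
    """Captures the thing we actually care about."""
    return x % 2 == 0

# Train-minus-test accuracy: a large gap flags over-optimization of the proxy.
gap_memoriser = accuracy(memoriser, train) - accuracy(memoriser, test)
gap_rule = accuracy(parity_rule, train) - accuracy(parity_rule, test)
```

Both predictors are perfect on the training examples; only the held-out examples tell them apart.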
It seems to me that your post can be summarized as “one needs continuous metrics to optimize for, not boolean tests”. Does that seem wrong in any way?
That is not what I was trying to say at all. Let’s try maths notation.
You have a model M, which is a function of the state of the world at time 1, s1, and the action you take at time 1, a1. The model gives a predicted state of the world at time 2, ps2.
So ps2 = M(s1, a1)
Let’s say you have a measure U() over states of the world, which gives a utility for each state.
U(s1) = u1
You build up a model of U, call it Mu, so that you don’t have to try every state to find the U of that state. So you have
Mu(s1) = pu1
A normal optimisation process just adjusts a1 until pu2 = Mu(ps2) is maximised. So
f(s1) = argmax over a1 of Mu(M(s1, a1))
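As a rough illustration of that argmax, here is a minimal Python sketch. The toy dynamics in `M`, the toy utility model `Mu`, and the grid of candidate actions are all invented for the example; only the shape "pick the a1 that maximises Mu(M(s1, a1))" comes from the notation above.

```python
def M(s, a):
    """Hypothetical world model: predicts the next state from state and action."""
    return s + a

def Mu(s):
    """Hypothetical utility model: predicted utility peaks at state 3."""
    return -(s - 3.0) ** 2

def best_action(s1, candidate_actions):
    """Return the action whose predicted next state has the highest predicted utility."""
    return max(candidate_actions, key=lambda a: Mu(M(s1, a)))

candidate_actions = [-2.0, -1.0, 0.0, 1.0, 2.0]
a1 = best_action(1.0, candidate_actions)  # picks the action that moves state 1 to state 3
```

Note that this loop only ever consults M and Mu, never the real world, so it is only as good as those two models.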
What I’m arguing is that if you are not getting high U on a poorly defined problem, you want to spend a lot of time adjusting M, not adjusting a, when you get a poor u.
If your model M is inaccurate, that is ps2 != s2, you can improve your pu2 by updating your model to minimise the squared error between the prediction made at time t, given the action that happened at t, and the actual state at t+1, s_{t+1}.
g(s_t) = argmin over M of (M(s_t, a_t) - s_{t+1})^2
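A minimal Python sketch of that model-fitting step, under some loud assumptions: squared error is used as the error measure, the candidate models come from a made-up one-parameter family, and the observed transitions are invented. None of these specifics are from the argument itself.

```python
def make_model(k):
    """Hypothetical one-parameter family of world models: s_next = s + k * a."""
    return lambda s, a: s + k * a

def prediction_error(model, transitions):
    """Sum of squared errors between predicted and observed next states."""
    return sum((model(s, a) - s_next) ** 2 for s, a, s_next in transitions)

def fit_model(candidate_ks, transitions):
    """Pick the k whose model best predicts the observed transitions."""
    return min(candidate_ks, key=lambda k: prediction_error(make_model(k), transitions))

# Observed (s_t, a_t, s_{t+1}) triples from a made-up world where s_next = s + 0.5 * a.
transitions = [(0.0, 1.0, 0.5), (0.5, 2.0, 1.5), (1.5, -1.0, 1.0)]
candidate_ks = [0.0, 0.25, 0.5, 0.75, 1.0]
best_k = fit_model(candidate_ks, transitions)
```

Here the search is over models rather than over actions; the actions are just part of the recorded data.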
So in machine learning’s case, your actions a are the algorithms or systems you are testing, and M is your model of what an intelligence is. If the Turing test is giving your system a bad u, and you can’t easily predict real-world intelligences (e.g. humans) with your model of intelligence, that is M(s_t, a_t) - s_{t+1} is large, update your model of what an intelligence should be doing so that you can better predict what an intelligence should be. This will enable you to pick a better a.
To give an example less close to home: if you are trying to teach kids, don’t just try lots of different teaching styles and see which ones produce the best results on your tests. Instead, have principled reasons for trying a particular teaching style. Try to explain why a teaching style was bad. If you can, create an explicitly bad teaching style and see whether it was as bad as you thought it would be. Once you have a good model of how teaching styles map to test results, then pick the optimal teaching style for principled reasons. Otherwise you could just give the kids the answers beforehand; that would optimise the test results.
Does that clarify the difference?
I’ll try and figure out formatting this properly in a bit.
http://www.aclweb.org/anthology/W16-3204
Yeah! I think we’re actually saying the same things back at each other.
I was objecting to the continuous vs boolean distinction :) I’d boil the article down to:
It is more important to optimise your model of the world than to optimise your actions in the world, if your model of the world is bad.
It is lucky that this is a continuous function, though.