I’m looking for some concept which I am sure has been talked about before in stats but I’m not sure of the technical term for it.
Let's say you have a function you are trying to guess, with a certain range and domain. How would you talk about the amount of data you would need to likely recover the actual function from noisy data? My current thoughts are that the larger the cardinality of the domain, the more data you would need (in a simple relationship), and that the type of noise would determine how much the size of the range affects the amount of data you would need.
How would you talk about the amount of data you would need to likely recover the actual function from noisy data?
First, I would specify what set my ‘function’ is in. Are there 2 possibilities? 10? A million? If there are x possibilities, log2(x) tells me how many bits of information I need. Then I would treat the data as coming to me through a noisy channel. How noisy? I assume you already know how noisy. Now I can plug the noise level into Shannon’s noisy-channel coding theorem, and that tells me how many noisy bits I need to get my log2(x) bits.
(This all seems like very basic information theory, which makes me wonder if something makes your problem harder than this.)
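To make that concrete, here's a minimal sketch of the calculation, assuming the noise behaves like a binary symmetric channel with a known flip probability (the channel model and the helper names are just illustrative choices):

```python
import math

def binary_entropy(p: float) -> float:
    """Entropy (in bits) of a coin that comes up heads with probability p."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def noisy_bits_needed(num_hypotheses: int, flip_prob: float) -> float:
    """Rough lower bound on how many noisy binary observations you need to
    single out one hypothesis, treating each observation as one use of a
    binary symmetric channel that flips bits with probability flip_prob."""
    bits_needed = math.log2(num_hypotheses)     # the log2(x) bits to identify the function
    capacity = 1.0 - binary_entropy(flip_prob)  # information per noisy observation (channel capacity)
    return bits_needed / capacity

# e.g. 4 candidate functions, observations flipped 10% of the time:
print(noisy_bits_needed(4, 0.1))  # about 3.8 noisy bits to get the 2 clean bits
```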
By data I meant training data (in a machine learning context), not information.
And it wasn’t really the math I was after (it’s quite simple), just whether it had been discussed before.
My thoughts on the math: if the cardinality of the input is X and the output is Y, then the size of the space of functions you are exploring is bounded by Y^X. E.g. there are 4 possible functions from 1 binary bit to another (set to 0, set to 1, invert, keep the same). I’ve come across this in simple category theory.
However, in order to fully specify which function it is (assuming no noise), you need a minimum of 2 pieces of training data (where training data means input-output pairs). If you have the training pair (0,0), you don’t know whether that means ‘keep the same’ or ‘set to 0’. Fairly obviously, you need as many unique samples of training data as the cardinality of the domain, and more when you have noise.
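Here's a small sketch of that elimination argument for the one-bit case (the helper names are just illustrative):

```python
from itertools import product

def all_functions(domain, codomain):
    """Enumerate every function from domain to codomain as a dict.
    There are len(codomain) ** len(domain) of them."""
    for outputs in product(codomain, repeat=len(domain)):
        yield dict(zip(domain, outputs))

def consistent(functions, training_pairs):
    """Keep only the functions that agree with every (input, output) pair."""
    return [f for f in functions if all(f[x] == y for x, y in training_pairs)]

funcs = list(all_functions([0, 1], [0, 1]))
print(len(funcs))                           # 4 = 2 ** 2 possible functions
print(consistent(funcs, [(0, 0)]))          # 2 remain: 'set to 0' and 'keep the same'
print(consistent(funcs, [(0, 0), (1, 1)]))  # 1 remains: 'keep the same'
```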
This is a less efficient way of getting information about a function than just getting its Turing number or similar directly.
So I’m really wondering if there are guidelines for people designing machine learning systems that I am missing. For example, if you know you can only get 5000 training examples, you know that a system which tries to learn from the entire space of functions over a domain much larger than 5000 is not going to be very accurate, unless you have put a lot of information into the prior/hypothesis space.
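As a rough back-of-the-envelope version of that point, assuming noiseless, distinct examples and a uniform prior over all functions (the numbers and the helper name are just illustrative):

```python
import math

def leftover_uncertainty_bits(domain_size: int, codomain_size: int,
                              num_examples: int) -> float:
    """Bits of uncertainty left about a function after seeing num_examples
    distinct, noiseless input-output pairs, under a uniform prior over all
    codomain_size ** domain_size functions."""
    unseen_inputs = max(domain_size - num_examples, 0)
    return unseen_inputs * math.log2(codomain_size)

# 5000 examples against a binary-valued function on a domain of a million points:
print(leftover_uncertainty_bits(10**6, 2, 5000))  # 995000.0 bits still unspecified
```

All of that leftover uncertainty sits on the inputs you never saw, which is why the prior/hypothesis space has to carry the load.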
The closest thing I can think of is identifiability, although that’s more about whether it’s possible to identify a function given an arbitrarily large amount of data.
Hmm, not quite what I was looking for, but interesting nonetheless.
Thanks.