What you’re referring to are called ‘features’ in the machine intelligence community. A ‘feature’ is typically an easily computable property of an input data point that carries some information about which class the data point belongs to. Models are then built by specifying a probability distribution over features. An optimal feature is one that provides the maximum amount of information once all previously selected features have been taken into account. For instance, a feature that takes one bit to describe is optimal if, after all other features have been accounted for, it provides a full bit of information (equivalently, it partitions the remaining probability mass exactly in half). If a feature falls short of this, there are various measures for quantifying how informative it is and deciding whether it is worth using.
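To make "amount of information" concrete, here is a minimal sketch of scoring a feature by its information gain (mutual information with the class label), ignoring for brevity the conditioning on previously selected features. The names (`information_gain`, `feature_values`, `labels`) and the toy data are illustrative assumptions, not anything from your post:

```python
import numpy as np

def entropy(labels: np.ndarray) -> float:
    """Shannon entropy (in bits) of a vector of discrete labels."""
    _, counts = np.unique(labels, return_counts=True)
    probs = counts / counts.sum()
    return float(-(probs * np.log2(probs)).sum())

def information_gain(feature_values: np.ndarray, labels: np.ndarray) -> float:
    """Mutual information (in bits) between a feature and the class labels:
    how much observing the feature reduces uncertainty about the class."""
    base = entropy(labels)
    conditional = 0.0
    for v in np.unique(feature_values):
        mask = feature_values == v
        conditional += mask.mean() * entropy(labels[mask])
    return base - conditional

# A one-bit feature that splits a balanced binary class distribution
# exactly in half carries its full one bit of information:
labels = np.array([0, 0, 1, 1])
feature = np.array([0, 0, 1, 1])   # perfectly aligned with the class
print(information_gain(feature, labels))  # -> 1.0
```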
Ordinary reasoning in mathematics is just a special case of Bayesian reasoning, as has been pointed out numerous times in the Sequences. There has been a great deal of work on optimal feature selection and on how to derive good features, for example using the Bayesian Information Criterion (BIC). It might be useful to extend your idea to incorporate those developments.
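For reference, the BIC mentioned above is usually written as BIC = k·ln(n) − 2·ln(L̂), where k is the number of fitted parameters, n the number of data points, and L̂ the maximized likelihood; lower is better. A small sketch, under the assumption of polynomial models with Gaussian noise (my choice of example, not something from the comment):

```python
import numpy as np

def gaussian_log_likelihood(residuals: np.ndarray) -> float:
    """Maximized log-likelihood of residuals under a Gaussian noise model."""
    n = len(residuals)
    sigma2 = residuals.var()  # MLE of the noise variance
    return -0.5 * n * (np.log(2 * np.pi * sigma2) + 1)

def bic(residuals: np.ndarray, num_params: int) -> float:
    n = len(residuals)
    return num_params * np.log(n) - 2 * gaussian_log_likelihood(residuals)

# Compare polynomial fits of increasing degree on noisy linear data;
# BIC penalizes the extra parameters of the higher-degree fits.
rng = np.random.default_rng(0)
x = np.linspace(0, 1, 100)
y = 2 * x + rng.normal(scale=0.1, size=x.size)
for degree in (1, 2, 5):
    coeffs = np.polyfit(x, y, degree)
    residuals = y - np.polyval(coeffs, x)
    print(degree, round(bic(residuals, num_params=degree + 1), 1))
```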