Eliezer, compare giving a probability distribution over the feature vector of a particular apple (giving a value for each feature), versus a probability distribution over the vector of vectors that describes the features of each apple in the set “all the apples in the world.” Surely the second vector has more info, in an entropy or any other sense.
It certainly has more predictive power for future apples!