I didn’t understand everything completely; however, when you mentioned “relevance realization” it reminded me of a recent post that labeled data according to its level of usefulness. The exact labels he used aren’t particularly important, but the categories they represent are extremely useful. He outlines, more or less, three types of information:
1. Unsorted, unfiltered data. Without a way to discriminate the signal from the noise, this data is basically useless. One example he gives is the unsorted error logs for a given computer.
2. Highly relevant, well-processed data. When the raw data is filtered, modified, and processed into a hyper-usable form, that’s when it is at its most useful. The ways to slice and dice the data are subjective in the sense that they depend on the goals of the user.
3. Misleading or incorrect data. Some data may be correct, but misleading. The example given is a ticket created because a “website loaded slowly”. Because this ticket was submitted, a lot of time may be taken to determine what may be wrong with a server. However, it turns out that the page loaded slowly because it was accessed via an old machine!
I wonder if the two of you might be interested in exploring these concepts together. Or apologies if I misunderstood!
Hm, I’ll give some thought to how to integrate different types of data with this picture, but I think that the “useful” classification of data ultimately depends on whether the agent possesses the right “key” to interpret it, and by extension, how difficult that “key” is to produce from concepts that the agent is already proficient with.
At the end of the day, the agent can only “understand” any data in terms of internalized concepts, so there will often be some uncertainty whether the difficulty is in translating sensible data into that internal representation or the difficulty is in the data being about phenomena somewhat (or far) outside of conceptual bounds. Having a “key” with respect to some data means that this data can be reliably translated.
Referential containment is about how to structure internal representations of concepts so as to make them most useful (which could mean flexible, or it could mean efficient, etc., depending on the problem domain and constraints).
If humanity forgot all of its medical knowledge tomorrow, would we discover the same categories and sub-categories of medicine, structuring our knowledge and drawing the distinctions into the different areas of expertise similarly?
We could gather a bunch of observational data about “the art of preventing people from dying” and find clusters in interventions or treatment strategies that have an appropriate conceptual size to teach to different groups of people. Note that this changes depending on how many people we can allocate in total, how much we believe reliably fits into a single human mind, and how many common features there are in curative or preventative measures (commonalities here roughly referring to useful information that is referentially contained with respect to medicine, but cannot be further contained in a single sub-field or small cluster thereof).
Let me know if that makes sense.
I think I don’t have the correct background to understand fully. However, I think it makes a little more sense than when I originally read it.
An analogue to what you’re talking about (referential containment) with the medical knowledge would be something like PCA (principal component analysis) in genomics, right? Just at a much higher, autonomous level.
Yeah, I am not super familiar with PCA, but my understanding is that while both PCA and referential containment can be used to extract lower-dimensional or more compact representations, they operate on different types of data structures (feature vectors vs. graphs/hypergraphs) and have different objectives (capturing maximum variance vs. identifying self-contained conceptual chunks). Referential containment is more focused on finding semantically meaningful and contextually relevant substructures within a causal or relational knowledge representation. It also tries to address the opposite direction, basically how to break existing concepts apart when zooming into the representations, and I am not sure if something like that is done with PCA.
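To make the “self-contained conceptual chunk” idea a bit more concrete, here is a rough sketch on a toy concept graph. Everything in it is an assumption for illustration: the medical node names, the edge list, and the `containment_score` function are hypothetical, and the score (share of edges touching a chunk that stay inside it) is just one simple stand-in, not the definition used in the post.

```python
# Toy undirected concept graph (hypothetical example data, not from the post).
EDGES = [
    ("fracture", "cast"), ("fracture", "x-ray"), ("cast", "x-ray"),
    ("infection", "antibiotic"), ("infection", "fever"), ("antibiotic", "fever"),
    ("x-ray", "fever"),  # a single cross-link between the two clusters
]


def containment_score(chunk, edges):
    """Fraction of the edges touching `chunk` that stay entirely inside it.

    1.0 means the chunk refers only to itself; values near 0 mean most of
    its connections point outside. This is an assumed, simplified measure.
    """
    chunk = set(chunk)
    internal = sum(1 for a, b in edges if a in chunk and b in chunk)
    touching = sum(1 for a, b in edges if a in chunk or b in chunk)
    return internal / touching if touching else 0.0


# A candidate chunk that follows the cluster structure...
ortho = {"fracture", "cast", "x-ray"}
# ...and one that cuts across it.
mixed = {"fracture", "cast", "fever"}

ortho_score = containment_score(ortho, EDGES)
mixed_score = containment_score(mixed, EDGES)
print(ortho_score, mixed_score)
```

With this toy data the first chunk keeps three of its four edges internal, while the second keeps only one of six, so a grouping that respects the cluster would be preferred under this (assumed) measure.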
I had Claude 3 read this post and compare the two. Here it is, if you are interested (keep in mind that Claude tends to be very friendly and encouraging, so it might be valuing referential containment too highly):
Similarities:
Both aim to identify patterns and structure in high-dimensional data.
Both can be used to reduce the dimensionality of data by finding lower-dimensional representations.
Both rely on analyzing statistical dependencies or correlations between variables/features.
Differences:
PCA is a linear dimensionality reduction technique that projects data onto a lower-dimensional subspace spanned by the principal components (eigenvectors of the covariance matrix). Referential containment is not inherently a dimensionality reduction technique but rather a way to identify self-contained or tightly coupled substructures within a causal graph or hypergraph representation.
PCA operates on numerical feature vectors, while referential containment deals with graph/hypergraph structures representing conceptual relationships and causal dependencies.
PCA finds orthogonal linear combinations of features that capture maximum variance. Referential containment aims to identify subgraphs with strong internal connections and minimal external connections, which may not necessarily correspond to maximum variance directions.
PCA is an unsupervised technique that does not consider semantics or labeled data. Referential containment, as proposed in the context of the PSCA architecture, can potentially leverage semantic information and learned representations to guide the identification of meaningful conceptual chunks.
PCA produces a fixed set of principal components for a given dataset. Referential containment, as described, is more flexible and can adapt the level of granularity or resolution based on context, objectives, and resource constraints.
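For comparison, here is a minimal sketch of the PCA side as described above: center the feature vectors, build the covariance matrix, and take its leading eigenvector as the first principal component. The data and the function name `first_principal_component` are made up for illustration, and power iteration is used only to keep the example dependency-free; a real analysis would more likely call a linear algebra library.

```python
import math


def first_principal_component(rows, iterations=200):
    """Return (unit direction, variance along it) for the leading component."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix (d x d).
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    # Power iteration: repeatedly applying cov pulls v toward the eigenvector
    # with the largest eigenvalue (assumes the start vector is not orthogonal
    # to that eigenvector and that the data is not constant).
    v = [1.0] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Variance captured along v is the Rayleigh quotient v^T cov v.
    variance = sum(v[i] * cov[i][j] * v[j] for i in range(d) for j in range(d))
    return v, variance, means


# Hypothetical 2-D feature vectors that lie close to a diagonal line.
DATA = [[1.0, 1.1], [2.0, 1.9], [3.0, 3.2], [4.0, 3.8], [5.0, 5.0]]

direction, variance, means = first_principal_component(DATA)
# Project each centered point onto the component: 2-D -> 1-D.
projected = [sum((row[j] - means[j]) * direction[j] for j in range(2))
             for row in DATA]
print(direction, variance, projected)
```

Note that the input here is a table of numbers and the output is a direction of maximum variance, whereas the referential containment description works on a graph and looks for weakly connected boundaries.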