My idea:
Every concept (or even random mishmash of ideas) has multiple versions. Those versions have internal relationships, positions in some space relative to each other. Those relationships are “infinitely complex”. But there’s a way to make drastic simplifications of those relationships. We can study the overall (“infinitely complex”) structure of the relationships by studying those simplifications. What do those simplifications do, in general? They put “costs” on versions of a concept.
We can understand how we think if we study our concepts (including values) through such simplifications. It doesn’t matter what concepts we study at all. Anything goes, we just need to choose something convenient. Something objective enough to put numbers on it and come up with models.
Once we’re able to model human concepts this way, we’re able to model human thinking (AGI) and human values (AI Alignment) and improve human thinking.
Context
1.1 Properties of Qualia
There’s the hard problem of consciousness: how is subjective experience created from physical stuff? (Or where does it come from?)
But I’m interested in a more specific question:
Do qualia have properties? If so, what are they?
For example, “How do qualia change? How many different qualia can be created?” or “Do qualia form something akin to a mathematical space, e.g. a vector space? What is this space exactly?”
Is there any knowledge contained in the experience itself, not merely associated with it?[1] For example, “cold weather can cause a cold (the disease)” is a fact associated with an experience, but it isn’t very fundamental to the experience itself. And this “fact” isn’t even true; it’s a misconception/coincidence.
When you get to know the personality of your friend, do you learn anything “fundamental” or really interesting by itself? Is “loving someone” a fundamentally different experience compared to “eating pizza” or “watching a complicated movie”?
Those questions feel pretty damn important to me! They’re about limitations of your meaningful experience and meaningful knowledge. They’re about personalities of people you know or could know. How many personalities can you differentiate? How “important/fundamental” are those differences? And finally… those questions are about your values.
Those questions are important for Fun Theory. But they’re way more important/fundamental than Fun Theory.
[1] Philosophical context for this question: look up Immanuel Kant’s idea of “synthetic a priori” propositions.
1.2 Qualia and morality
And those questions are important for AI Alignment. If an AI can “feel” that loving a sentient being and making a useless paperclip are two fundamentally different things, then it might be way easier to explain our values to that AI. By the way, I’m not implying that an AI has to have qualia; I’m saying that our qualia can point us toward the right model.
I think this observation gets a little bit glossed over: if you have a human brain and only care about paperclips… it’s (kind of) still objectively true for you that caring about other people would feel way different, way “bigger”, and so on. You can pretend to escape morality, but you can’t escape your brain.
It’s extremely banal out of context, but the landscape of our experiences and concepts may shape the landscape of our values. Modeling our values as arbitrary utility functions (or artifacts of evolution) misses that completely.
2.1 Mystery Boxes
Box A
There’s a mystery Box A. Each day you find a random object inside of it. For example: a ball, a flower, a coin, a wheel, a stick, a tissue...
Box B
There’s also another box, the mystery Box B. One day you find a flower there. Another day you find a knife. The next day you find a toy. Next—a gun. Next—a hat. Next—shark’s jaws...
...
How can you understand the boxes? If you could obtain all the items from both boxes, you would find… that the items are exactly the same. They just appear in a different order, that’s all.
I think the simplest way to understand Box B is this: you need to approach it with a bias, with a “goal”. For example “things may be dangerous, things may cause negative emotions”. In its most general form, this idea is unfalsifiable and may work as a self-fulfilling prophecy. But this general idea may lead to specific hypotheses, to estimating specific probabilities. This idea may just save your life if someone is coming after you and you need to defend yourself.
The contents of both boxes change in arbitrary ways. But the changes in the second box come with an emotional cost.
There are many, many other boxes; understanding them requires more nuanced biases and goals.
I think those boxes symbolize concepts (e.g. words) and the way humans understand them. I think a human understands a concept by assigning “costs” to its changes of meaning. “Costs” come from various emotions and goals.
“Costs” are convenient: if any change of meaning has a cost, then you don’t need to restrict the meaning of a concept. If a change has a cost, then it’s meaningful regardless of its predictability.
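Here’s a minimal sketch of what I mean. Everything in it (the items, the “danger” costs) is an invented placeholder; only the mechanics matter: the same items, a goal that prices them, and two orders that therefore produce different cost profiles.

```python
# Toy sketch: two boxes contain exactly the same items, only in a different order.
# A "goal" (bias) assigns an emotional "cost" to each item; the sequence of costs
# is what makes the two boxes feel different. All items and numbers are invented.

box_a = ["ball", "flower", "coin", "toy", "hat", "knife", "gun", "jaws"]
box_b = ["flower", "knife", "toy", "gun", "hat", "jaws", "ball", "coin"]

# Hypothetical goal: "things may be dangerous". Unlisted items cost 0.
danger_cost = {"knife": 0.8, "gun": 1.0, "jaws": 0.9}

def cost_profile(box, costs):
    """Map a sequence of items to a sequence of emotional costs under a goal."""
    return [costs.get(item, 0.0) for item in box]

print(cost_profile(box_a, danger_cost))  # [0.0, 0.0, 0.0, 0.0, 0.0, 0.8, 1.0, 0.9]
print(cost_profile(box_b, danger_cost))  # [0.0, 0.8, 0.0, 1.0, 0.0, 0.9, 0.0, 0.0] -- danger alternating with harmless items
```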
2.2 More Boxes
More examples of mystery boxes:
The first box may alternate positive and negative items.
The second box may alternate positive, directly negative, and indirectly negative items. For example, it may show you a knife (directly negative) and then a bone (indirectly negative: a “bone” may be a consequence of the “knife”).
The third box may alternate positive, negative, and “subverted” items. For example, it may show you a seashell (positive) and then show you a shark’s jaws (negative). Both sharks and seashells share a common theme, so the “seashell (positive)” got subverted.
The fourth box may alternate negative items and items that “neutralize” negative things. For example, it may show you a sword, but then show you a shield.
The fifth box may show you that every negative thing has many related positive things.
You can imagine a “meta box”, for example a box that alternates between being the 1st box and the 2nd box. Meta boxes can “change their mood”.
I think, in a weird way, all those boxes are very similar to human concepts and words.
The more emotions, goals and biases you learn, the easier it gets for you to understand new boxes. But those “emotions, goals, biases” are themselves like boxes.
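To show how boxes compose, here’s a toy continuation of the sketch above (the behaviours are, again, invented placeholders): each box is a small policy over items, and a meta box is a box that switches between other boxes’ “moods”.

```python
import random

# Toy continuation: a box is a generator that yields the next item.
# A meta box alternates between other boxes' behaviours ("changes its mood").

POSITIVE = ["flower", "seashell", "toy", "hat"]
NEGATIVE = ["knife", "gun", "jaws"]

def box_1():
    """Alternates positive and negative items (the 1st box above)."""
    while True:
        yield random.choice(POSITIVE)
        yield random.choice(NEGATIVE)

def box_4():
    """Alternates negative items and items that 'neutralize' them (the 4th box above)."""
    while True:
        yield "sword"
        yield "shield"

def meta_box(boxes, switch_every=3):
    """A box whose behaviour is drawn from other boxes, switching every few items."""
    generators = [b() for b in boxes]
    i = 0
    while True:
        for _ in range(switch_every):
            yield next(generators[i])
        i = (i + 1) % len(generators)

mb = meta_box([box_1, box_4])
print([next(mb) for _ in range(12)])
```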
I think I have an idea of how we could solve AI Alignment: create an AGI with safe and interpretable thinking. I mean a “fundamentally” safe AGI, not a wildcard that requires extremely specific learning to not kill you.
Sorry for the grandiose claim. I’m going to state my idea right away. Then I’m going to explain its context, general examples of it, and the implications of it being true. Then I’m going to suggest a specific thing we can do. Then I’m going to explain why I believe my idea is true.
My idea will sound too vague and unclear at first, but I think the context will make clear what I mean. (Clear like the mathematical concept of a graph: a graph is a very abstract idea, but it makes sense and is easy to use.)
Please evaluate my post at least as science fiction, and then ask: what if it’s not fiction, but reality?
Key points of this post:
You can “solve” human concepts (including values) by solving semantics. By semantics I mean “meaning construction”, something more abstract than language.
Semantics is easier to solve than you think. And we’re closer to solving it than you think.
Semantics is easier to model than you think. You don’t even need an AI to start doing it. Just a special type of statistics. You don’t even have to start with analyzing language.
I believe ideas from this post can be applied outside of the AI field.
Why do I believe this? Because of this idea:
Every concept (or even random mishmash of ideas) has multiple versions. Those versions have internal relationships, positions in some space relative to each other. You can understand a concept by understanding those internal relationships.
One problem though, those relationships are “infinitely complex”. However, there’s a special way to make drastic simplifications. We can study the real relationships through those special simplifications.
What do those “special simplifications” do? They order versions of a concept (e.g. “version 1, version 2, version 3”). They can do this in extremely arbitrary ways. The important thing is that you can merge arbitrary orders into less arbitrary structures. There’s some rule for it, akin to Bayes’ rule or Occam’s razor. This is what cognition is, according to my theory.
If this is true, we need to find any domain where concepts and their simplifications are easy enough to formalize. Then we need to figure out a model, figure out the rule of merging simplifications. I’ve got a suggestion and a couple of ideas and many examples.
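I don’t know the exact merging rule yet. As a crude stand-in that only shows the flavour (an assumption of mine, not the rule itself), here’s simple rank aggregation via a Borda count: several arbitrary orders of the same versions merge into one less arbitrary order.

```python
from collections import defaultdict

# Crude stand-in for the "merging rule" (not the real rule, just the flavour):
# several arbitrary orderings of the same versions are merged by Borda count
# into a single, less arbitrary ordering.

def merge_orders(orders):
    """Each order is a list of versions, best first. Returns one merged order."""
    score = defaultdict(float)
    for order in orders:
        n = len(order)
        for rank, version in enumerate(order):
            score[version] += n - rank  # earlier positions earn more points
    return sorted(score, key=score.get, reverse=True)

# Three arbitrary "simplifications" of the same concept's versions:
orders = [
    ["v1", "v3", "v2", "v4"],
    ["v3", "v1", "v4", "v2"],
    ["v1", "v2", "v3", "v4"],
]
print(merge_orders(orders))  # ['v1', 'v3', 'v2', 'v4']
```

A Borda count is almost certainly too crude, but it shows what “merging arbitrary orders” could even mean.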
2.3 Words
This is a silly, wacky subjective example. I just want to explain the concept.
Here are some meanings of the word “beast”:
(archaic/humorous) any animal.
an inhumanly cruel, violent, or depraved person.
a very skilled human. example: “Magnus Carlsen (chess player) is a beast”
something very different and/or hard. example: “Reading modern English is one thing, but understanding Shakespeare is an entirely different beast.”
a person’s brutish or untamed characteristics. example: “The beast in you is rearing its ugly head”
What are the internal relationships between these meanings? If these meanings create a space, where is each of the meanings? I think the full answer is practically unknowable. But we can “probe” the full meaning, we can explore a tiny part of it:
Let’s pick a goal (bias), for example: “describing deep qualities of something/someone”. If you have this goal, the negative meaning of the word (“cruel person”) is the main one for you, because it focuses on the person’s deep qualities the most; it may imply that the person is rotten to the core. The positive meaning focuses mostly on skills, and the archaic meaning is just a joke. The 4th meaning doesn’t focus on specific internal qualities. The 5th meaning may separate the person from their qualities.
When we added a goal, each meaning started to have a “cost”. This cost illuminates some part of the relationships between the meanings. If we could evaluate an “infinity” of goals, we could know those relationships perfectly. But I believe you can get quite a lot of information by evaluating just a single goal. Because a “goal” is a concept too, so you’re bootstrapping your learning. And I think this matches closely with the example about mystery boxes.
...
By combining a couple of goals we can make an order of the meanings, for example: beast 1 (rotten to the core), beast 2 (skilled and talented person), beast 3 (bad character traits), beast 4 (complicated thing), beast 5 (any animal). This order is based on “specificity” (mostly) and “depth” of a quality: how specific/deep is the characterization?
Another order: beast 1 (not a human), beast 2 (worse than most humans), beast 3 (best among professionals), beast 4 (not some other things), beast 5 (worse than yourself). This order is based on “scope” and “contrast”: how many things contrast with the object? Notice how each order simplifies and redefines the meanings. But I want to illustrate the process of combining goals/biases with a real order:
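Here is the first of those toy orders in code. The numbers are completely subjective guesses; the point is only the mechanics: each goal scores the meanings, and a weighted combination of goals produces an order.

```python
# Toy scoring of the meanings of "beast". All numbers are subjective guesses;
# a goal (bias) turns meanings into scores, and combining goals produces an order.

meanings = {
    "any animal (archaic/humorous)":      {"specificity": 0.1, "depth": 0.1},
    "inhumanly cruel person":             {"specificity": 0.9, "depth": 0.9},
    "very skilled human":                 {"specificity": 0.8, "depth": 0.5},
    "very different/hard thing":          {"specificity": 0.4, "depth": 0.3},
    "person's brutish characteristics":   {"specificity": 0.6, "depth": 0.4},
}

def order_by_goals(meanings, weights):
    """Combine goal scores with weights and sort meanings by the result."""
    def score(m):
        return sum(weights[g] * meanings[m][g] for g in weights)
    return sorted(meanings, key=score, reverse=True)

# "Specificity" matters most, "depth" a bit less, as in the order above.
print(order_by_goals(meanings, {"specificity": 1.0, "depth": 0.5}))
# ['inhumanly cruel person', 'very skilled human', "person's brutish characteristics",
#  'very different/hard thing', 'any animal (archaic/humorous)']
```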
2.4 Grammar Rules
You may treat this part of the post as complete fiction. But it illustrates how biases can be combined. And this is the most important thing about biases.
Grammar rules are concepts too. Sometimes people use quite complicated rules without even realizing it, for example:
See: “Adjective order”, or “Adjectives: order”, a video by Tom Scott.
There’s a popular order: opinion, size, physical quality or shape, age, colour, origin, material, purpose. What created this order? I don’t know, but I know that certain biases could make it easier to understand.
Take a look at this part of the order: opinion, age, origin, purpose. You could say all of those are not “real” properties. They seem to progress from less related/specific to the object to more related/specific. If you operate under this bias (relatedness/specificity), swapping the adjectives may lead to funny changes of meaning. For example: “bad old wolf” (objective opinion), “old bad wolf” (intrinsic property, or a cheesy, overblown opinion), “old French bad wolf” (a subspecies of the “French wolf”). Recall how the mystery boxes created meaning through the order of their items.
Another part of the order: size, physical quality or shape, color, material. You can say all those are “real” physical properties. “Size” could be possessed by a box around the object. “Physical quality” and “shape” could be possessed by something wrapped around the object. “Color” could be possessed by the surface of the object. “Material” can be possessed only by the object itself. So physical qualities progress like layers of an onion.
You can combine those two biases (“relatedness/specificity” + “onion layers”) using a third bias and some minor rules. The third bias may be “attachment”. Some of the rules: (1) an adjective is attached either to some box around the object or to some layer of the object; (2) you shouldn’t postulate boxes that are too big. It doesn’t make sense for an opinion to be attached to the object more strongly than its size box. It doesn’t make sense for age to be attached to the object more strongly than its color (does time pass under the surface layer of an object?). Origin needs to be attached to some layer of the object (otherwise we would need to postulate a giant box that contains both the object and its place of origin); I guess it can’t be attached more strongly than “material”, because material may expand the information about origin. And purpose is the “soul” of the object. “Attachment” is a reformulation of “relatedness/specificity”, so we only used about 2.5 biases to order 8 things. Unnecessary biases just delete themselves.
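A sketch of that combination with made-up numbers: if each adjective category gets a hypothetical “attachment depth” (small numbers are big boxes around the object, large numbers are deep layers of it), sorting by that single combined bias reproduces the whole order.

```python
# Made-up "attachment depth" for each adjective category: smaller numbers are
# bigger "boxes" around the object, larger numbers are deeper layers of it.
# The depths are invented; the point is that one combined bias can order all eight.

attachment_depth = {
    "opinion": 1,        # a box around everything, barely attached
    "size": 2,           # the bounding box of the object
    "quality/shape": 3,  # something wrapped around the object
    "age": 4,
    "colour": 5,         # the surface layer
    "origin": 6,
    "material": 7,       # the object itself
    "purpose": 8,        # the "soul" of the object
}

adjective_order = sorted(attachment_depth, key=attachment_depth.get)
print(adjective_order)
# ['opinion', 'size', 'quality/shape', 'age', 'colour', 'origin', 'material', 'purpose']
```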
Of course, this is all still based on complicated human intuitions and high-level reasoning. But, I believe, at the heart of it lies a rule as simple as Bayes’ rule or Occam’s razor: a rule about merging arbitrary connections into something less arbitrary.
...
I think stuff like sentence structure/word order (or even morphology) is made of amalgamations of biases too.
Sadly, it’s quite useless to think about it. We don’t have enough orders like this. And we can’t create such orders ourselves (as a game), i.e. we can’t model this, it’s too subjective or too complicated. We have nothing to play with here. But what if we could do all of this for some other topic?
3.1 Argumentation
I believe my idea has some general and specific connections to hypothesis generation and argumentation. The most trivial connection is that hypotheses and arguments use concepts and are themselves concepts.
You don’t need a precisely defined hypothesis if any specification of your hypothesis has a “cost”. You don’t need to prove and disprove specific ideas; you may do something similar to gradient descent: you have a single landscape with all your ideas blended together, and you just slide over this landscape. The same goes for arguments: I think it is often sub-optimal to try to come up with a precise argument, or to waste time atomizing your concepts in order to fix every inconsequential “inconsistency”.
A more controversial idea would be that (1) in some cases you can apply wishful thinking, since “wishful thinking” is able to assign emotional “costs” to theories (2) in some cases motivated reasoning is even necessary for thinking. My theory already proposes that meaning/cognition doesn’t exist without motivated reasoning.
3.2 Working with hypotheses
A quote from Harry Potter and the Methods of Rationality, Chapter 22 (The Scientific Method):
“Wizardry isn’t as powerful now as it was when Hogwarts was founded.”
Hypotheses:
Magic itself is fading.
Wizards are interbreeding with Muggles and Squibs.
Knowledge to cast powerful spells is being lost.
Wizards are eating the wrong foods as children, or something else besides blood is making them grow up weaker.
Muggle technology is interfering with magic. (Since 800 years ago?)
Stronger wizards are having fewer children.
...
You can reformulate the hypotheses in terms of each other, for example:
(1) Magic is fading away. (2) Magic mixes with non-magic. (3) Pieces of magic are lost. (4) Something affects the magic. (5) The same as 2 or 4. (6) Magic creates less magic.
(1) Pieces of magic disappear. (2) ??? (3) Pieces of magic containing spells disappear. (4) Wizards don’t consume/produce enough pieces of magic. (5) Technology destroys pieces of magic. (6) Stronger wizards produce fewer pieces of magic.
Why do this? I think it makes hypotheses less arbitrary and highlights what we really know. And it raises questions that are important across many theories: Can magic be split into discrete pieces? Can magic “mix” with non-magic? Can magic be stronger or weaker? Can magic create itself? By the way, those questions would save us from trying to explain a nonexistent phenomenon: maybe magic isn’t even fading in the first place; do we really know that it is?
3.3 New Occam’s Razor, new probability
Reformulated this way, hypotheses are also easier to order according to our a priori biases. We can order hypotheses exactly the same way we ordered meanings, if we reformulate them to sound equivalent to each other. Here’s an example of how we can re-order some of the hypotheses:
(1) Pieces of magic disappear by themselves. (2) Pieces of magic containing spells disappear. (3) Wizards don’t consume/produce enough pieces of magic. (4) Stronger wizards produce fewer pieces of magic. (5) Technology destroys pieces of magic.
The hypotheses above are sorted by 3 biases: “Does it describe HOW magic disappears? / Does magic disappear by itself?” (strong positive weight), “How general is the reason for the disappearance of magic?” (weaker positive weight), and “novelty compared to other hypotheses” (strong positive weight). “Pieces of magic containing spells disappear” is, in a way, the most specific hypothesis here, but it definitely describes HOW magic disappears (and gives a lot of new information about it), so it’s higher on the list. “Technology destroys pieces of magic” doesn’t give any new information about anything whatsoever, only a specific random possible reason, so it’s the most irrelevant hypothesis here. By the way, those 3 different biases are just different sides of the same coin: “magic described in terms of magic/something else”, “specificity” and “novelty” are all types of “specificity”. Or of novelty. Biases are concepts too; you can reformulate any of them in terms of the others.
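The same sorting written out as a toy calculation. The weights and per-hypothesis scores are my subjective guesses; only the mechanism (several weighted biases merging into one order) is the point.

```python
# Toy weighted-bias scoring of the reformulated hypotheses. All scores and weights
# are subjective guesses; the mechanism is what matters: several biases with
# different weights merge into one order.

hypotheses = {
    "Pieces of magic disappear by themselves":         {"how": 1.0, "generality": 1.0, "novelty": 0.9},
    "Pieces of magic containing spells disappear":     {"how": 0.9, "generality": 0.4, "novelty": 0.8},
    "Wizards don't consume/produce enough pieces of magic": {"how": 0.5, "generality": 0.6, "novelty": 0.5},
    "Stronger wizards produce fewer pieces of magic":  {"how": 0.4, "generality": 0.4, "novelty": 0.4},
    "Technology destroys pieces of magic":             {"how": 0.2, "generality": 0.2, "novelty": 0.1},
}

weights = {"how": 1.0, "generality": 0.5, "novelty": 1.0}  # strong, weak, strong (as in the text)

def rank(hyps, weights):
    score = lambda h: sum(weights[b] * hyps[h][b] for b in weights)
    return sorted(hyps, key=score, reverse=True)

for i, h in enumerate(rank(hypotheses, weights), 1):
    print(i, h)   # reproduces the order (1)-(5) above
```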
When you deal with hypotheses that aren’t “atomized” and specific enough, Occam’s Razor may be impossible to apply, because the complexity of a hypothesis is subjective in such cases. What I described above solves that: complexity is combined with other metrics and evaluated only “locally”. By the way, you can update the concept of probability in a similar fashion. You can split “probability” into multiple connected metrics and use an amalgamation of those metrics in cases where you have absolutely no idea how to calculate the ratio of outcomes.
3.4 “Matrices” of motivation
You can analyze arguments and reasons for actions using the same framework. Imagine this situation:
You are a lonely person on an empty planet. You’re doing physics/math. One day you encounter another person, even though she looks a little bit like a robot. You become friends. One day your friend gets lost in a dangerous forest. Do you risk your life to save her? You come up with some reasons to try to save her:
I care about my friend very much. (A)
If my friend survives, it’s the best outcome for me. (B)
My friend is a real person. (C)
You can explore and evaluate those reasons by formulating them in terms of each other or in other equivalent terms.
“I’m 100% sure I care. (A) Her survival is 90% the best outcome for me in the long run. (B) Probably she’s real (C).” This evaluates the reasons by “power” (basically, probability).
“My feelings are real. (A) The goodness/possibility of the best outcome is real. (B) My friend is probably real. (C)” This evaluates the reasons by “realness”.
“I care 100%. (A) Her survival is 100% the best outcome for me. (B) She’s 100% real. (C).” This evaluates the reasons by “power” strengthened by emotions: what if the power of emotions affects everything else just a tiny bit? By a very small factor.
“Survival of my friend is the best outcome for me. (B) The fact that I ended up caring about my friend is the best thing that happened to me. Physics and math aren’t more interesting than other sentient beings. (A) My friend being real is the best outcome for me. But it isn’t even necessary, she’s already “real” in most of the senses. (C)” This evaluates the reasons by the quality of “being the best outcome”.
Some evaluations may affect others or merge together. I believe the evaluations written above only look like precise considerations; actually they’re more like meanings of words, impossible to pin down. I gave this example because it’s similar to some of my emotions.
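This is why I call it a “matrix” of motivation: reasons along one axis, evaluations along the other. Here’s a toy sketch with invented numbers (in reality each cell is more like the meaning of a word than a precise value):

```python
# Toy "matrix of motivation": rows are reasons, columns are evaluations.
# The numbers are invented placeholders, only the shape of the structure matters.

reasons = ["A: I care about my friend", "B: her survival is the best outcome", "C: she is real"]
evaluations = ["power", "realness", "emotion-boosted power", "best-outcome-ness"]

matrix = [
    # power  realness  emotion   best-outcome
    [1.00,   0.95,     1.00,     0.90],   # A
    [0.90,   0.80,     1.00,     1.00],   # B
    [0.70,   0.75,     1.00,     0.60],   # C
]

for reason, row in zip(reasons, matrix):
    print(reason, dict(zip(evaluations, row)))
```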
I think such thinking is more natural than applying a pre-existing utility function that doesn’t require any cognition. The utility of what, exactly, should you calculate? Of your friend’s life? Of your life? Of your life with your friend? Of your life factored by your friend’s desire “be safe, don’t risk your life for me”? Should you take into account changes in your personality over time? I believe you can’t learn the difference without working with “meaning”.
4.1 Synesthesia
Imagine a face. When you don’t simplify it, you just see a face and emotions expressed by it. When you simplify it too much, you just see meaningless visual information (geometric shapes and color spots).
But I believe there’s something very interesting in-between. When information is complex enough to start making sense, but isn’t complex enough to fully represent a face. You may see unreal shapes (mixes of “face shapes” and “geometric shapes”… or simplifications of specific face shapes) and unreal emotions (simplifications of specific emotions) and unreal face textures (simplifications of specific face textures).
4.2 Unsupervised learning
Action
If my idea is true, what can we do?
We need to figure out the way to combine biases.
We need to find some objects that are easy to model.
We need to find “simplifications” and “biases” for those objects that are easy to model.
We may start with some absolutely useless objects.
What can we do? (in general)
However, even from made-up examples (not connected to a model), we can already get some general ideas:
Different versions of a concept always get described in equivalent terms and simplified. (When a “bias” is applied to the concept.)
Multiple biases may turn the concept into something like a matrix?
Sometimes combined biases are similar to a decision tree.
It’s not fictional evidence because at this point we’re not seeking evidence, we’re seeking a way to combine biases.
What specific thing can we do?
I have a topic in mind (because of my synesthesia-like experiences):
You can analyze the shapes of “places” and videogame levels (3D or even 2D shapes) by making orders of their simplifications. You can simplify a place by splitting it into cubes/squares, creating a simplified texture of the place. A “bias” is a specific method of splitting a place into cubes/squares. You can also have a bias for or against creating certain amounts of cubes/squares. (A small sketch follows the list below.)
3D and 2D shapes are easy to model.
Splitting 3D/2D shapes into cubes or squares is easy to model.
Measuring the amount of squares/cubes in an area of a place is easy to model.
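Here’s a minimal sketch of the 2D case (the shape is an arbitrary placeholder; the grid resolution plays the role of a “bias”):

```python
# Minimal 2D sketch: a "place" is a set of filled points; a "bias" is the grid
# resolution used to split it into squares. Counting filled points per square
# gives a simplified "texture" of the place. The shape is an arbitrary placeholder.

def simplify(points, cell_size):
    """Split the plane into cell_size x cell_size squares; count points per square."""
    squares = {}
    for x, y in points:
        key = (x // cell_size, y // cell_size)
        squares[key] = squares.get(key, 0) + 1
    return squares

# A placeholder "place": an L-shaped corridor of points.
place = [(x, 0) for x in range(10)] + [(9, y) for y in range(1, 6)]

print(simplify(place, 2))   # fine-grained simplification
print(simplify(place, 5))   # coarse simplification: a different "texture" of the same place
```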
Here’s my post about it: “Colors” of places. The post gets specific about the way(s) of evaluating places. I believe it’s specific enough so that we could come up with models. I think this is a real chance.
I probably explained everything badly in that post, but I could explain it better with feedback.
Maybe we could analyze people’s faces the same way; I don’t know if faces are easy enough to model. Maybe faces have shapes that are too complicated.
My evidence
I’ve always had an obsession with other people.
I compared every person I knew to all the other people I knew. I tried to remember faces, voices, ways of speaking, emotions, situations, and the media associated with them (books, movies, anime, songs, games).
If I learned something from someone (be it a song or something else), I associated this information with them and remembered the association “forever”, to the point where any experience was associated with someone. Those associations weren’t something static; they were like a liquid or a gas, trying to occupy all available space.
At some point I knew that they weren’t just “associations” anymore. They turned into synesthesia-like experiences. Like a blind person in a boat, one day I realized that I’m not in a river anymore, I’m in the ocean.
What happened? I think completely arbitrary associations with people were putting emotional “costs” on my experiences. Each arbitrary association was touching on something less arbitrary. When this happened enough times, I believe the associations stopped being arbitrary.
“Other people” is the ultimate reason why I think that my idea is true. Often I doubt myself: maybe my memories don’t mean anything? Other times I feel like I didn’t believe in it enough.
...
When a person dies, it’s already maximally sad. You can’t make it more or less sad.
But all this makes it so, so much worse. Imagine if after the death of an author all their characters died too (in their fictional worlds), and memories of the author and their characters died too. Ripples of death just never end and multiply. As if the same stupid thing repeats for the infinitieth time.