Puzzles
You can use the same thinking to analyze/classify puzzles.
Inspired by Pirates of the Caribbean: Dead Man’s Chest. Jack has a compass that can lead him to the thing he desires. Jack wants to find a key. Jack can have these experiences:
Experience of the real key.
Experience of a drawing of the key.
Pure desire for the key.
In order for the compass to work, Jack may need (almost) any mix of those: maybe pure desire is enough, or maybe you need to mix pure desire with having seen at least a drawing of the key (so you have a clearer picture of what you want).
Gibbs: And whatever this key unlocks, inside there’s something valuable. So, we’re setting out to find whatever this key unlocks!
Jack: No! If we don’t have the key, we can’t open whatever it is we don’t have that it unlocks. So what purpose would be served in finding whatever need be unlocked, which we don’t have, without first having found the key what unlocks it?
Gibbs: So—We’re going after this key!
Jack: You’re not making any sense at all.
Gibbs: ???
Jack has these possibilities:
1. To go after the chest. Foolish: you can’t open the chest.
2. To go after the key. Foolish: you can get caught by Davy Jones.
Gibbs thinks about doing 100% of 1 or 100% of 2 and gets confused when he learns that’s not the plan. Jack thinks about 50% of 1 and 50% of 2: you can go after the chest in order to use it to get the key. Or you can go after the chest and the key “simultaneously” in order to keep Davy Jones distracted and torn between two things.
Braid, Puzzle 1 (“The Ground Beneath Her Feet”). You have two options:
1. Ignore the platform.
2. Move the platform.
You need 50% of 1 and 50% of 2: first you ignore the platform, then you move the platform… and rewind time to mix the options.
Braid, Puzzle 2 (“A Tingling”). You have the same two options:
1. Ignore the platform.
2. Move the platform.
Now you need 50% of 1 and 25% of 2: you need to rewind time while the platform moves. In this time-manipulating world the weights may not add up to 100%, since you can erase or multiply outcomes, or move them from one timeline to another.
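A minimal sketch of this bookkeeping (the option names and numbers are just the ones from the examples above; the code is an illustration, not anything from the film or the game):

```python
# A plan is a mapping from options to weights ("quasiprobabilities").
# Unlike ordinary probabilities, the weights need not sum to 1:
# rewinding time can erase, duplicate, or move outcomes between timelines.

def describe(plan: dict[str, float]) -> str:
    parts = ", ".join(f"{round(w * 100)}% of {option}" for option, w in plan.items())
    total = round(sum(plan.values()) * 100)
    return f"{parts} (total: {total}%)"

gibbs_plan = {"go after the chest": 1.0}                         # 100% of one option
jack_plan = {"go after the chest": 0.5, "go after the key": 0.5}
braid_plan_2 = {"ignore the platform": 0.5, "move the platform": 0.25}  # only 75%

for plan in (gibbs_plan, jack_plan, braid_plan_2):
    print(describe(plan))
```

The point of the last plan is simply that nothing forces the total to be 100%.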
Argumentation
You can use the same thinking to analyze arguments and opinions. Our opinions are built upon thousands and thousands of “false dilemmas” that we haven’t carefully revised.
For example, take a look at these contradictory opinions:
1. Humans are smart. Sometimes in very non-obvious ways.
2. Humans are stupid. They make a lot of mistakes.
Usually people think you have to believe either “100% of 1” or “100% of 2”. But you can believe in all kinds of mixes.
For example, I believe in 90% of 1 and 10% of 2: people may be “stupid” in this particular nonsensical world, but in a better world everyone would be a genius.
Ideas as bits
You can treat an idea as a “(quasi)probability distribution” over some levels of a problem/topic. Each detail of the idea gives you a hint about the shape of the distribution. (Each detail is a bit of information.)
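A minimal sketch of that kind of detail-by-detail updating (the “levels” and the hint numbers below are invented purely for illustration):

```python
# An idea starts as a vague distribution over possible "levels" of a topic;
# each detail is a small hint (a "bit") that reshapes the distribution.

levels = ["level A", "level B", "level C"]
belief = {lvl: 1 / len(levels) for lvl in levels}   # start with no preference

# How compatible each detail is with each level (made-up weights).
details = [
    {"level A": 0.9, "level B": 0.5, "level C": 0.1},
    {"level A": 0.6, "level B": 0.8, "level C": 0.2},
]

for hint in details:                                # absorb one detail at a time
    belief = {lvl: belief[lvl] * hint[lvl] for lvl in levels}
    total = sum(belief.values())
    belief = {lvl: p / total for lvl, p in belief.items()}

print({lvl: round(p, 3) for lvl, p in belief.items()})  # the final shape
```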
We usually don’t analyze information like this. Instead of cautiously updating our understanding with every detail of an idea, we do this:
try to grab all details together
get confused (like Gibbs)
throw most of the details out and end up with an obviously wrong understanding.
Note: maybe you can apply the same idea about “bits” to chess (and other games). Each idea and each small advantage you need in order to come up with the winning plan is a “bit” of information/advantage. Before you collect enough information/advantage bits, the position looks like a cloud where you don’t see what to do.
Richness of ideas
I think you can measure “richness” of theories (and opinions and anything else) using the same quasiprobabilities/bits. But this measure depends on what you want.
Compare these two theories explaining why objects have different properties:
(A) Objects have different properties because they have different combinations of “proto properties”.
(B) Objects have different properties because they have different organization of atoms.
Let’s add a metric to compare the two theories:
1. Does the theory explain why objects exist in the first place?
2. Does the theory explain why objects have certain properties?
Let’s say we’re interested in physical objects. B-theory explains properties through 90% of 1 and 10% of 2: it makes the properties of objects equivalent to the reason for their existence. A-theory explains properties through 100% of 2. B-theory is more fundamental, because it touches on the more fundamental topic (existence).
But if we’re interested in mental objects… B-theory explains only 10% of 2 and 0% of 1, while A-theory may be explaining 99% of 1. With different interests, A-theory turns out to be more fundamental.
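A minimal sketch of this comparison, using the percentages above; the interest weights in it are my own assumption about how much you care about each question, not something the argument fixes:

```python
# Each theory gets explanatory weights over the two metric questions:
# "exists" = question 1 (why objects exist), "properties" = question 2.
theories = {
    "physical objects": {
        "B (atoms)":            {"exists": 0.90, "properties": 0.10},
        "A (proto-properties)": {"exists": 0.00, "properties": 1.00},
    },
    "mental objects": {
        "B (atoms)":            {"exists": 0.00, "properties": 0.10},
        "A (proto-properties)": {"exists": 0.99, "properties": 0.00},
    },
}

# Assumed interests: here we care more about the "existence" question.
interest = {"exists": 0.7, "properties": 0.3}

for domain, candidates in theories.items():
    scores = {name: sum(interest[q] * w for q, w in weights.items())
              for name, weights in candidates.items()}
    winner = max(scores, key=scores.get)
    print(f"{domain}: {scores} -> more fundamental: {winner}")
```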
When you look for a theory (or opinion or anything else), you can treat any desire and argument as a “bit” that updates the quasiprobabilities like the ones above.
Discussion
We could help each other to find gaps in our thinking! We could do this in this thread.
Gaps of Alignment
I want to explain what I perceive as missed ideas in Alignment. And discuss some other ideas.
(1) You can split the possible effects of an AI’s actions into three domains. All of them are different (with different ideas), even though they partially intersect and can be formulated in terms of each other. Traditionally we focus on the first two domains:
1. (Not) accomplishing a goal. “Utility functions” are about this.
2. (Not) violating human values. “Value learning” is about this.
3. (Not) modifying a system without breaking it. (Not) doing a task in an obviously meaningless way. “Impact measures” are about this.
I think the third domain is mostly ignored, and that it’s a big blind spot.
I believe that “human (meta-)ethics” is just a subset of a much broader topic: “properties of (any) systems”. We can translate the method of learning the properties of simple systems into a method of learning human values (a complicated system), and we can translate the results of studying those simple systems into human moral rules. Many important complicated properties (such as “corrigibility”) have analogies in simple systems.
(2) Another “missed idea”:
Some people analyze human values as a random thing (random utility function).
Some people analyze human values as a result of evolution.
Some analyze human values as a result of people’s childhoods.
Not a lot of people analyze human values as… a result of the way humans experience the world.
“True Love(TM) towards a sentient being” feels fundamentally different from “eating a sandwich”, so it could be evidence that human experiences have an internal structure and that this structure plays a big role in determining values. But few models (perhaps none) take this “fact” into account. Not surprising, though: it would require a theory of human subjective experience. Still, can we just ignore this “fact”?
(3) Preference utilitarianism says:
You can describe the whole of ethics by a (weighted) aggregation of a single microscopic value. This microscopic value is called “preference”.
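One standard way to write that kind of aggregation (the symbols here are mine, not part of the original claim): the overall value is a weighted sum of how well each individual preference is satisfied,

$$U = \sum_i w_i \, u_i$$

where $u_i$ measures the satisfaction of preference $i$ and $w_i$ is its weight.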
I think there’s a missed idea: you could try to describe the whole of ethics by a weighted aggregation of a single… macroscopic value.
(4) Connectionism and Connectivism. I think this is a good example of a gap in our knowledge:
There’s the idea of biological or artificial neurons.
(gap)
There’s the idea that communication between humans is like communication between neurons.
I think one layer of the idea is missing: you could say that concepts in the human mind are somewhat like neurons. Maybe human thinking is like a fractal: it looks the same on all levels.
(5) Bayesian probability. There’s an idea:
You can describe possible outcomes (microscopic things) in terms of each other, using Bayes’ rule.
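For reference, the rule in question, which re-expresses one outcome in terms of another:

$$P(A \mid B) = \frac{P(B \mid A)\, P(A)}{P(B)}$$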
I think this idea should have a “counterpart”: maybe you can describe macroscopic things (and not only outcomes) in terms of each other, using something somewhat similar to probabilistic reasoning and Bayes’ rule.
That’s what I tried to do in this post.