I think I have an idea of how we could solve AI Alignment and create an AGI with safe, interpretable thinking. I mean a “fundamentally” safe AGI, not a wildcard that requires extremely specific training not to kill you.
Sorry for the grandiose claim. I’m going to state my idea right away. Then I’ll explain the context, give general examples, and discuss the implications of the idea being true. Then I’ll suggest a specific thing we can do. Finally, I’ll explain why I believe the idea is true.
My idea will sound too vague and unclear at first, but I think the context will make clear what I mean. (Clear in the way the mathematical concept of a graph is clear: a graph is a very abstract idea, but it makes sense and is easy to use.)
Please evaluate my post at least as science fiction, and then ask: maybe it’s not fiction, but reality?
Key points of this post:
You can “solve” human concepts (including values) by solving semantics. By semantics I mean “meaning construction”, something more abstract than language.
Semantics is easier to solve than you think. And we’re closer to solving it than you think.
Semantics is easier to model than you think. You don’t even need an AI to start doing it. Just a special type of statistics. You don’t even have to start with analyzing language.
I believe the ideas in this post can be applied outside of the AI field.
Why do I believe this? Because of this idea:
Every concept (or even a random mishmash of ideas) has multiple versions. Those versions have internal relationships: positions in some space relative to each other. You can understand a concept by understanding those internal relationships.
One problem, though: those relationships are “infinitely complex”. However, there is a special way to make drastic simplifications, and we can study the real relationships through those special simplifications.
What do those “special simplifications” do? They order the versions of a concept (e.g. “version 1, version 2, version 3”). They can do this in extremely arbitrary ways. The important thing is that you can merge arbitrary orders into less arbitrary structures. There is some rule for doing this, akin to Bayes’ Rule or Occam’s razor. This is what cognition is, according to my theory.
If this is true, we need to find some domain where concepts and their simplifications are easy enough to formalize. Then we need to figure out a model and the rule for merging simplifications. I have a suggestion, a couple of ideas, and many examples. (A toy sketch of what “merging orders” could look like is given below.)
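To make “merging arbitrary orders” a bit more concrete, here is a minimal Python sketch of one possible reading. It assumes that a “simplification” can be modeled as a ranking of a concept’s versions, and it uses simple rank aggregation (a Borda count) as a stand-in for the merging rule; this is only an illustration, not the actual rule I have in mind, and the concept and versions in the example are hypothetical.

```python
# A toy sketch, not the real rule: model each "simplification" as an
# arbitrary ordering of a concept's versions, then merge the orderings
# into one less arbitrary ranking via rank aggregation (Borda count).

from collections import defaultdict

def merge_orders(orders):
    """Merge several orderings of the same items into one consensus order.

    Each ordering gives every item points based on its position
    (earlier position = more points); the merged order sorts items by
    total points. Borda count is only a stand-in for the unknown rule.
    """
    scores = defaultdict(int)
    for order in orders:
        n = len(order)
        for rank, item in enumerate(order):
            scores[item] += n - rank  # earlier rank, higher score
    return sorted(scores, key=scores.get, reverse=True)

# Three arbitrary "simplifications" of the concept "chair":
# each one orders the concept's versions differently (hypothetical data).
simplifications = [
    ["armchair", "stool", "throne", "beanbag"],
    ["stool", "armchair", "beanbag", "throne"],
    ["armchair", "throne", "stool", "beanbag"],
]

print(merge_orders(simplifications))
# ['armchair', 'stool', 'throne', 'beanbag']
# A less arbitrary structure recovered from several arbitrary orders.
```

Any rank-aggregation or probabilistic rule could slot in here; the point is only that many arbitrary orders, merged by a fixed rule, yield a structure that no single order contains.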
Context