The example of deconstructing and constructing when building deep in a city has the characteristics of taking down an old building and putting up a newer one on its valuable land. The judgement that categorises demolishing, as a method of construction, as part of a pathological system would then seem to be a false negative. If we can make good-quality and bad-quality judgements like these, what basis do we have to think that the judgement on the system is leading us forward rather than leading us astray?
Different tasks may assume different types of “systems”. You can specify the type of task you’re asking about, or teach the AI to determine it (or to ask a human if there’s an ambiguity).
“Turning a worse thing into a better thing” is generally a way better idea than “breaking and fixing a thing without making it better”. It’s true for a lot of tasks, both instrumental and terminal.
The point about the deserted island is that “money systems” have an area of applicability and there are things outside of it.
“Money systems” is just a metaphor. And this metaphor is still applicable here. I mean, I used exactly the same example in the post: “if you’re sealed in a basement with a lot of money they’re not worth anything”.
What general conclusions about my idea do you want to reach? I think it’s important for the arguments. For example, if you want to say that my idea may have problems, then of course I agree. If you want to say that my idea is worse than all other ideas and shouldn’t be considered, then I disagree.
I see a constellation of musings which seems somewhat promising, but I can’t really comprehend it as an idea. I cannot state in my own words what you mean.
I thought that system level reasoning vs. some old way of doing things was pretty important, and now it seems it’s only a minor detail.
It would seem that “traditionally” we have “moral systems”, “law systems” or “strategy systems”, and then we improve on this by using “money systems”. But these words are used in an abstracted sense, or carry additional meanings that they do not usually have, so it becomes extremely hard to pinpoint what is meant.
I tried to formulate my idea “in a few words” in this part of the post, “Alignment. Recap”:
You can split the possible effects of an AI’s actions into three domains. All of them are different (with different ideas behind them), even though they partially intersect and can be formulated in terms of each other. Traditionally we focus on the first two domains:
1. (Not) accomplishing a goal. Utility functions are about this.
2. (Not) violating human values. Models of human feedback are about this.
3. (Not) modifying a system without breaking it. Impact measures are about this.
My idea is about combining all of this (mostly 2 and 3) into a single approach, or about generalizing ideas for the third domain. There aren’t many ideas for the third one, as far as I know; maybe people are not aware enough of that domain. (A toy sketch of the combination follows below.)
I know that it’s confusing; I struggled to formulate the difference myself. But once you see the difference between the three domains, everything should become clear. “Human values vs. the laws of a society” may be a good analogy for the difference between 2 and 3: the two are not equivalent, even though they intersect and can be formulated in terms of each other.
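To make the combination concrete, here is a minimal toy sketch; it is only an illustration, not anything from the post, and every name, score and weight in it is made up:

```python
# Toy sketch only: hypothetical names, scores and weights, not a real proposal.
from dataclasses import dataclass

@dataclass
class ActionScores:
    task_utility: float     # domain 1: how well the action accomplishes the goal
    value_violation: float  # domain 2: estimated violation of human values
    system_impact: float    # domain 3: how much the action breaks the surrounding "system"

def combined_score(s: ActionScores, w_values: float = 1.0, w_impact: float = 1.0) -> float:
    """Pursue the goal, but pay penalties for violating values and for damaging the system."""
    return s.task_utility - w_values * s.value_violation - w_impact * s.system_impact

# Example: compare "demolish and rebuild" with "renovate in place".
candidates = {
    "demolish_and_rebuild": ActionScores(task_utility=0.9, value_violation=0.1, system_impact=0.6),
    "renovate_in_place":    ActionScores(task_utility=0.7, value_violation=0.1, system_impact=0.2),
}
best = max(candidates, key=lambda name: combined_score(candidates[name]))
print(best)  # which option wins depends entirely on the made-up numbers and weights
```

The only point of the sketch is the shape of the objective: domain 1 is what gets maximized, while domains 2 and 3 enter as separate penalty terms instead of being folded into the goal itself.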
I thought that system level reasoning vs. some old way of doing things was pretty important, and now it seems it’s only a minor detail.
I believe there’s a difference, but the difference isn’t about complexity. Complexity of reasoning doesn’t depend on your goals or “code of conduct”.
My points about complexity still stand:
Such things as Impact Measures still require “system level” thinking.
Recognizing/learning properties of pathological systems may be easier than perfectly learning human values (without learning to be a deceptive manipulator).
I don’t think that “act level reasoning” and “system level reasoning” is a meaningful distinction; I think it’s the same thing. Humans need to do it anyway, and an AI would need to do it anyway. I just suggested making such reasoning fundamental.