What do I mean by that? Well, imagine you’re trying to reach reflective equilibrium in your morality. You do this by using good meta-ethical rules, zooming up and down at various moral levels, making decisions on how to resolve inconsistencies, etc… But how do you know when to stop? Well, you stop when your morality is perfectly self-consistent, when you no longer have any urge to change your moral or meta-moral setup.
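To make that stopping condition concrete, here is a toy sketch of the loop being described. It is purely illustrative: the `conflicts` and `revise` functions are hypothetical placeholders for whatever one's actual moral and meta-moral machinery does, not anything anyone has actually specified.

```python
# Toy sketch of reflective equilibrium as a fixed-point search.
# All names (conflicts, revise, the example judgments/principles) are
# hypothetical placeholders, not a real procedure anyone has written down.

def conflicts(judgments: set[str], principles: set[str]) -> set[tuple[str, str]]:
    """Return (judgment, principle) pairs that are inconsistent.

    Placeholder: a real version would need an actual account of what counts
    as an inconsistency between a case judgment and a principle.
    """
    return {(j, p) for j in judgments for p in principles
            if j == f"not {p}" or p == f"not {j}"}

def revise(judgments, principles, clash):
    """Resolve one inconsistency by dropping one side.

    Placeholder policy: keep the principle, drop the judgment.
    Deciding *which* side to revise is exactly the hard, unspecified part.
    """
    j, _ = clash
    return judgments - {j}, principles

def reflective_equilibrium(judgments, principles, max_rounds=1000):
    """Iterate until no inconsistency remains -- the 'stop' condition above."""
    for _ in range(max_rounds):
        clashes = conflicts(judgments, principles)
        if not clashes:
            return judgments, principles  # perfectly self-consistent: stop
        judgments, principles = revise(judgments, principles, next(iter(clashes)))
    raise RuntimeError("no equilibrium reached")

if __name__ == "__main__":
    j = {"lying is sometimes fine", "not maximize welfare"}
    p = {"maximize welfare"}
    print(reflective_equilibrium(j, p))
```

The entire difficulty, of course, lives inside those placeholders, which is what the rest of the exchange is about.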
Wait… what? No.
You don’t solve the value-alignment problem by trying to write down your confusions about the foundations of moral philosophy, because writing down confusion still leaves you fundamentally confused. No amount of intelligence can solve an ill-posed problem in some way other than pointing out that the problem is ill-posed.
You solve it by removing the need to do moral philosophy and instead specifying a computation that corresponds to your moral psychology and its real, actually-existing, specifiable properties.
And then telling metaphysics to take a running jump to boot, and crunching down on Strong Naturalism brand crackers, which come in neat little bullet shapes.
Near as I can tell, you’re proposing some “good meta-ethical rules,” though you may have skipped the difficult parts. And I think the claim, “you stop when your morality is perfectly self-consistent,” was more a factual prediction than an imperative.
I didn’t skip the difficult bits, because I didn’t propose a full solution. I stated an approach to dissolving the problem.
And do you think that approach differs from the one you quoted?
It involves reasoning about facts rather than metaphysics.
And will that model have the right counterfactuals? Will it evolve under changing conditions the same way that the original would?
If you modelled the real thing correctly, then yes, of course it will.
Yes, of course, but then the question is: what is the difference between modelling it correctly and solving moral philosophy? A correct model has to get a bunch of counterfactuals correct, and not just match an empirical dataset.
Well, attempting to account for your grammar and figure out what you meant...
Yes, and? Causal modelling techniques get counterfactuals right by design, in the sense that a correct causal model by definition captures counterfactual behavior, as studied across controlled or intervened-upon experiments.
I mean, I agree that most currently-in-use machine learning techniques don’t bother to capture causal structure, but on the upside, that precise failure to capture and compress causal structure is why those techniques can’t lead to AGI.
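As a minimal illustration of what "right by design" means here, below is a hand-rolled structural causal model; the variables and equations are invented for the example and have nothing to do with any real moral-psychology model. Given the structural equations, interventions and counterfactuals are answered from the model's structure, not from the observed data alone.

```python
# Minimal hand-rolled structural causal model (SCM).
# The variables and equations are invented for illustration only.
#
#   X := U_x                (exogenous cause)
#   Y := 2 * X + U_y        (effect of X plus noise)

def observe(u_x: float, u_y: float) -> tuple[float, float]:
    """Generate (X, Y) from the structural equations, no intervention."""
    x = u_x
    y = 2 * x + u_y
    return x, y

def intervene(x_forced: float, u_y: float) -> float:
    """do(X = x_forced): cut X's own equation, keep Y's equation."""
    return 2 * x_forced + u_y

def counterfactual(x_obs: float, y_obs: float, x_cf: float) -> float:
    """Given that we saw (x_obs, y_obs), what would Y have been had X been x_cf?"""
    u_y = y_obs - 2 * x_obs        # abduction: recover the noise term
    return intervene(x_cf, u_y)    # action + prediction with the same noise

if __name__ == "__main__":
    x, y = observe(u_x=1.0, u_y=0.5)        # observed world: X = 1.0, Y = 2.5
    print(counterfactual(x, y, x_cf=3.0))   # had X been 3.0, Y would be 6.5
```

The counterfactual step is the standard abduction-action-prediction recipe: recover the noise consistent with what was observed, then recompute under the intervention.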
I think it’s more accurate to say that we’re trying to dissolve moral philosophy in favor of a scientific model of human evaluative cognition. Surely to a moral philosopher this will sound like a moot distinction, but the precise difference is that the latter thing creates and updates predictive models which capture counterfactual, causal knowledge, and which thus can be elaborated into an explicit theory of morality that doesn’t rely on intuition or situational framing to work.
As far as I can tell, human intuition is the territory you would be modelling, here. In particular, when dealing with counterfactuals, since it would be unethical to actually set up trolley problems.
BTW, there is nothing to stop moral philosophy being predictive, etc.
No, we’re trying to capture System 2’s evaluative cognition, not System 1’s fast-and-loose, bias-governed intuitions.
Wrong kind of intuition
If you have an external standard, as you do with probability theory and logic, system 2 can learn utilitarianism, and its performance can be checked against the external standard.
But we don’t have an agreed standard to compare system 1 ethical reasoning against, because we haven’t solved moral philosophy. What we have is system 2 coming up with speculative theories, which have to be checked against intuition, meaning an internal standard.
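To spell out the asymmetry being claimed: for credences there is an agreed external standard, the probability axioms, that System 2's outputs can be checked against mechanically. The sketch below does that check for a toy case; the names and the tiny event space are illustrative only, and the point is that no analogous agreed-upon checker exists for ethical judgments.

```python
# Checking System 2 output against an external standard.
# For credences the standard is the probability axioms; for ethical
# judgments no analogous agreed-upon checker exists (the point in dispute).

def violates_probability_axioms(credences: dict[frozenset[str], float]) -> list[str]:
    """Check credences over the outcomes {'rain', 'dry'} against the axioms:
    non-negativity, unit measure for the sure event, finite additivity."""
    problems = []
    sure_event = frozenset({"rain", "dry"})
    for event, p in credences.items():
        if p < 0:
            problems.append(f"negative credence for {set(event)}")
    if credences.get(sure_event) != 1.0:
        problems.append("credence in the sure event is not 1")
    total = credences.get(frozenset({"rain"}), 0) + credences.get(frozenset({"dry"}), 0)
    if total != credences.get(sure_event):
        problems.append("additivity violated for disjoint events")
    return problems

if __name__ == "__main__":
    my_credences = {
        frozenset({"rain"}): 0.7,
        frozenset({"dry"}): 0.4,
        frozenset({"rain", "dry"}): 1.0,
    }
    print(violates_probability_axioms(my_credences))
    # -> ['additivity violated for disjoint events']
    # There is no equivalent, agreed violates_ethical_standard() to call.
```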
Again, the whole point of this task/project/thing is to come up with an explicit theory to act as an external standard for ethics. Ethical theories are maps of the evaluative-under-full-information-and-individual+social-rationality territory.
And that is the whole point of moral philosophy… so it’s sounding like a moot distinction.
You don’t like the word intuition, but the fact remains that while you are building your theory, you will have to check it against humans’ ability to give answers without knowing how they arrived at them. Otherwise you end up with a clear, consistent theory that nobody finds persuasive.
Such a territory does not exist, therefore it’s not territory.
You’re going to have to explain how “thoughts and feelings that people will or would have in certain scenarios” fails to be territory.
By not existing.