What do I mean by that? Well, imagine you’re trying to reach reflective equilibrium in your morality. You do this by using good meta-ethical rules, zooming up and down at various moral levels, making decisions on how to resolve inconsistencies, etc… But how do you know when to stop? Well, you stop when your morality is perfectly self-consistent, when you no longer have any urge to change your moral or meta-moral setup.
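To make that stopping condition concrete, here is a toy sketch of the loop being described. It is purely illustrative: the `conflicts` and `revise` functions are hypothetical placeholders for whatever one's actual moral and meta-moral machinery does, not anything anyone has actually specified.

```python
# Toy sketch of reflective equilibrium as a fixed-point search.
# All names (conflicts, revise, the example judgments/principles) are
# hypothetical placeholders, not a real procedure anyone has written down.

def conflicts(judgments: set[str], principles: set[str]) -> set[tuple[str, str]]:
    """Return (judgment, principle) pairs that are inconsistent.

    Placeholder: a real version would need an actual account of what counts
    as an inconsistency between a case judgment and a principle.
    """
    return {(j, p) for j in judgments for p in principles
            if j == f"not {p}" or p == f"not {j}"}

def revise(judgments, principles, clash):
    """Resolve one inconsistency by dropping one side.

    Placeholder policy: keep the principle, drop the judgment.
    Deciding *which* side to revise is exactly the hard, unspecified part.
    """
    j, _ = clash
    return judgments - {j}, principles

def reflective_equilibrium(judgments, principles, max_rounds=1000):
    """Iterate until no inconsistency remains -- the 'stop' condition above."""
    for _ in range(max_rounds):
        clashes = conflicts(judgments, principles)
        if not clashes:
            return judgments, principles  # perfectly self-consistent: stop
        judgments, principles = revise(judgments, principles, next(iter(clashes)))
    raise RuntimeError("no equilibrium reached")

if __name__ == "__main__":
    j = {"lying is sometimes fine", "not maximize welfare"}
    p = {"maximize welfare"}
    print(reflective_equilibrium(j, p))
```

The entire difficulty, of course, lives inside those placeholders, which is what the rest of the exchange is about.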
Wait… what? No.
You don’t solve the value-alignment problem by trying to write down your confusions about the foundations of moral philosophy, because writing down confusion still leaves you fundamentally confused. No amount of intelligence can solve an ill-posed problem in some way other than pointing out that the problem is ill-posed.
You solve it by removing the need to do moral philosophy and instead specifying a computation that corresponds to your moral psychology and its real, actually-existing, specifiable properties.
And then telling metaphysics to take a running jump to boot, and crunching down on Strong Naturalism brand crackers, which come in neat little bullet shapes.
Near as I can tell, you’re proposing some “good meta-ethical rules,” though you may have skipped the difficult parts. And I think the claim, “you stop when your morality is perfectly self-consistent,” was more a factual prediction than an imperative.
I didn’t skip the difficult bits, because I didn’t propose a full solution. I stated an approach to dissolving the problem.
And do you think that approach differs from the one you quoted?
It involves reasoning about facts rather than metaphysics.
And will that model have the right counterfactuals? Will it evolve under changing conditions the same way that the original would?
If you modelled the real thing correctly, then yes, of course it will.
Yes, of course, but then the question is: what is the difference between modelling it correctly and solving moral philosophy? A correct model has to get a bunch of counterfactuals correct, and not just match an empirical dataset.
Well, attempting to account for your grammar and figure out what you meant...
Yes, and? Causal modelling techniques get counterfactuals right by design, in the sense that a correct causal model by definition captures counterfactual behavior, as studied across controlled or intervened-upon experiments.
I mean, I agree that most currently-in-use machine learning techniques don’t bother to capture causal structure, but on the upside, that precise failure to capture and compress causal structure is why those techniques can’t lead to AGI.
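As a minimal illustration of what "right by design" means here, below is a hand-rolled structural causal model; the variables and equations are invented for the example and have nothing to do with any real moral-psychology model. Given the structural equations, interventions and counterfactuals are answered from the model's structure, not from the observed data alone.

```python
# Minimal hand-rolled structural causal model (SCM).
# The variables and equations are invented for illustration only.
#
#   X := U_x                (exogenous cause)
#   Y := 2 * X + U_y        (effect of X plus noise)

def observe(u_x: float, u_y: float) -> tuple[float, float]:
    """Generate (X, Y) from the structural equations, no intervention."""
    x = u_x
    y = 2 * x + u_y
    return x, y

def intervene(x_forced: float, u_y: float) -> float:
    """do(X = x_forced): cut X's own equation, keep Y's equation."""
    return 2 * x_forced + u_y

def counterfactual(x_obs: float, y_obs: float, x_cf: float) -> float:
    """Given that we saw (x_obs, y_obs), what would Y have been had X been x_cf?"""
    u_y = y_obs - 2 * x_obs        # abduction: recover the noise term
    return intervene(x_cf, u_y)    # action + prediction with the same noise

if __name__ == "__main__":
    x, y = observe(u_x=1.0, u_y=0.5)        # observed world: X = 1.0, Y = 2.5
    print(counterfactual(x, y, x_cf=3.0))   # had X been 3.0, Y would be 6.5
```

The counterfactual step is the standard abduction-action-prediction recipe: recover the noise consistent with what was observed, then recompute under the intervention.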
I think it’s more accurate to say that we’re trying to dissolve moral philosophy in favor of a scientific model of human evaluative cognition. Surely to a moral philosopher this will sound like a moot distinction, but the precise difference is that the latter thing creates and updates predictive models which capture counterfactual, causal knowledge, and which thus can be elaborated into an explicit theory of morality that doesn’t rely on intuition or situational framing to work.
As far as I can tell, human intuition is the territory you would be modelling, here. In particular, when dealing with counterfactuals, since it would be unethical to actually set up trolley problems.
BTW, there is nothing to stop moral philosophy being predictive, etc.
No, we’re trying to capture System 2’s evaluative cognition, not System 1’s fast-and-loose, bias-governed intuitions.
Wrong kind of intuition
If you have an external standard, as you do with probability theory and logic, system 2 can learn utilitarianism, and its performance can be checked against the external standard.
But we don’t have an agreed standard to compare system 1 ethical reasoning against, because we haven’t solved moral philosophy. What we have is system 2 coming up with speculative theories, which have to be checked against intuition, meaning an internal standard.
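To spell out the asymmetry being claimed: for credences there is an agreed external standard, the probability axioms, that System 2's outputs can be checked against mechanically. The sketch below does that check for a toy case; the names and the tiny event space are illustrative only, and the point is that no analogous agreed-upon checker exists for ethical judgments.

```python
# Checking System 2 output against an external standard.
# For credences the standard is the probability axioms; for ethical
# judgments no analogous agreed-upon checker exists (the point in dispute).

def violates_probability_axioms(credences: dict[frozenset[str], float]) -> list[str]:
    """Check credences over the outcomes {'rain', 'dry'} against the axioms:
    non-negativity, unit measure for the sure event, finite additivity."""
    problems = []
    sure_event = frozenset({"rain", "dry"})
    for event, p in credences.items():
        if p < 0:
            problems.append(f"negative credence for {set(event)}")
    if credences.get(sure_event) != 1.0:
        problems.append("credence in the sure event is not 1")
    total = credences.get(frozenset({"rain"}), 0) + credences.get(frozenset({"dry"}), 0)
    if total != credences.get(sure_event):
        problems.append("additivity violated for disjoint events")
    return problems

if __name__ == "__main__":
    my_credences = {
        frozenset({"rain"}): 0.7,
        frozenset({"dry"}): 0.4,
        frozenset({"rain", "dry"}): 1.0,
    }
    print(violates_probability_axioms(my_credences))
    # -> ['additivity violated for disjoint events']
    # There is no equivalent, agreed violates_ethical_standard() to call.
```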
Again, the whole point of this task/project/thing is to come up with an explicit theory to act as an external standard for ethics. Ethical theories are maps of the evaluative-under-full-information-and-individual+social-rationality territory.
And that is the whole point of moral philosophy… so it’s sounding like a moot distinction.
You don’t like the word intuition, but the fact remains that while you are building your theory, you will have to check it against humans’ ability to give answers without knowing how they arrived at them. Otherwise you end up with a clear, consistent theory that nobody finds persuasive.
Such a territory does not exist, therefore it’s not territory.
You’re going to have to explain how “thoughts and feelings that people will or would have in certain scenarios” fails to be territory.
By not existing.