Apparently you approve of some processes that would change your moral beliefs and disapprove of others.
Well, yes. For example, learning a new fact is an approved process; administering a drug to me is unapproved. Would you disagree with these moral judgments?
Yet you aren’t willing to describe your approval in terms of how close your beliefs would get to some ideal counterfactual, such as “having heard and understood all relevant arguments.”
Oh, I’d be perfectly willing to describe it in those terms, if I thought I could get away with it. But you can’t get away with that in FAI work.
Words like “relevant” assume precisely that distinction between approved and unapproved.
Humans don’t start out all that tremendously coherent, so the “ideal counterfactual” cannot just be assumed into existence—it’s at least possible that different orders in which we “hear and understand” things would send us into distinct attractors.
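The order-dependence worry can be made concrete with a toy program. This is purely my own illustrative construction, not anything from the dialogue: a belief updater whose final state depends on the order in which the same two arguments are “heard,” so one argument set yields two distinct attractors.

```python
# Toy sketch (hypothetical, illustrative only): non-commutative belief
# updates. An argument only moves a mind that is still within its
# "persuadable" range, so hearing order determines the attractor.

def update(belief, argument):
    """Apply an argument to a belief (an integer, for exactness).

    argument = (low, high, shift): the argument persuades only minds
    whose belief lies in [low, high], shifting them by `shift`.
    """
    low, high, shift = argument
    if low <= belief <= high:
        return belief + shift
    return belief

arg_a = (0, 5, 4)   # persuades near-neutral minds, pushes belief up
arg_b = (0, 5, -4)  # persuades near-neutral minds, pushes belief down

start = 3
heard_a_first = update(update(start, arg_a), arg_b)  # a moves it to 7; b no longer applies
heard_b_first = update(update(start, arg_b), arg_a)  # b moves it to -1; a no longer applies
print(heard_a_first, heard_b_first)  # 7 -1 -- two distinct attractors
```

The same set of arguments, “heard and understood” in full either way, leaves the mind in two different places; nothing in the phrase “all relevant arguments” picks out which endpoint is the ideal one.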
You would have to construe some specific counterfactual, and that choice would itself be morally challengeable; it would be a guess, part of your Q_P. It’s not as if you can call upon an ideal to write code, let alone to write the code that defines the ideal itself.
For EV_Q_P to be defined coherently, it has to be bootstrapped out of Q_P with a well-defined order of operations, in which no function is called before it has been defined. You can’t say that EV_Q_P is whatever EV_Q_P says it should be: that definition either never halts, or constrains nothing, since any output satisfies it.
When you use a word like “ideal” in “ideal counterfactual”, how to construe that counterfactual is itself a moral judgment. If that counterfactual is supposed to define “idealness”, you need some non-ideal definition of it to start with, or the recursion has no foundation.
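The bootstrapping point can be sketched in code. This is a hypothetical toy of my own (the names q_p, ev_q_p, and the judgment format are invented for illustration, not taken from any real FAI design): an extrapolation defined in a well-founded order from Q_P works, while the self-referential version never terminates.

```python
# Hypothetical sketch: well-founded bootstrap vs. ill-founded self-reference.

def q_p(judgment):
    """Stand-in for the programmers' current, non-ideal moral judgments:
    approve belief changes caused by learning facts, not by drugs."""
    return judgment["cause"] == "learned_fact"

def ev_q_p(judgment):
    """Extrapolated judgment, bootstrapped from q_p in a fixed order:
    q_p is fully defined before ev_q_p ever calls it. The extrapolation
    rule is a guess encoded in q_p, not an appeal to a self-defining ideal."""
    return q_p(judgment)

def circular_ev(judgment):
    """'EV is whatever EV says it should be': the call has no foundation,
    so evaluation recurses forever (Python raises RecursionError)."""
    return circular_ev(judgment)

print(ev_q_p({"cause": "learned_fact"}))  # True: an approved process
print(ev_q_p({"cause": "drug"}))          # False: an unapproved process
```

The well-founded version outputs definite verdicts because every function it calls was defined first; the circular version is exactly the “doesn’t halt” failure mode, and a declarative reading of it (“any EV equal to its own output”) is satisfied by anything at all.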