A quick comment after skimming: IBP might be relevant here, because it formalizes computationalism and provides a natural “objective” domain of truth (namely Γ×2Γ i.e. which computations are “running” and what values do they take).
Any more detailed thoughts on its relevance? EG, a semi-concrete ELK proposal based on this notion of truth/computationalism? Can identifying-running-computations stand in for direct translation?
The main difficulty is that you still need to translate between the formal language of computations and something humans can understand in practice (which probably means natural language). This is similar to Dialogic RL. So you still need an additional subsystem for making this translation, e.g. AQD. At which point you can ask, why not just apply AQD directly to a pivotal[1] action?
I’m not sure what the answer is. Maybe we should apply AQD directly, or maybe AQD is too weak for pivotal actions but good enough for translation. Or maybe it’s not even good enough for translation, in which case it’s back to the blueprints. (Similar considerations apply to other options like IDA.)
A quick comment after skimming: IBP might be relevant here, because it formalizes computationalism and provides a natural “objective” domain of truth (namely Γ×2Γ i.e. which computations are “running” and what values do they take).
Any more detailed thoughts on its relevance? EG, a semi-concrete ELK proposal based on this notion of truth/computationalism? Can identifying-running-computations stand in for direct translation?
The main difficulty is that you still need to translate between the formal language of computations and something humans can understand in practice (which probably means natural language). This is similar to Dialogic RL. So you still need an additional subsystem for making this translation, e.g. AQD. At which point you can ask, why not just apply AQD directly to a pivotal[1] action?
I’m not sure what the answer is. Maybe we should apply AQD directly, or maybe AQD is too weak for pivotal actions but good enough for translation. Or maybe it’s not even good enough for translation, in which case it’s back to the blueprints. (Similar considerations apply to other options like IDA.)
More precisely, to something “quasipivotal” like “either give me a pivotal action or some improvement on the alignment protocol I’m using right now”.