For areas where we don’t have empirical feedback loops (like many philosophical topics), I imagine that the “baseline solution” for getting help from AIs is to teach them to imitate our reasoning, either by having them literally write the words that they predict we would write (but faster), or by having them generate arguments that we would think look good. (Potentially recursively, cf. amplification, debate, etc.)
This seems like the default road that we’re walking down, but can ML learn everything that is important to learn? I questioned this in Some Thoughts on Metaphilosophy.
One of the few plausible-seeming ways to outperform that baseline is to identify epistemic practices that work well on questions where we do have empirical feedback loops, and then to transfer those practices to questions where we lack such feedback loops.
It seems plausible to me that this could help, but also plausible that philosophical reasoning (at least partly) involves cognition that’s distinct from empirical reasoning, so that such techniques can’t capture everything that is important to capture. (Some Thoughts on Metaphilosophy contains some conjectures relevant to this, so please read that if you haven’t already.)
BTW, it looks like you’re working/thinking in this direction, which I appreciate, but doesn’t it seem to you that the topic is super neglected (even compared to AI alignment) given that the risks/consequences of failing to correctly solve this problem seem comparable to the risk of AI takeover? (See for example Paul Christiano’s probabilities.) I find it frustrating/depressing how few people even mention it in passing as an important problem to be solved when they talk about AI-related risks (Paul being a rare exception to this).
doesn’t it seem to you that the topic is super neglected (even compared to AI alignment) given that the risks/consequences of failing to correctly solve this problem seem comparable to the risk of AI takeover?
Yes, I’m sympathetic. Among all the issues that will come with AI, I think alignment is relatively tractable (at least it is now) and that it has an unusually clear story for why we shouldn’t count on being able to defer it to smarter AIs (though that might work). So I think it’s probably correct for it to get relatively more attention. But even taking that into account, the non-alignment singularity issues do seem too neglected.
I’m currently trying to figure out what non-alignment stuff seems high-priority and whether I should be tackling any of it.