This seems like very much the wrong type of reasoning to do about novel scientific research. Big breakthroughs open up possibilities that are very hard to imagine before those breakthroughs.
What type of reasoning do you think would be most appropriate?
This proves too much. The only way to determine whether a research direction is promising or not is through object-level arguments. I don’t see how we can proceed without scrutinizing the agendas and listing the main difficulties.
this by itself is sufficient to recommend it.
I don’t think it’s that simple. We have to weigh the good against the bad, and I’d like to see some object-level explanations for why the bad doesn’t outweigh the good, and why the problem is sufficiently tractable.
Interpretability seems like our clear best bet for developing a more principled understanding of how deep learning works;
Maybe. I would still argue that other research avenues are neglected in the community.
not focused enough on forward-chaining to find the avenues of investigation which actually allow useful feedback and give us a hook into understanding the world better
I provided plenty of technical research directions in the “preventive measures” section, which should also qualify as forward-chaining. And interp is certainly not the only way to understand the world better. Besides, I didn’t say we should stop interp research altogether, just that we should consider other avenues.
More generally, I’m strongly against arguments of the form “we shouldn’t do useful work, because then it will encourage other people to do bad things”. In theory arguments like these can sometimes be correct, but in practice perfect is often the enemy of the good.
I think I agree, but this is only one of the many points in my post.
What type of reasoning do you think would be most appropriate?
See the discussion between me and interstice upthread for a type of argument that feels more productive.
I would still argue that other research avenues are neglected in the community.
I agree (and mentioned so in my original comment). This post would have been far more productive if it had focused on exploring them.
We have to weigh the good against the bad, and I’d like to see some object-level explanations for why the bad doesn’t outweigh the good, and why the problem is sufficiently tractable.
The things you should be looking for, when it comes to fundamental breakthroughs, are deep problems demonstrating fascinating phenomena, and especially cases where you can get rapid feedback from reality. That’s what we’ve got here. If that’s not object-level enough then your criterion would have ruled out almost all great science in the past.
I think I agree, but this is only one of the many points in my post.
I wouldn’t have criticized it so strongly if you hadn’t listed it as “Perhaps the main problem I have with interp”.
This post would have been far more productive if it had focused on exploring them.
So the sections “Counteracting deception with only interp is not the only approach” and “Preventive measures against deception”, “Cognitive Emulations” and “Technical Agendas with better ToI” don’t feel productive? It seems to me that it’s already a good list of neglected research agendas. So I don’t understand.
if you hadn’t listed it as “Perhaps the main problem I have with interp”
In the above comment, I only agree with “we shouldn’t do useful work, because then it will encourage other people to do bad things”, and I don’t agree with your critique of “Perhaps the main problem I have with interp...” which I think is not justified enough.
So the sections “Counteracting deception with only interp is not the only approach” and “Preventive measures against deception”, “Cognitive Emulations” and “Technical Agendas with better ToI” don’t feel productive? It seems to me that it’s already a good list of neglected research agendas. So I don’t understand.
You’ve listed them, but you haven’t really argued that they’re valuable, you’re mostly just asserting stuff like Rob Miles having a bigger impact than most interpretability researchers, or the best strategy being copying Dan Hendrycks. But since I disagree with the assertions, these sections aren’t very useful; they don’t actually zoom in on the positive case for these research directions.
(The main positive case I’m seeing seems to be “anything which helps with coordination is really valuable”. And sure, coordination is great. But most coordination-related research is shallow: it helps us do things now, but doesn’t help us figure out how to do things better in the long term. So I think you’re overstating the case for it in general.)
I agree that I haven’t argued the positive case for more governance/coordination work (and that’s why I hope to write a follow-up post on that).
We do need alignment work, but I think the current allocation is too focused on alignment, whereas AI X-Risks could arrive in the near future. I’ll be happy to reinvest in alignment work once we’re sure we can avoid X-Risks from misuse and grossly negligent accidents.