Yeah that looks good, except that it takes an order of magnitude longer to get going on conceptual alignment directions. I’ll message Adam to hear what happened with that.
For reference there’s this: What I learned running Refine
When I talked to Adam about this (over 12 months ago), he didn’t think there was much to say beyond what’s in that post. Perhaps he’s updated since.
My sense is that I view it as more of a success than Adam does. In particular, I think it’s a bit harsh to solely apply the [genuinely new directions discovered] metric. Even when doing everything right, I expect the hit rate to be very low there, with [variation on current framing/approach] being the most common type of success.
Agreed that Refine’s timescale is clearly too short. However, a much longer program would set a high bar for whoever’s running it. Personally, I’d only be comfortable doing so if the setup were flexible enough that it didn’t seem likely to limit the potential of participants (by being less productive-in-the-sense-desired than counterfactual environments).
In particular, I think it’s a bit harsh to solely apply the [genuinely new directions discovered] metric. Even when doing everything right, I expect the hit rate to be very low there, with [variation on current framing/approach] being the most common type of success.
Mhm. In fact I’d want to apply a bar that’s even lower, or at least different: [the extent to which the participants (as judged by more established alignment thinkers) seem to be well on the way to developing new promising directions—e.g. being relentlessly resourceful including at the meta-level; having both appropriate Babble and appropriate Prune; not shying away from the hard parts].
the setup were flexible enough that it didn’t seem likely to limit the potential of participants (by being less productive-in-the-sense-desired than counterfactual environments).
Agree that this is an issue, but I think it can be addressed—certainly at least well enough that there’d be worthwhile value-of-info in running such a thing.
I’d be happy to contribute a bit of effort, if someone else is taking the lead. I think most of my efforts will be directed elsewhere, but for example I’d be happy to think through what such a program should look like; help write justificatory parts of grant applications; and maybe mentor / similar.
As a concrete proposal, if anyone wants to reboot Refine or similar, I’d be interested to consider that while wearing my Manifund Regrantor hat.
Ah thanks!
Report back if you get details, I’m curious.
See Joe’s sibling comment
https://www.lesswrong.com/posts/QzQQvGJYDeaDE4Cfg/talent-needs-in-technical-ai-safety#JP5LA9cNgqxgdAz8Z
I have, and I also remember seeing Adam’s original retrospective, but I always found it unsatisfying. Thanks anyway!