My bad on testbeds: I didn’t realize you were speaking about this kind of testbed, as opposed to the general E[U|not scheming] analogies (and I forgot you had rated them medium strength, which is sensible for this kind of testbed). Same for “the unwarranted focus on claim 3”: that was mostly because I misunderstood what the countermeasures were trying to address.
I think I don’t have a good understanding of the macrosystem risks you are talking about. I’ll look at that more later.
I think I was a bit unfair about the practicality of the medium-strength techniques: it’s true that you can get some evidence for safety (maybe 0.3 bits to 1 bit) from versions of those techniques that are practical.
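(To make the “bits” talk concrete, here is a minimal sketch, assuming a bit of safety evidence means a factor-of-2 update on the odds of being unsafe, and using an illustrative prior of P(unsafe) = 0.2; both the interpretation and the prior are my assumptions, purely for illustration.)

```python
def update_with_bits(prior: float, bits: float) -> float:
    """Posterior P(unsafe) after `bits` of evidence for safety,
    treating each bit as a factor-of-2 update on the odds."""
    odds = prior / (1 - prior)   # prior odds of being unsafe
    odds /= 2 ** bits            # each bit of safety evidence halves the odds
    return odds / (1 + odds)

# Starting from an illustrative P(unsafe) = 0.2:
for b in (0.3, 1.0):
    print(f"{b} bits -> P(unsafe) = {update_with_bits(0.2, b):.3f}")
# 0.3 bits -> P(unsafe) = 0.169
# 1.0 bits -> P(unsafe) = 0.111
```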
On practicality and strength, I think there is a communication issue here: the version of externalized reasoning that is practical today is not strong, and the version that could eventually be strong is not practical (yet). The same goes for monitoring. But the summary reads “high practicality and high max strength”, which feels to me like it implies it’s easy to get medium-scalable safety cases that reach acceptable levels of risk using only one or two good layers of security, which I think is quite wild even if acceptable = [p(doom) < 1%]. But I guess you didn’t mean that, and it’s just a quirk of the summarization?
> which feels to me like it implies it’s easy to get medium-scalable safety cases that reach acceptable levels of risk using only one or two good layers of security
I agree there’s a communication issue here. Based on what you described, I’m not sure if we disagree.
> (maybe 0.3 bits to 1 bit)
I’m glad we’re talking in bits. My intuitions here are pretty different: e.g., I think you can get 2-3 bits from testbeds. I’d be keen to discuss standards of evidence etc. in person sometime.
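For concreteness, a minimal sketch of the arithmetic behind “one or two good layers”, assuming bits are log2 odds-ratio updates, that separate layers contribute roughly independent evidence (a strong assumption), and an illustrative prior of P(doom) = 0.2 (not anyone’s actual estimate):

```python
import math

def bits_needed(prior: float, target: float) -> float:
    """Bits of independent safety evidence needed to move
    P(doom) from `prior` down to `target`, on the odds scale."""
    def odds(p: float) -> float:
        return p / (1 - p)
    return math.log2(odds(prior) / odds(target))

# Illustrative prior P(doom) = 0.2, acceptable = 1%:
print(f"{bits_needed(0.2, 0.01):.1f} bits")   # 4.6 bits
# So two layers worth 2-3 bits each would roughly suffice,
# but only if their evidence were genuinely independent.
```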