Thanks for the link. I don’t actually think that either of the sentences you quoted is closely related to what I’m talking about. You write “[AI] systems will monitor for destructive behavior, and these monitoring systems need to be robust to adversaries”; I think this is a version of the argument that I linked to Adam Gleave and Euan McLean making, which I think is wrong. You wrote “AI tripwires could help uncover early misaligned systems before they can cause damage” in a section on anomaly detection, which isn’t a central case of what I’m talking about.
I’m not sure I agree about the first sentence you mention. I agree that the rest of the paragraph is talking about something similar to the point that Adam Gleave and Euan McLean make, but the exact sentence is:
“Separately, humans and systems will monitor for destructive behavior, and these monitoring systems need to be robust to adversaries.”
“Separately” is quite key here.
I assume this is intended to include AI adversaries and high-stakes monitoring.