Wei Dai comments on Contest: $1,000 for good questions to ask to an Oracle AI

Wei Dai 10 Aug 2019 18:58 UTC
Submission. “Hacking/phishing assistant.” For the counterfactual Oracle, ask the Oracle to predict what would happen if one were to send a given message, data, or command to some hacking/phishing target (human or machine). In the event of erasure, actually send that message to the target and use the actual response to train the Oracle. Note that this is safer than using RL to automate hacking/phishing, because humans come up with the candidate messages to send. They can therefore avoid messages that could cause bad side effects, such as psychological damage to the recipient or the creation of self-replicating code. At the same time, it is potentially more capable than using humans or human imitators to do hacking/phishing, because the Oracle can model the target better than humans can. (ETA: This idea could be combined with a human imitator to make the system faster / more capable.)