Interested in AI alignment, thinking about ethics, tap dancing, playing instruments, and wearing sandals year-round.
Harrison G
Thinking About Propensity Evaluations
A Taxonomy Of AI System Evaluations
Super helpful; thanks for writing!
(read: The Athena-Parfit Long-Term Institute for Raising for Effectively Prioritizing Global Alignment Challenges)
I laughed about this for a while. Thank you for this thought-provoking post, and for incorporating occasional humor throughout.
At the top right is a pocket constitution made by Legal Impact for Chickens. I received this at an Effective Altruism Global conference, during the career fair. What actually happened was that someone came up to the booth I was at holding the pocket constitution, I noted that it looked cool, and they were kind enough to offer it to me. Unfortunately, I have never knowingly met anybody from Legal Impact for Chickens. I have not actually used this pocket constitution, but I carry it anyway in my winter jacket’s inner breast pocket since (a) it fits very unobtrusively and (b) it seems cool to carry around a pocket constitution.
If this was EAG SF, I remember an experience that sounds very similar to this, and I think I was this person! Ha
“[...] since every string can be reconstructed by only answering yes or no to questions like ‘is the first bit 1?’ [...]”
Why would humans ever ask this question, and (furthermore) why would we ever ask it n times? It seems unlikely, and easy to prevent. Is there something I’m not understanding about this step?
The quote: “Finally, we can test our entire pipeline by deliberately training misaligned models, and confirming that our techniques detect the worst kinds of misalignments (adversarial testing).”