MiguelDev comments on Apply to >30 AI safety funders in one application with the Nonlinear Network

MiguelDev 13 Apr 2023 2:55 UTC
12 points
0
Hello there,

Are you interested in funding a theory of mine that I submitted to the AI Alignment Awards? I have been able to make it work in GPT2 and am now writing up the results. I was able to make GPT2 shut itself down (100% of the time), even when it is aware of the shutdown instruction, called “the Gauntlet”, embedded through fine-tuning an artificially generated archetype called “the Guardian”, essentially solving corrigibility and outer and inner alignment.

https://twitter.com/whitehatStoic/status/1646429585133776898?t=WymUs_YmEH8h_HC1yqc_jw&s=19

Let me know if you are interested. I want to test it on larger models such as Llama and Alpaca, but I don’t have the means to finance the equipment.

I also found a strange temperature setting for GPT2: in the range of 0.498 to 0.50, my shutdown code works really well, though I still don’t know why. I believe this is an incentive to review what is happening inside the transformer architecture.
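
For anyone unfamiliar with the setting, here is a minimal sketch of how temperature is usually applied when sampling: the logits are divided by the temperature before the softmax. The function name and the example logits are hypothetical, chosen only for illustration, and are not taken from the GPT2 experiments.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Return sampling probabilities for `logits` at the given temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    # Dividing by a temperature below 1 sharpens the distribution.
    scaled = [x / temperature for x in logits]
    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical logits for three candidate tokens (not real GPT2 output).
logits = [2.0, 1.0, 0.0]

for t in (0.498, 0.5, 1.0):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 4) for p in probs])
```

In this toy case the probabilities at 0.498 and 0.50 are nearly identical, which is part of why the narrow range is puzzling.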

Here was my original proposal: https://www.whitehatstoic.com/p/research-proposal-leveraging-jungian

I’ll also post my paper on the corrigibility solution once it is finished, probably next week. If you wish to contact me, just reply here or email me at migueldeguzmandev@gmail.com.

If you want to see my meeting schedule, you can find it here: https://calendly.com/migueldeguzmandev/60min

Looking forward to hearing from you.

Best regards,

Miguel

Update: I have already sent an application; I didn’t see that on my first read. Thank you.