I missed this until I finally got around to responding to your last post, which I’d put on my todo list.
I applaud your initiative and drive! I do think it’s a tough pitch to try to leapfrog the fast progress in deep networks. That said, I don’t think the alignment picture for those types of systems is nearly as bleak as Yudkowsky & the old school believe. But neither is it likely to be easy enough to leave to chance, or to those who don’t fully grasp the seriousness of the problem. I’ve written about some of the most common Cruxes of disagreement on alignment difficulty.
So I’d suggest you’d have better odds working within the ML paradigm, which is advancing with or without your help. I also think that even if you did produce a near-miraculous breakthrough in symbolic GOFAI, Deep Learning Systems Are Not Less Interpretable Than Logic/Probability/Etc.
OTOH, if you have a really insightful approach, and a good reason to think the result would be easier to align than language model agents, maybe pursuing that path makes sense, since no one else is doing exactly that. As I said in my comment on your last request for directions, I think there are higher-expected-value, nearly-as-underserved routes to survival; namely, working on alignment for the LLM agents that are our most likely route to the first AGIs at this point (focusing on routes other than aligning the base LLMs, which is the common but inadequate approach).
I’m also happy to talk. Your devotion to the project is impressive, and a resource not to be wasted!
Thank you for your kind comment! I disagree with the johnswentworth post you linked; it’s misleading to frame NN interpretability as though we started out with any graph with any labels at all, weird-looking or not. I have sent you a DM.