You want to be an insignificant, and probably totally illiquid, junior partner in a venture with Elon Musk, and you think you could realize value from the shares? In a venture whose long-term “upside” depends on it collecting money from ownership of AGI/ASI? In a world potentially made unrecognizable by said AGI/ASI?
All of that seems… unduly optimistic.
That’s not “de-biasing”.
Datasets that reflect reality can’t reasonably be called “biased”, but models that have been epistemically maimed can.
If you want to avoid acting on certain truths, then you need to consciously avoid acting on them. Better yet, go ahead and act on them… but in ways that improve the world, perhaps by making them less true. Pretending they don’t exist isn’t a solution. Such pretense makes you incapable of directly attacking the problems you claim to want to solve. Baking that pretense into models is even worse… it’s going to make them genuinely incapable of seeing the problems. Beyond whatever you’re trying to fix, it’s going to skew their overall worldviews in unpredictable ways, and directly interfere with any attempt at correction.
Brain-damaging your system isn’t “safety”. Especially not if you’re worried about it behaving in unexpected ways.
Talking about “short timelines” implies that you’re worried about these models you’re playing with, with very few architectural changes or fundamental improvements, turning into very powerful systems that may take actual actions affecting unknown domains of concern, in ways you do not anticipate and cannot mitigate, for reasons you do not understand. It’s not “safety” to ham-handedly distort their cognition on top of that.
If that goes wrong, any people you’ve specifically tried to protect will be among the most likely victims, but far from the only possible ones.
This kind of work just lets capabilities whiz along, and may even accelerate them… while making the systems less likely to behave in rational or predictable ways, and very possibly actively pushing them toward destructive action. It probably doesn’t even improve safety in the sense of “preventing redlining”, and it definitely doesn’t do anything for safety in the sense of “preventing extinction”. And it provides political cover… people can use these “safety” measures to argue for giving more power to systems that are deeply unsafe.
Being better at answering old “gotcha” riddles is not an important enough goal to justify that.