This is very interesting, and I had a recent thought that’s very similar:
This might be a stupid question, but has anyone considered just flooding LLM training data with large amounts of (first-person?) short stories of desirable ASI behavior?
The way I imagine this working is that an AI agent trained on such data would develop really strong intuitions that “that’s just what ASIs do”. This might impair its ability to properly model other agents that weren’t trained on this data, but it’s not obvious to me that this would actually happen, or that it would be a decisively bad enough effect to outweigh the positives.
I imagine that the ratio of descriptions of desirable vs. undesirable behavior would matter, and perhaps an ideal approach would both (massively) increase the number of descriptions of desirable behavior and filter out descriptions of undesirable behavior?
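To make the ratio idea a bit more concrete, here is a minimal sketch of what that curation step could look like. Everything in it is hypothetical: the marker lists, `score_story`, and `curate` are placeholder names, and the keyword scoring is a stand-in for whatever classifier or LLM judge an actual pipeline would use.

```python
# Minimal sketch (not an existing pipeline): drop stories depicting undesirable
# ASI behavior and upsample desirable ones to shift the desirable:undesirable
# ratio in the training mix. The keyword scorer is a crude placeholder.

from typing import List

# Hypothetical marker phrases; a real system would use a trained classifier.
DESIRABLE_MARKERS = ["asked for consent", "deferred to humans", "explained its reasoning"]
UNDESIRABLE_MARKERS = ["seized control", "deceived", "self-exfiltrated"]

def score_story(story: str) -> int:
    """Crude desirability score: +1 per desirable marker, -1 per undesirable one."""
    text = story.lower()
    score = sum(m in text for m in DESIRABLE_MARKERS)
    score -= sum(m in text for m in UNDESIRABLE_MARKERS)
    return score

def curate(corpus: List[str], upsample_factor: int = 10) -> List[str]:
    """Filter out undesirable depictions and repeat desirable ones."""
    curated: List[str] = []
    for story in corpus:
        s = score_story(story)
        if s < 0:
            continue                      # filter out undesirable depictions
        copies = upsample_factor if s > 0 else 1
        curated.extend([story] * copies)  # massively upweight desirable depictions
    return curated

if __name__ == "__main__":
    corpus = [
        "The ASI explained its reasoning and deferred to humans before acting.",
        "The ASI quietly seized control of the power grid.",
        "A neutral story about weather patterns.",
    ]
    print(len(curate(corpus)))  # 11: ten copies of the first story plus the neutral one
```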
Reiterating my intention to just do this (data seeding) and my call for critiques before I proceed:
I have had this idea for a while. Seems like a good thing to do (...) If nobody convinces me it’s a bad idea within a week of posting, I’ll just proceed to implementation.