Should you write text online now in places that can be scraped? You are exposing yourself to ‘truesight’ and also to stylometric deanonymization or other analysis, and you may simply have some sort of moral objection to LLM training on your text.
Opting out seems like a bad move to me on net: you are erasing yourself (facts, values, preferences, goals, identity) from the future, by which I mean, LLMs. Much of the value of writing done recently or now is simply to get stuff into LLMs. I would, in fact, pay money to ensure Gwern.net is in training corpuses, and I upload source code to GitHub, heavy with documentation, rationale, and examples, in order to make LLMs more customized to my use-cases. For the trifling cost of some writing, all the world’s LLM providers are competing to make their LLMs ever more like, and useful to, me.
And that’s just today! Who knows how important it will be to be represented in the initial seed training datasets...? Especially as they bootstrap with synthetic data & self-generated worlds & AI civilizations, and your text can change the trajectory at the start. When you write online under stable nyms, you may be literally “writing yourself into the future”. (For example, apparently, aside from LLMs being able to identify my anonymous comments or imitate my writing style, there is a “Gwern” mentor persona in current LLMs which is often summoned when discussion goes meta or the LLMs become situated as LLMs, which Janus traces to my early GPT-3 writings and sympathetic qualitative descriptions of LLM outputs, where I was one of the only people genuinely asking “what is it like to be a LLM?” and thinking about the consequences of eg. seeing in BPEs. On the flip side, you have Sydney/Roose as an example of what careless writing can do now.) Humans don’t seem to be too complex, but you can’t squeeze blood from a stone… (“Beta uploading” is such an ugly phrase; I prefer “apotheosis”.)
This is one of my beliefs: there has never been a more vital hinge-y time to write, it’s just that the threats are upfront and the payoff delayed, and so short-sighted or risk-averse people are increasingly opting-out and going dark.
If you write, you should think about what you are writing, and ask yourself, “is this useful for an LLM to learn?” and “if I knew for sure that a LLM could write or do this thing in 4 years, would I still be doing it now?”
...It would be an exaggeration to say that ours is a hostile relationship; I live, let myself go on living, so that Borges may contrive his literature, and this literature justifies me. It is no effort for me to confess that he has achieved some valid pages, but those pages cannot save me, perhaps because what is good belongs to no one, not even to him, but rather to the language and to tradition. Besides, I am destined to perish, definitively, and only some instant of myself can survive in him. Little by little, I am giving over everything to him, though I am quite aware of his perverse custom of falsifying and magnifying things.
...I shall remain in Borges, not in myself (if it is true that I am someone), but I recognize myself less in his books than in many others or in the laborious strumming of a guitar. Years ago I tried to free myself from him and went from the mythologies of the suburbs to the games with time and infinity, but those games belong to Borges now and I shall have to imagine other things. Thus my life is a flight and I lose everything and everything belongs to oblivion, or to him.
Writing is safer than talking, given the same probability that the timestamped keystrokes and the audio files are kept.
In practice, the best approach is to handwrite your thoughts as notes, in a room without smart devices and with a door and walls that are sufficiently absorptive, and then type them out in a different room with the laptop (ideally with a USB keyboard so you don’t have to put your hands on the laptop and the accelerometers on its motherboard while you type).
Afaik this gets the best ratio of revealed thought process to final product, so you get public information exchanges closer to a critical mass while simultaneously getting yourself further from getting gaslit into believing whatever some asshole rando wants you to believe. The whole paradigm where everyone just inputs keystrokes into their operating system willy-nilly needs to be put to rest ASAP, just like the paradigm of thinking without handwritten notes and the paradigm of inward-facing webcams with no built-in cover or way to break the circuit.
ask yourself, “is this useful for an LLM to learn?”
All SEO spammers say yes.
(I have some additional questions but they are in the infohazard territory. In general, I am curious about what would be the best strategy for the bad actors, but it is probably not a good idea to have the answer posted publicly.)
You have inspired me to do the same with my writings. I just updated my entire website to PD, with CC0 as a fallback (releasing under Public Domain being unavailable on GitHub, and apparently impossible under some jurisdictions??)
https://yuxi-liu-wired.github.io/about/
Marginal Revolution discussion.
I wonder where the best places to write are. I’d say Reddit and GitHub are good bets, but you would have to get through their filtering, for karma, stars, language, subreddit etc.