Does this mean we should stop making posts and comments on LessWrong?
I am trying to be as realistic as I can while realizing that privacy is inversely proportional to convenience.
So no, of course you should not stop making LessWrong posts.
The main things I suggested were removing the ability to use your data, by favoring E2EE, and removing the ability to hoard it, by favoring decentralized (or local) storage and computation.
As an example, favor E2EE services for collaboration instead of Drive, Dropbox, or an office suite, if you have the ability to do so. I agree that this doesn't solve the problem, but at least it gets people accustomed to thinking about using privacy-focused alternatives. So it is one step.
Another example would be using an OS which has no telemetry and gives you root access, both on your computer and on your smartphone.
There is a different class of suggestions that falls under digital hygiene in general, but as mentioned in the 'What is this post about?' section, that is not what this post is about. I am also intentionally avoiding naming the alternative services, because I didn't want this post to come across as a shill.
Also, this is all a question of timelines. If people think that AGI/ASI will rear its head within the next few years or the next decade, I would agree that there might be bigger fires to put out.
Let me explain my understanding of your model. An AI wants to manipulate you. To do that, it builds a model of you. It starts out with a probability distribution over the mind space that is its understanding of what human minds are like. Then, as it gathers information on you, it updates those probabilities. The more data it is given, the more accurate the model gets. Then it can model how you respond to a bunch of different stimuli and choose the one that gets the most desirable result.
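A minimal sketch of that loop, with a toy discrete mind space and made-up stimuli (every name and number below is hypothetical, just to make the updating-and-choosing step concrete):

```python
# Toy "mind space": each candidate mind type is summarized by how likely it is
# to comply with each kind of stimulus. All values here are invented.
minds = {
    "skeptic":    {"flattery": 0.1, "statistics": 0.7, "urgency": 0.2},
    "follower":   {"flattery": 0.8, "statistics": 0.3, "urgency": 0.6},
    "contrarian": {"flattery": 0.2, "statistics": 0.4, "urgency": 0.1},
}
prior = {name: 1 / len(minds) for name in minds}  # start with a uniform prior

def update(belief, stimulus, complied):
    """Bayesian update after observing whether the target complied with a stimulus."""
    posterior = {}
    for name, p in belief.items():
        likelihood = minds[name][stimulus] if complied else 1 - minds[name][stimulus]
        posterior[name] = p * likelihood
    total = sum(posterior.values())
    return {name: p / total for name, p in posterior.items()}

def best_stimulus(belief):
    """Choose the stimulus with the highest expected compliance under the current belief."""
    stimuli = next(iter(minds.values()))
    return max(stimuli, key=lambda s: sum(belief[m] * minds[m][s] for m in minds))

# Each observed datum sharpens the belief, and the chosen stimulus adapts to it.
belief = update(prior, "flattery", complied=False)
belief = update(belief, "statistics", complied=True)
print(best_stimulus(belief))  # "statistics", once the target looks most like a skeptic
```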
But if this model is like any learning process I know about, the chart of how much is learned over time will probably look vaguely logarithmic, so once it is through half of the data it will be way more than halfway through the improvement on the model. So if you're thirty now, and have been using non-end-to-end-encrypted messaging your whole life, and all of that is sitting on some server and ends up in an AI, you've probably already thrown away more than 90% of the game, whatever you do today. Especially since, if you keep making public posts, it can track changes in those to infer how you are changing and keep its already good model current anyway.
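To make the "more than 90%" guess concrete, here is a back-of-the-envelope check that assumes model quality grows like log(1 + n) in the number of messages seen; the logarithmic shape and the message count are assumptions, not measurements:

```python
import math

def quality(n_messages: int) -> float:
    """Assumed learning curve: model quality grows logarithmically with data."""
    return math.log1p(n_messages)

total = 100_000          # hypothetical lifetime message count
half = total // 2

fraction = quality(half) / quality(total)
print(f"{fraction:.1%} of the final model quality from half the data")
# ~94% under this assumption, so going dark today saves only the last few percent.
```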
I keep going back and forth about whether your point is a good one or not. (Your point being that it's useful to keep non-public data about you from being easy for AIs on big server farms, like Google's or whatever, to access, even if you haven't been doing that so far and you keep putting out public data.) Your idea sort of seems like a solution to another problem.
I do think your public internet presence will reveal much more to a manipulative AI than to a manipulative human. AIs can make connections we can't see. And a lot of AIs will be trained on just the internet as a whole, so while your private data may end up in a few AIs, or many if they gain the ability to hack, your public data will be in tons and tons of AIs. For, say, an LLM to work like this, it has to be able to model your writing to optimize its reward. If you're sufficiently different from other people that you need a unique way to be manipulated, your writing probably needs unique parameters for an LLM to model it. So if you have many posts online already, modern AIs probably already know you. And an AI optimized towards manipulation (either terminally or instrumentally) will, just by talking to you or hearing you talk, figure out who you are, or at least get a decent estimate of where you are in the mind space, and already be saying whatever is most convincing to you. So when you post publicly, you are helping every AI manipulate you, but when you post privately, it only helps a few.
I understand your original comment a lot better now. My understanding of what you said is that the open-source intelligence anyone provides through their public persona already reveals more than enough information to be damaging; the little that is sent over encrypted channels is just the cherry on top. So the only real way to avoid manipulation is to first hope that you have not been a very engaged member of the internet for the last decade, and then to primarily communicate over private channels.
I suppose I just underestimated how much people actually post stuff online publicly.
One first-instinct response I had was identity isolation, which I was also going to suggest while writing the original post. Practicing identity isolation would mean that even if you post something publicly, the data stays isolated to that identity: every website, every app, is either compartmentalized or on a completely different identity. Honestly, though, that requires almost perfect OPSEC to not be fingerprintable. Beyond that, it's also way too inconvenient for people not to just use the same email and phone number, or to log in with Google everywhere. So even though I would like to suggest it, no one would actually do it. And as you already mentioned, most normal people have been providing boatloads of free OSINT for the last few decades anyway...
Thinking about it even more, the training dataset over the entire public internet is basically the centralized database of your data that I am worried about anyway. As you mentioned, AIs can find feature representations that we as humans can't. So even if you have been practicing identity isolation, LLMs (or whatever future model) would still be able to fingerprint you as long as you have been posting enough stuff online. And not posting online is not something most people are willing to do. Even if they are, they have already given away the game with what they have already posted. So in the majority of cases, identity isolation doesn't help with this particular problem of AI manipulation either...
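For intuition about how cheap this fingerprinting can be, even without a frontier model: classic stylometry on character n-grams already links writing samples. A rough sketch with scikit-learn, where the snippets and account names are invented:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical posts from two "separate" accounts plus an unrelated author.
account_a = "Honestly though, that requires almost perfect OPSEC to pull off..."
account_b = "Honestly though, I keep going back and forth about whether that helps..."
unrelated = "Patch notes: fixed a regression in the cookie-clearing scheduler."

# Character n-grams capture punctuation habits and word-shape quirks,
# which survive topic changes better than plain word counts.
vectorizer = TfidfVectorizer(analyzer="char_wb", ngram_range=(3, 5))
vectors = vectorizer.fit_transform([account_a, account_b, unrelated])

print(cosine_similarity(vectors[0], vectors[1]))  # same author: relatively high
print(cosine_similarity(vectors[0], vectors[2]))  # different author: lower
```

With one sentence per account this is only illustrative, and real attribution needs far more text, but the point is that writing style itself acts as an identifier that identity isolation doesn't remove.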
I have always tried to hold the position that even if it is possible for other people (or AIs) to do something you don't like (violate your privacy, manipulate you), that doesn't mean you should give up or make it easy for them. But off the top of my head, without thinking about this some more, I can't really come up with any good solution for people who have been publicly publishing info on the internet for a while. Thank you for giving me food for thought.
There are extensions like AdNauseam which try to poison your data trace, though it's dubious whether they help much. You could have some kind of crawler which pretends to be, say, 100 normal users so that you get lost in the noise. But even that could probably be filtered out if someone really wanted to, since it is hard to accurately simulate a human (I also dimly recall reading an article about this). Maybe something that records other people's sessions and plays them back? Or an LLM doing it (hehe)? But even that wouldn't help while you are logged in to various services, and I'm guessing most people don't automatically log out of Gmail whenever they change tabs?
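The "crawler pretending to be 100 normal users" idea would look something like the sketch below; the URLs, timings, and user-agent string are placeholders, and, as noted above, traffic this mechanical is probably easy to filter out:

```python
import random
import time
import requests

# Placeholder destinations; a real decoy would need realistic, varied browsing.
DECOY_SITES = [
    "https://example.com/news",
    "https://example.com/recipes",
    "https://example.com/sports",
]

def browse_like_a_decoy(n_requests: int = 20) -> None:
    """Issue a handful of unrelated requests with human-ish pauses to add noise."""
    session = requests.Session()
    session.headers["User-Agent"] = "Mozilla/5.0 (generic decoy profile)"
    for _ in range(n_requests):
        url = random.choice(DECOY_SITES)
        try:
            session.get(url, timeout=10)
        except requests.RequestException:
            pass  # decoy traffic is best-effort; ignore failures
        time.sleep(random.uniform(5, 60))  # crude imitation of human pacing

if __name__ == "__main__":
    browse_like_a_decoy()
```

The fixed site list and the neat uniform pauses are exactly the sort of regularity that would make this traffic easy to single out, which is the worry mentioned above.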
One source of hope is that data gets stale quickly. People can change their minds (even if they usually don't), so just because you know what I thought a year ago doesn't mean that you know what I think now. Then again, most people don't care either way, and it would be pretty simple to pick out the small number of outliers who suddenly go dark. One possible way of cleaning up would be to spend a couple of months posting increasingly radical and strange things (e.g. going all in on flat-earthism) before going private, in order to mislead any analysis. This is hard and requires passing an ITT.
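As a hedged illustration of how little it might take to spot the people who go dark, here is a toy filter over made-up posts-per-month histories; everything here, thresholds included, is invented:

```python
# Hypothetical posts-per-month histories for a few accounts (most recent months last).
histories = {
    "steady":    [12, 10, 11, 13, 12, 11],
    "goes_dark": [14, 15, 13, 12, 1, 0],
    "lurker":    [1, 0, 1, 0, 1, 0],
}

def suddenly_dark(history, recent_months=2, drop_ratio=0.2):
    """Flag accounts whose recent average falls below a fraction of their own baseline."""
    baseline = sum(history[:-recent_months]) / (len(history) - recent_months)
    recent = sum(history[-recent_months:]) / recent_months
    return baseline > 5 and recent < drop_ratio * baseline  # ignore low-activity accounts

flagged = [name for name, h in histories.items() if suddenly_dark(h)]
print(flagged)  # ['goes_dark']
```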
Tor + cleaning cookies + logging out of everything after using them + separate user profiles goes a long way. But it’s very inconvenient.
Tor is way too slow, and Google hates serving content to Tor users. I2P might be faster than Tor, but its current adoption is way too low. Additionally, it doesn't help that identity persistence is a regulatory requirement in most jurisdictions, because it helps traceability against identity theft, financial theft, fraud, etc. Cookie cleaning means people have to log in every time, which for most is too annoying.
I acknowledge that there are technical ways to poison existing data. The core problem, though, is finding things that both normal people and the technically adept (alignment researchers, engineers, ...) would actually be willing to do.
The general vibe I see right now is: *shrug* they already know, so I might as well just make my life convenient and continue giving them everything...
Honestly, I don't really think it should be the responsibility of the average consumer to think about this at all. Should it be your responsibility to check every part of your car's engine before every drive to make sure it is not going to blow up and kill you? Of course not; that responsibility is on the manufacturer. Similarly, the responsibility for mitigating the adverse effects of data gathering should be on the companies developing these systems, not on the consumers.
Uh, I've done the Tor + cookie-cleaning + logging-out routine since forever, and it doesn't feel so inconvenient to me. I generally use Firefox in private browsing, configured to always throw away all cookies at the end of every session. Ten years ago it wasn't even a privacy concern; I simply hate exiting a webpage without a proper logout. It feels like not closing the door when leaving your house...
It requires you to actively manage long-lived sessions that would otherwise be handled by the site you're using. You can often get back to where you were by just logging in again, but there are many places (especially travel or official sites) where that pretty much resets the whole flow.
There are also a lot more popups, captchas, and other hoops to jump through when you don't have a cookie trail.
The average user is lazy and doesn't think about these things, so the web as a whole is moving in the direction of making things easier (but not simpler). This is usually viewed as a good thing by those who then only need to click a single button, though it comes at the cost of those who want more control.
It might not be inconvenient to you, especially since it's your basic flow. It's inconvenient for me, but worth the cost; for most of the people I know it would be basically unusable compared to the default flow.