Strongly upvoted for the clear write-up (thank you for that) and for engaging with a potentially neglected issue.
Following your post, I'd distinguish two issues:
(a) Lack of data privacy enabling a powerful future agent to target/manipulate you personally: your data is just there for the taking, stored in not-so-well-protected databases; cross-referencing gets easier at higher capability levels; and singling you out and fine-tuning a behavioral model on you in particular isn't hard;
(b) Lack of data privacy enabling a powerful future agent to build a generic behavioral model of humans from the thousands/millions of well-documented examples of people who aren't particularly bothered by privacy, drawn from the same databases as above plus plain (semi-)public social media records.
From your deception examples we already have strong evidence that (b) is possible. LLM capabilities will get better, and it will get worse when [redacted plausible scenario because my infohazard policies are ringing].
If (b) comes to pass, I would argue that the marginal effort needed to prevent (a) would only be useful for preventing certain whole coordinated groups of people (who should already be infosec-aware) from being manipulated. Rephrased: there are already a ton of epistemic failures all over the place, but maybe there can be pockets of sanity linked to critical assets.
I may be missing something as well. Also seconding the Seed webtoon recommendation.
I did consider the distinction between a model of humans in general vs. a model of you personally. But I can't really see any realistic way of stopping models from getting better at modeling humans in general over time. So yes, I agree with you that small pockets of sanity are currently the best we can hope for. Spreading that pocket of sanity from infosec to the alignment space is mainly why I wrote up this post, because I would consider the minds of alignment researchers to be critical assets.
As to why predictive models of humans in general seem unstoppable: I thought it might be too much to ask people not to provide even anonymized data, because a lot of useful capabilities are enabled by it (e.g. better medical diagnoses). And even if the capability loss weren't too heavy, most people would still provide data because they simply don't care or remain unaware. Which is why I used the wording "stem the flow of data and delay timelines" instead of "stop the flow".