Thanks for your input. Though ideally we wouldn’t have to go through an email server, it may just be required at some level of security.
As for the patterns, the nice thing is that with a small output space in the millions, there are tons of plausible addresses that overlap even if you pin it down to a domain. The number of English first-and-last-name combinations alone, without any digits, is already far larger than 10 million, so even targeted domains should have plenty of collisions.
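To make the collision arithmetic concrete, here is a minimal sketch. It assumes (the thread doesn't say) that the scheme hashes each address with SHA-256 and reduces it modulo a 10-million-value output space. The name counts are made-up illustrative figures, not real data, and the `bucket` and `expected_preimages` helpers are hypothetical.

```python
import hashlib

# Assumption: the scheme maps addresses into 10 million possible values.
OUTPUT_SPACE = 10_000_000

def bucket(email: str) -> int:
    """Map an email address to one of OUTPUT_SPACE values (hypothetical scheme)."""
    # Lower-casing is an assumption; it treats Alice@x.com and alice@x.com as one address.
    digest = hashlib.sha256(email.lower().encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % OUTPUT_SPACE

def expected_preimages(candidates: int, space: int = OUTPUT_SPACE) -> float:
    """Average number of candidate addresses that share any one output value."""
    return candidates / space

# Illustrative figures only: 5,000 first names x 50,000 surnames at one domain.
first_names, last_names = 5_000, 50_000
candidates = first_names * last_names  # 250,000,000 "first.last@domain" guesses
print(expected_preimages(candidates))  # 25.0
```

This is only an average over the candidate set. It says nothing about how likely the real address is to be the most plausible of the roughly 25 candidates sharing its value.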
There’s an idea in security that you should avoid weak security because it lets you trick yourself into thinking you’re doing something. For example, if you’re not going to protect passwords, in some sense it’s better to store them in plaintext than to hash them with MD5. At least in the plaintext case you know you’re not protecting them (and you won’t accidentally do something unsafe with them on the assumption that they’re already protected by being hashed).
I think this is one of those cases:
If you don’t care whether these become public, consider just making them public.
If you think they shouldn’t be public, use something that guarantees they aren’t (like the random ID solution).
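For comparison, the random ID approach can be sketched as below. The `EmailAliasStore` class is hypothetical, and an in-memory dict stands in for whatever server-side storage you would actually use. The point is that the public ID is drawn from `secrets.token_urlsafe`, so it carries no information about the address. Only the server, for example when relaying mail, can map it back.

```python
import secrets
from typing import Dict, Optional

class EmailAliasStore:
    """Hypothetical server-side store mapping public random IDs to private addresses."""

    def __init__(self) -> None:
        self._by_id: Dict[str, str] = {}
        self._by_email: Dict[str, str] = {}

    def alias_for(self, email: str) -> str:
        """Return the public ID for an address, creating one on first use."""
        if email in self._by_email:
            return self._by_email[email]
        # 16 random bytes (about 128 bits), independent of the address itself.
        public_id = secrets.token_urlsafe(16)
        while public_id in self._by_id:  # astronomically unlikely, but cheap to check
            public_id = secrets.token_urlsafe(16)
        self._by_id[public_id] = email
        self._by_email[email] = public_id
        return public_id

    def resolve(self, public_id: str) -> Optional[str]:
        """Server-side only: look up the real address, or None if the ID is unknown."""
        return self._by_id.get(public_id)
```

Unlike a hash, there is nothing to enumerate: guessing candidate addresses tells an attacker nothing about which ID belongs to whom unless they can read the mapping itself.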
The solution you proposed is better than nothing and might protect some email addresses in some cases, but it raises two questions: if you need to protect these sometimes, why not all the time? And if leaving them unprotected sometimes is OK, why bother at all?
(I should say, though, that there are benefits to making data annoying to access: your scheme will protect the data from casual snoopers and keep it from being crawled by search engines unless someone goes to the trouble of de-anonymizing and reposting it. My point is mostly that you should decide whether you’re OK with it becoming entirely public.)