Sorry for the downtime, looks like we got DDoSed
We were down between around 7PM and 8PM PT today. Sorry about that.
It’s hard to tell whether we got DDoSed or someone just wanted to crawl us extremely aggressively, but we’ve had at least a few hundred IP addresses with random user agents request a lot of quite absurd pages, in a way that was clearly designed to avoid bot-detection and blocking methods.
I wish we were more robust to this kind of thing, and I’ll be monitoring things tonight to prevent it from happening again, but it would be a whole project to make us fully robust to attacks of this kind. I hope it was a one-off occurrence. That said, I think we can figure out how to be robust to repeated DDoS attacks, if that is the world we live in, though I do think it would mean strapping in for a few days of spotty reliability while we work that out.
Sorry again, and boo for the people doing this. It’s one of the reasons why running a site like LessWrong is harder than it should be.
I recommend Cloudflare.
Yeah, we considered setting up a Cloudflare proxy for a while, but at least for logged-in users, LW is actually a really quite dynamic and personalized website, and not a great fit for it (I do think it would be nice to have a logged-out version of pages available on a Cloudflare proxy somehow).
I was referring to their (free) DDoS protection service, rather than their CDN services (also free). In addition to their automated system, you can manually enable an “under-attack” mode that aggressively captchas requests.
Setup is simply pointing DNS name-servers at Cloudflare. Caching HTML pages for logged out (i.e. cookie-less) users is a trivial config (“cache-everything”).
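To make the "cookie-less means cacheable" rule concrete, here is a minimal sketch of the decision in Python. It is not Cloudflare's configuration syntax, only the logic such a rule encodes. The function name and the example cookie name `loginToken` are hypothetical, and treating any cookie at all as "logged in" is an assumption.

```python
from typing import Mapping


def is_cacheable(method: str, headers: Mapping[str, str]) -> bool:
    """Return True if a request looks like an anonymous page view.

    Assumption: any Cookie header means the user may be logged in,
    so the response must come from the origin instead of a shared cache.
    """
    # Only safe, read-only methods are candidates for caching.
    if method.upper() not in ("GET", "HEAD"):
        return False
    # Header names are case-insensitive, so normalize before the lookup.
    normalized = {name.lower(): value for name, value in headers.items()}
    return not normalized.get("cookie", "").strip()


if __name__ == "__main__":
    print(is_cacheable("GET", {}))                            # anonymous reader
    print(is_cacheable("GET", {"Cookie": "loginToken=abc"}))  # hypothetical logged-in user
```

Under this rule, logged-in traffic always reaches the origin, so personalized pages are unaffected and only anonymous page views are served from cache.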
Oh, interesting. I had not properly realized you could unbundle these. I am hesitant to add a hop to each request, but I do sure expect Cloudflare to be fast. I’ll look into it, and thanks for the recommendation.
It’s a solution! However, it comes with its own downsides. For instance, Codeforces users complained about its Cloudflare usage for a while, highlighting the following issues (mapped to LessWrong):
The purpose of an API is defeated: even the API endpoints on the same domain are restricted, which prevents users from requesting posts via GraphQL. In particular, ReviewBot will be down (or will have to be hosted on LW internal infrastructure).
In China, Cloudflare is a big speed bump.
Cloudflare-protected sites are reported to randomly lag a lot.
> I had been assuming that this is a server problem, but from talking to some people it seems like this is an issue with differential treatment of who is accessing CF.
Lack of interaction smoothness might be really noticeable for new users, compared to the current state.
If you have the developer time for it, have you considered building a cryptocurrency-based firewall? Pay $1 to whitelist your IPv6 range in the firewall.
What to do with non-whitelisted IPs is up to you, you could limit the bandwidth for them.
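A rough sketch of that firewall logic, leaving the payment step out entirely. All names here are hypothetical. Two assumptions are mine, not part of the proposal: a "range" is treated as a /64 prefix, and all non-whitelisted traffic shares one token bucket, since per-IP limits do little against hundreds of rotating addresses.

```python
import ipaddress
import time
from typing import Optional


class PaidAllowlist:
    """IPv6 ranges that have paid for unrestricted access (hypothetical)."""

    def __init__(self, prefix_len: int = 64) -> None:
        self.prefix_len = prefix_len  # assumption: one customer = one /64
        self.networks = set()

    def _range_of(self, address: str) -> Optional[ipaddress.IPv6Network]:
        ip = ipaddress.ip_address(address)
        if ip.version != 6:
            return None  # IPv4 clients cannot be whitelisted in this sketch
        return ipaddress.IPv6Network(f"{ip}/{self.prefix_len}", strict=False)

    def add(self, address: str) -> None:
        network = self._range_of(address)
        if network is None:
            raise ValueError("only IPv6 addresses can be whitelisted")
        self.networks.add(network)

    def contains(self, address: str) -> bool:
        network = self._range_of(address)
        return network is not None and network in self.networks


class TokenBucket:
    """Allows `rate` requests per second with bursts up to `capacity`."""

    def __init__(self, rate: float, capacity: float, now: Optional[float] = None) -> None:
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.updated = time.monotonic() if now is None else now

    def allow(self, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        elapsed = max(0.0, now - self.updated)
        # Refill for the time that has passed, capped at the bucket size.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


def admit(address: str, allowlist: PaidAllowlist, shared: TokenBucket,
          now: Optional[float] = None) -> bool:
    """Whitelisted ranges always pass; everyone else shares one budget."""
    if allowlist.contains(address):
        return True
    return shared.allow(now)
```

The `now` parameters exist only so the behaviour can be checked deterministically; in real use they would be left unset.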
I suggest this because the endgame of the IP address doxxing performed by companies like Cloudflare is the death of anonymity on the internet. Each ISP has a finite IP range and a finite number of optical fiber cables, so there are only so many times someone can change their IP address.
(Sure the NSA probably knows who you are anyway, but IP ranges mapped to real names by random companies are eventually going to end up sold on the dark web to basically anyone with money.)
Interesting thought. I tend to agree that the endgame of … protection from scalable attacks in general … is lack of anonymity. Without identity, there can be no memory of behavior, and no prevention of abuse that’s only harmful across multiple events/sources. I suspect it’s a long way out, though.
Your proposed solution (paid IP whitelisting) is pretty painful—the vast majority of real users (and authorized scrapers) don’t have a persistent enough address, or at least don’t know that they do, to participate.
Hi! Created a (named) account for this—in fact, I think you can conceptually get some of those reputational defenses (memory of behavior; defense against multi-event attacks) without going so far as to drop anonymity / prove one’s identity!
See my Twitter thread here, summarizing our paper on Personhood Credentials.
Paper’s abstract:
This seems just like regular auth, just using a trusted 3P to re-anonymize. Maybe I’m missing something, though. It seems likely it won’t provide much value if it’s unbreakably anonymous (because it only takes a few stolen credentials to give an attacker access to fake-humanity), and doesn’t provide sufficient anonymity for important uses if it’s escrowed (such that the issuer CAN track identity and individual usage, even if they currently choose not to).
Yeah I appreciate the engagement, I don’t think either of those is a knock-down objection though:
Being able to illicitly gain a few credentials, and so more than one account, is still meaningfully different from being able to create ~unbounded accounts. It is true this means a PHC doesn’t 100% ensure a distinct person, but it can still give pretty high assurance and significantly increase the cost of attacks that depend on scale.
Re: the second point, I’m not sure I fully understand—say more? By our paper’s definitions, issuers wouldn’t be able to merely choose to identify individuals. In fact, even if an issuer and service-provider colluded, PHCs are meant to be robust to this. (Devil is in the details of course.)
another weird bug is if i click the link i was just sent in my email, it brings me to a 403 Forbidden page (even though the URLs of this functional page and that 403 page look identical)
Should now be fixed. We’ve blocked traffic to basically all pages and been restoring them incrementally to make sure we don’t go down again immediately. I just lifted the last of those blocks.
works!