Does the system keep track of individual downvotes (who downvoted what)? If so, it should be possible to simply revert every vote Eugine ever cast, which should solve all the problems: everyone would have the same total karma and comment karma as if this whole thing had never happened.
It has to—otherwise you wouldn’t be able to see what YOU upvoted/downvoted.
Also, otherwise you would be able to upvote or downvote something multiple times.
So clearly, it has to track somewhere.
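To make those two points concrete, here is a minimal sketch, assuming a hypothetical `votes` table with one row per voter per comment (the table, column, and user names are made up for illustration and are not LW's actual storage). The composite primary key is what blocks double-voting, and because each row records who cast it, deleting one voter's rows and re-summing gives the karma everyone would have had without them.

```python
import sqlite3

# Hypothetical schema for illustration only; all names here are assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE votes (
    comment_id INTEGER NOT NULL,
    voter      TEXT    NOT NULL,
    value      INTEGER NOT NULL CHECK (value IN (-1, 1)),
    PRIMARY KEY (comment_id, voter)   -- one vote per voter per comment
);
""")
conn.executemany(
    "INSERT INTO votes (comment_id, voter, value) VALUES (?, ?, ?)",
    [(1, "alice", 1), (1, "mass_downvoter", -1),
     (2, "bob", 1), (2, "mass_downvoter", -1)],
)

def karma(comment_id):
    """Sum of all recorded votes on one comment (0 if there are none)."""
    row = conn.execute(
        "SELECT COALESCE(SUM(value), 0) FROM votes WHERE comment_id = ?",
        (comment_id,),
    ).fetchone()
    return row[0]

def try_duplicate_vote():
    """Return True if a second vote by the same voter is rejected."""
    try:
        conn.execute(
            "INSERT INTO votes (comment_id, voter, value) VALUES (1, 'alice', 1)"
        )
    except sqlite3.IntegrityError:
        return True
    return False

def revert_votes(voter):
    """Delete every vote cast by one voter; return how many rows were removed."""
    cur = conn.execute("DELETE FROM votes WHERE voter = ?", (voter,))
    return cur.rowcount

karma_before = (karma(1), karma(2))
duplicate_rejected = try_duplicate_vote()
removed = revert_votes("mass_downvoter")
karma_after = (karma(1), karma(2))
```

If karma totals are cached somewhere rather than summed on demand, a real revert would also have to recompute those caches; this sketch sidesteps that by always summing.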
If you guys need a SQL guy to help do some development work to make meta-moderation easier, let me know; I’ll happily volunteer a few hours a week.
EDIT: AAAUUUGH REDDIT’S DB USES KEY-VALUE PAIRS AIIEEEE IT ONLY HAS TWO TABLES OH GOD WHY WHY SAVE ME YOG-SOTHOTH I HAVE GAZED INTO THE ABYSS AAAAAAAIIIIGH okay. I’ll still do it. whimper
Maybe that’s why volunteer dev work for LW is so hard to come by. Everybody takes one look at the DB and decides they would prefer a very long vacation in Sarlacc, Tatooine.
I didn’t even get to the point of getting the DB up and running when I looked into it, before I ran out of motivation (at that time). LW-hacking is not particularly accessible, though it’s not clear how high a priority making it more accessible is.
The Reddit guys really, really dislike doing schema updates at their scale. They were getting very slow, and their replication setup was not happy about being told to, say, index a new column while people are doing lots of reads and writes at the same time. So they eventually said “to hell with it; we’ll just make a document database, with no schema, and handle consistency problems by not handling them. Man, do not even ask us about joins.” This seems to have made them much happier than the ‘better’ database design they used to use, which is important when you’re a too-small team dealing with terrifying scaling issues, and you know that a lot of people are watching you because they are the ones causing the scaling issues.
This design sure does make writing SQL queries a pain, though, and it’s less than ideal for a site like Less Wrong, whose code doesn’t change much.
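For a feel of why queries get painful, here is a rough sketch of a generic two-table key-value layout. The `thing` and `data` tables and every attribute name below are assumptions made up for illustration, not the actual Reddit or LW schema. Each attribute lives in its own row and is stored as text, so even "what is one voter's net effect on one author?" needs a self-join per attribute plus casts.

```python
import sqlite3

# Hypothetical two-table key-value layout, for illustration only.
# These names are assumptions, not the real Reddit/LW tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE thing (id INTEGER PRIMARY KEY, kind TEXT NOT NULL);
CREATE TABLE data (
    thing_id INTEGER NOT NULL REFERENCES thing(id),
    attr     TEXT NOT NULL,
    value    TEXT NOT NULL,
    PRIMARY KEY (thing_id, attr)
);
""")

def add_thing(kind, **attrs):
    """Insert one thing plus one data row per attribute (values stored as text)."""
    thing_id = conn.execute(
        "INSERT INTO thing (kind) VALUES (?)", (kind,)
    ).lastrowid
    conn.executemany(
        "INSERT INTO data (thing_id, attr, value) VALUES (?, ?, ?)",
        [(thing_id, name, str(val)) for name, val in attrs.items()],
    )
    return thing_id

c1 = add_thing("comment", author="alice")
c2 = add_thing("comment", author="bob")
add_thing("vote", voter="carol", target=c1, value=-1)
add_thing("vote", voter="carol", target=c2, value=1)
add_thing("vote", voter="dave", target=c1, value=1)

# One join per attribute we need, and casts because everything is text.
EFFECT_SQL = """
SELECT COALESCE(SUM(CAST(d_value.value AS INTEGER)), 0)
FROM thing AS v
JOIN data AS d_voter  ON d_voter.thing_id  = v.id AND d_voter.attr  = 'voter'
JOIN data AS d_target ON d_target.thing_id = v.id AND d_target.attr = 'target'
JOIN data AS d_value  ON d_value.thing_id  = v.id AND d_value.attr  = 'value'
JOIN data AS d_author ON d_author.thing_id = CAST(d_target.value AS INTEGER)
                     AND d_author.attr = 'author'
WHERE v.kind = 'vote' AND d_voter.value = ? AND d_author.value = ?
"""

def effect(voter, author):
    """Net vote value one voter has given to one author's comments."""
    return conn.execute(EFFECT_SQL, (voter, author)).fetchone()[0]
```

Nothing in this layout stops a vote row from missing its `target` or pointing at a comment that doesn't exist, which is the "handle consistency problems by not handling them" trade-off in practice.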
Structured tables. One for posts, one for comments, one or more for karma, and so on, with appropriately typed columns for each attribute such things have. Alternatively, if the data really is unstructured, then I’d use a key-value store like Cassandra or something.
(For the record, many modern key-value stores didn’t exist when the Reddit code was originally written.)
Seconding this. A proper relational database would look something like this:
CREATE TABLE Users
(
id INT NOT NULL IDENTITY(1,1) PRIMARY KEY,
username VARCHAR(250),
passwordHash VARCHAR(250),
firstname VARCHAR(250),
lastname VARCHAR(250),
description VARCHAR(MAX),
dateCreated DATETIME NOT NULL DEFAULT GETDATE(),
dateLoggedIn DATETIME NOT NULL DEFAULT GETDATE(),
active CHAR(1)
);
CREATE TABLE Themes
(
id INT NOT NULL IDENTITY(1,1) PRIMARY KEY,
name VARCHAR(250),
description VARCHAR(MAX),
css VARCHAR(MAX),
dateCreated DATETIME NOT NULL DEFAULT GETDATE(),
dateEdited DATETIME NOT NULL DEFAULT GETDATE(),
active CHAR(1)
);
CREATE TABLE Forums
(
id INT NOT NULL IDENTITY(1,1) PRIMARY KEY,
name VARCHAR(250),
description VARCHAR(MAX),
users_id_owner INT NOT NULL FOREIGN KEY REFERENCES Users(id),
themes_id INT NOT NULL FOREIGN KEY REFERENCES Themes(id),
dateCreated DATETIME NOT NULL DEFAULT GETDATE(),
dateEdited DATETIME NOT NULL DEFAULT GETDATE(),
active CHAR(1)
);
CREATE TABLE Posts
(
id INT NOT NULL IDENTITY(1,1) PRIMARY KEY,
forums_id INT NOT NULL FOREIGN KEY REFERENCES Forums(id),
posts_id_parent INT NULL FOREIGN KEY REFERENCES Posts(id), -- NULL for top-level posts
users_id_poster INT NOT NULL FOREIGN KEY REFERENCES Users(id),
title VARCHAR(250) NOT NULL,
text VARCHAR(MAX) NOT NULL,
dateCreated DATETIME NOT NULL DEFAULT GETDATE(),
dateEdited DATETIME NOT NULL DEFAULT GETDATE(),
active CHAR(1)
);
CREATE TABLE Votes
(
value INT NOT NULL,
posts_id INT NOT NULL FOREIGN KEY REFERENCES Posts(id),
users_id_voter INT NOT NULL FOREIGN KEY REFERENCES Users(id),
dateCreated DATETIME NOT NULL DEFAULT GETDATE()
);
-- constraint: only one vote per post per user
ALTER TABLE Votes ADD CONSTRAINT pk_Votes PRIMARY KEY (posts_id, users_id_voter);
With that schema, all you’d have to do to see someone’s effect on another person’s karma is:
SELECT SUM(VALUE) FROM Votes
WHERE users_id_voter = @Voter
AND posts_id IN
(SELECT id FROM Posts WHERE users_id_poster = @User)
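For anyone who wants to poke at that query without a SQL Server instance, here is a minimal sketch that ports a trimmed-down version of the Users/Posts/Votes tables to SQLite through Python's `sqlite3` module. The sample rows and the `karma_effect` helper are made up for illustration, and the `COALESCE` is an addition so that "no votes at all" comes back as 0 rather than NULL.

```python
import sqlite3

# SQLite port of a trimmed Users/Posts/Votes schema; sample rows are invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Users (id INTEGER PRIMARY KEY, username TEXT NOT NULL);
CREATE TABLE Posts (
    id INTEGER PRIMARY KEY,
    users_id_poster INTEGER NOT NULL REFERENCES Users(id),
    text TEXT NOT NULL
);
CREATE TABLE Votes (
    value INTEGER NOT NULL,
    posts_id INTEGER NOT NULL REFERENCES Posts(id),
    users_id_voter INTEGER NOT NULL REFERENCES Users(id),
    PRIMARY KEY (posts_id, users_id_voter)  -- one vote per post per user
);
INSERT INTO Users (id, username) VALUES
    (1, 'author'), (2, 'voter_a'), (3, 'voter_b');
INSERT INTO Posts (id, users_id_poster, text) VALUES
    (10, 1, 'first'), (11, 1, 'second'), (12, 3, 'other');
INSERT INTO Votes (value, posts_id, users_id_voter) VALUES
    (-1, 10, 2), (-1, 11, 2), (1, 12, 2), (1, 10, 3);
""")

def karma_effect(voter_id, user_id):
    """Net karma that voter_id has given to posts written by user_id."""
    row = conn.execute(
        """
        SELECT COALESCE(SUM(value), 0) FROM Votes
        WHERE users_id_voter = :voter
          AND posts_id IN
              (SELECT id FROM Posts WHERE users_id_poster = :user)
        """,
        {"voter": voter_id, "user": user_id},
    ).fetchone()
    return row[0]
```

Here `karma_effect(2, 1)` sums voter_a's two downvotes on author's posts, while voter_a's upvote on voter_b's post is excluded by the subquery.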
GIVE THAT USER UPVOTES FOR BRAVERY. Thank you.
I was scrolling through, saw this comment, reread ialdabaoth’s comment, and upvoted it, which I wouldn’t have done without yours. Upvoted.
Well, that explains a couple of things.
When did you last try? You should be able to more-or-less go
git checkout → vagrant up
and have everything pretty much ready to go: https://github.com/tricycle/lesswrong/wiki/Development-VM-Image
Being fairly ignorant of databases… how would you have laid it out better, in a general sense?
EDIT: Wow, formatting is a pain.
It’s heartwarming to see off-the-cuff SQL that includes foreign key constraints.
Heartwarming enough to offer me a job? ;)
EDIT: Downvoted? Ouch...