Actually, it seems that the industry-standard (?) process of code review was followed just fine, yet the wrong logic still went through. (In fact, based on the GitHub PR, it seems the reviewer himself suggested the wrong logic?)
I think in this case there would also be plenty to say about blindly following checklists. (Could code review in some cases make things worse by making people think less about the code they write?)
EDIT: Actually, based on the TS types, user.karma can’t be missing. Either the types or Ruby’s explanation is wrong. Clearly, multiple things had to go wrong for this bug to slip through.
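To make the failure mode concrete, here is a minimal hypothetical sketch (the DbUser interface and field names are made up for illustration, not taken from the LW codebase): when a type claims a field is always present but a legacy database document lacks it, the compiler stays silent while every numeric comparison against the missing value quietly evaluates to false.

```typescript
// Hypothetical sketch: the declared type claims karma is always a number.
interface DbUser {
  _id: string;
  karma: number; // type says "always present"
}

// A document created before the field existed has no karma at all;
// the type assertion hides the missing field from the compiler.
const legacyUser = { _id: "u1" } as DbUser;

// At runtime karma is undefined, so BOTH comparisons are false:
console.log(legacyUser.karma >= 100); // false
console.log(legacyUser.karma < 100);  // also false (undefined coerces to NaN)
```

Code that branches on either comparison will silently take the "else" path for every legacy user, which is exactly the kind of bug no reviewer will spot from the types alone.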
I don’t know all the details of what testing was done, but I would not describe “code review and then deploy” as state-of-the-art, as this ignores things like staged deploys, end-to-end testing, monitoring, etc. Again, I’m not familiar with the LW codebase and deploy process, so it’s possible all these things are in place, in which case I’d be happy to retract my comment!
To me it seems that the average best practices are being followed.[1] But these “best practices” are still just a bunch of band-aids, which happen to work fairly well for most use-cases.
A much more interesting question to ask here is: what if something important, like … humanity’s survival, depended on your software? It seems that software correctness will be quite important for alignment. Yet I see very few people seriously trying to make the creation of correct software scalable. (And it seems like a field particularly suited for empirical work, unlike alignment. I mean, just throw your wildest ideas at a proof checker and see what sticks. After you have a proof, it doesn’t matter at all how it was obtained.)
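The “it doesn’t matter how the proof was obtained” point can be illustrated with a toy Lean 4 sketch (theorem names are made up): two proofs of the same statement, one found by citing a library lemma and one found by a brute-force decision procedure, are equally acceptable to the checker.

```lean
-- Two proofs of the same statement, obtained in different ways:
-- one via an existing library lemma, one via the `omega` decision
-- procedure. The kernel checks both the same way; the proof's
-- provenance (human, search, or wild guess) is irrelevant.
theorem sum_comm_a (a b : Nat) : a + b = b + a := Nat.add_comm a b
theorem sum_comm_b (a b : Nat) : a + b = b + a := by omega
```

This asymmetry, where finding a proof is hard but checking it is mechanical, is what makes “throw ideas at the checker” a plausible empirical workflow.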
And I think the amount of effort in this case is perfectly justified. I mean, this was code for a one-off, single-day event, nothing mission-critical. It would be unreasonable to expect much more for something like this.
Mainly commenting on your footnote: I generally agree that it’s fine to put low amounts of effort into one-off, simple events. The caveat here is that this is an event that 1) has been treated pretty seriously in past years and 2) is a symbol of a certain mindset, one that I think typically includes double-checking things and avoiding careless mistakes.
This is why I really wish we had an AI with superhuman capabilities in coding, theorem proving, and translation to natural language, and crucially only those capabilities, so that we could prove certain properties.
The types are wrong. It’s sad that the types are wrong. I sure wish they weren’t, but fixing them everywhere is a pretty major effort (we would have to manually annotate all fields to specify which things can be null/undefined in the database and which ones can’t, and then would have to work through a lot of type errors).
On the other hand, I am afraid this reinforces NaiveTortoise’s point: this seems like an underlying issue that could potentially lead to bugs much worse than this one...
Yeah, makes sense. Indeed, such tech debt can’t be fixed overnight.
The types were introduced after most of the user objects had already been created, and I guess no one ever ran a migration to fill them in.
I see, makes sense.