I don’t know all the details of what testing was done, but I would not describe “code review, then deploy” as state-of-the-art, as this ignores things like staged deploys, end-to-end testing, monitoring, etc. Again, I’m not familiar with the LW codebase and deploy process, so it’s possible all these things are in place, in which case I’d be happy to retract my comment!
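For concreteness, even a minimal end-to-end test can catch the class of bug that review alone misses. Here’s a rough sketch using Playwright; the URL, the button name, and the confirmation behavior are all assumptions on my part, since I don’t know how the actual page is built:

```typescript
// Hypothetical end-to-end test sketch (Playwright).
// The URL, button name, and confirmation step are assumptions,
// not taken from the actual LW codebase.
import { test, expect } from '@playwright/test';

test('event button renders and asks for confirmation', async ({ page }) => {
  // Run against a staging deploy, not production.
  await page.goto('https://staging.example.com/event');

  // The button should actually render for a visitor.
  const button = page.getByRole('button', { name: /press/i });
  await expect(button).toBeVisible();

  // Clicking should surface a confirmation step rather than fire immediately.
  await button.click();
  await expect(page.getByText(/are you sure/i)).toBeVisible();
});
```

Something like this running against a staged deploy, plus basic monitoring after release, is roughly the gap I was pointing at.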
To me it seems that the average best practices are being followed.[1] But these “best practices” are still just a bunch of band-aids, which happen to work fairly well for most use-cases.
A much more interesting question to ask here is: what if something important, like … humanity’s survival, depended on your software? It seems that software correctness will be quite important for alignment. Yet I see very few people seriously trying to make building correct software scalable. (And it seems like a field particularly suited to empirical work, unlike alignment. I mean, just throw your wildest ideas at a proof checker and see what sticks. Once you have a proof, it doesn’t matter at all how it was obtained.)
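To make that last point concrete, here’s a toy Lean 4 sketch (my own illustration, not from any real codebase). The kernel only checks that each step is valid; it is completely indifferent to whether a human, a tactic search, or a model produced the proof:

```lean
-- Toy property: 0 + n = n for natural numbers, proved by induction.
-- However this proof was found -- by hand, by brute-force search, or
-- by an AI -- the kernel verifies it the same way. Only validity matters.
theorem zero_add' (n : Nat) : 0 + n = n := by
  induction n with
  | zero => rfl
  | succ n ih => rw [Nat.add_succ, ih]
```

This split between search and verification is what makes the field friendly to wild experimentation: generating candidate proofs can be as unprincipled as you like, because checking them is cheap and sound.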
And I think the amount of effort in this case is perfectly justified. I mean, this was code for a one-off, single-day event, nothing mission-critical. It would be unreasonable to expect much more for something like this.
Mainly commenting on your footnote: I generally agree that it’s fine to put low amounts of effort into one-off simple events. The caveat here is that this is an event that 1) has been treated pretty seriously in past years and 2) is a symbol of a certain mindset, one that I think typically includes double-checking things and avoiding careless mistakes.
This is why I really wish we had an AI with superhuman coding, theorem-proving, and natural-language-translation capabilities, and crucially only those capabilities, so that we could prove certain properties.